{
  "path": "/3mdxmxpjkg22c",
  "site": "at://did:plc:lvkhxfkdwqgwrpdek3h3q2gc/site.standard.publication/3m2ojl75sm22f",
  "tags": [
    "unison",
    "html-parse",
    "library"
  ],
  "$type": "site.standard.document",
  "title": "html-parse - HTML parser in Unison",
  "content": {
    "$type": "pub.leaflet.content",
    "pages": [
      {
        "id": "019c23cd-a3df-7ff1-ac6a-35278b9f98d8",
        "$type": "pub.leaflet.pages.linearDocument",
        "blocks": [
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "facets": [],
              "plaintext": "A few days back, I got interested in building an RSS reader on top of AT Protocol."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.bskyPost",
              "postRef": {
                "cid": "bafyreihf7gkg7u5veawiu6weenyi7lkxodfnm2huu66xlrkdpz7xucl53m",
                "uri": "at://did:plc:lvkhxfkdwqgwrpdek3h3q2gc/app.bsky.feed.post/3mcwzinax3k2e"
              }
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "facets": [],
              "plaintext": ""
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "facets": [],
              "plaintext": "Reading RSS feeds mostly means reading HTML content syndicated from websites, so one of the building blocks is a parser that turns raw HTML text into a structured representation, which I can then encode into other formats such as Markdown."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "facets": [],
              "plaintext": "There are already several ways to do this in Unison land, but I am in the mood to build things from first principles as much as possible."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "facets": [
                {
                  "index": {
                    "byteEnd": 39,
                    "byteStart": 29
                  },
                  "features": [
                    {
                      "uri": "https://hackage.haskell.org/package/html-parse",
                      "$type": "pub.leaflet.richtext.facet#link"
                    }
                  ]
                },
                {
                  "index": {
                    "byteEnd": 108,
                    "byteStart": 103
                  },
                  "features": [
                    {
                      "uri": "https://share.unison-lang.org/@kaychaks/html-parse/code/main/@chb5kkmt40t5o441bj0jntpcvb8bf0vqf32i9ump45jlk7tqfe8vgks5vlhg53nq00fnpjpmmti2lnarqvnj7gg6a4j8flp6bu5ahao/types/internal/ast/Token",
                      "$type": "pub.leaflet.richtext.facet#link"
                    }
                  ]
                },
                {
                  "index": {
                    "byteEnd": 225,
                    "byteStart": 212
                  },
                  "features": [
                    {
                      "uri": "https://share.unison-lang.org/@hojberg/html",
                      "$type": "pub.leaflet.richtext.facet#link"
                    }
                  ]
                }
              ],
              "plaintext": "Hence, I ported the good old html-parse library from Haskell land to tokenise HTML text into a list of Token values and then added a Unison ability that encodes those tokens into the structured Html types of the @hojberg/html library."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "facets": [
                {
                  "index": {
                    "byteEnd": 34,
                    "byteStart": 14
                  },
                  "features": [
                    {
                      "uri": "https://share.unison-lang.org/@kaychaks/html-parse",
                      "$type": "pub.leaflet.richtext.facet#link"
                    }
                  ]
                }
              ],
              "plaintext": "The result is @kaychaks/html-parse:"
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "url": "https://share.unison-lang.org/@kaychaks/html-parse/code/main/@chb5kkmt40t5o441bj0jntpcvb8bf0vqf32i9ump45jlk7tqfe8vgks5vlhg53nq00fnpjpmmti2lnarqvnj7gg6a4j8flp6bu5ahao/terms/HtmlBuild/buildHtml",
              "$type": "pub.leaflet.blocks.iframe",
              "height": 935
            }
          }
        ]
      }
    ]
  },
  "description": "Parse HTML text into a structured representation",
  "publishedAt": "2026-02-03T15:02:43.325Z"
}