{
"path": "/3mdxmxpjkg22c",
"site": "at://did:plc:lvkhxfkdwqgwrpdek3h3q2gc/site.standard.publication/3m2ojl75sm22f",
"tags": [
"unison",
"html-parse",
"library"
],
"$type": "site.standard.document",
"title": "html-parse - HTML parser in Unison",
"content": {
"$type": "pub.leaflet.content",
"pages": [
{
"id": "019c23cd-a3df-7ff1-ac6a-35278b9f98d8",
"$type": "pub.leaflet.pages.linearDocument",
"blocks": [
{
"$type": "pub.leaflet.pages.linearDocument#block",
"block": {
"$type": "pub.leaflet.blocks.text",
"facets": [],
"plaintext": "A few days back, I got interested in building an RSS reader on top of AT Protocol."
}
},
{
"$type": "pub.leaflet.pages.linearDocument#block",
"block": {
"$type": "pub.leaflet.blocks.bskyPost",
"postRef": {
"cid": "bafyreihf7gkg7u5veawiu6weenyi7lkxodfnm2huu66xlrkdpz7xucl53m",
"uri": "at://did:plc:lvkhxfkdwqgwrpdek3h3q2gc/app.bsky.feed.post/3mcwzinax3k2e"
}
}
},
{
"$type": "pub.leaflet.pages.linearDocument#block",
"block": {
"$type": "pub.leaflet.blocks.text",
"facets": [],
"plaintext": ""
}
},
{
"$type": "pub.leaflet.pages.linearDocument#block",
"block": {
"$type": "pub.leaflet.blocks.text",
"facets": [],
"plaintext": "Reading RSS feeds means reading content, mostly HTML, syndicated from websites. So one of the building blocks is a parser that turns raw HTML text into a structured representation, which I can then encode into other formats such as Markdown."
}
},
{
"$type": "pub.leaflet.pages.linearDocument#block",
"block": {
"$type": "pub.leaflet.blocks.text",
"facets": [],
"plaintext": "There are already many ways to do this in Unison land, but I was in the mood to build things from first principles, as much as possible."
}
},
{
"$type": "pub.leaflet.pages.linearDocument#block",
"block": {
"$type": "pub.leaflet.blocks.text",
"facets": [
{
"index": {
"byteEnd": 39,
"byteStart": 29
},
"features": [
{
"uri": "https://hackage.haskell.org/package/html-parse",
"$type": "pub.leaflet.richtext.facet#link"
}
]
},
{
"index": {
"byteEnd": 108,
"byteStart": 103
},
"features": [
{
"uri": "https://share.unison-lang.org/@kaychaks/html-parse/code/main/@chb5kkmt40t5o441bj0jntpcvb8bf0vqf32i9ump45jlk7tqfe8vgks5vlhg53nq00fnpjpmmti2lnarqvnj7gg6a4j8flp6bu5ahao/types/internal/ast/Token",
"$type": "pub.leaflet.richtext.facet#link"
}
]
},
{
"index": {
"byteEnd": 211,
"byteStart": 198
},
"features": [
{
"uri": "https://share.unison-lang.org/@hojberg/html",
"$type": "pub.leaflet.richtext.facet#link"
}
]
}
],
"plaintext": "Hence, I ported the good old html-parse library from Haskell land to tokenise HTML text into a list of Token and then added a Unison ability to encode those tokens into structured Html types of the @hojberg/html library."
}
},
{
"$type": "pub.leaflet.pages.linearDocument#block",
"block": {
"$type": "pub.leaflet.blocks.text",
"facets": [
{
"index": {
"byteEnd": 34,
"byteStart": 14
},
"features": [
{
"uri": "https://share.unison-lang.org/@kaychaks/html-parse",
"$type": "pub.leaflet.richtext.facet#link"
}
]
}
],
"plaintext": "The result is @kaychaks/html-parse, which looks something like this:"
}
},
{
"$type": "pub.leaflet.pages.linearDocument#block",
"block": {
"url": "https://share.unison-lang.org/@kaychaks/html-parse/code/main/@chb5kkmt40t5o441bj0jntpcvb8bf0vqf32i9ump45jlk7tqfe8vgks5vlhg53nq00fnpjpmmti2lnarqvnj7gg6a4j8flp6bu5ahao/terms/HtmlBuild/buildHtml",
"$type": "pub.leaflet.blocks.iframe",
"height": 935
}
},
{
"$type": "pub.leaflet.pages.linearDocument#block",
"block": {
"$type": "pub.leaflet.blocks.text",
"facets": [],
"plaintext": ""
}
}
]
}
]
},
"description": "Parse HTML text into a structured representation",
"publishedAt": "2026-02-03T15:02:43.325Z"
}