Raw Record Source

{
  "$type": "site.standard.document",
  "content": "---\ntitle: \"AT-URIs as persistent identifiers for scholarly blogging\"\ndescription: \"Every post on this blog now has a persistent AT-URI via the standard.site\n  spec---more durable than bare URLs, less overhead than DOIs.\"\ntags:\n  - atproto\n  - research\n  - web\n---\n\nEvery post on this blog now has a persistent identifier on the\n[AT Protocol](https://atproto.com/). If you scroll to the bottom of any post,\nyou'll find a \"Cite this post\" section with a BibTeX entry that includes an\nAT-URI alongside the regular URL. I did this because I wanted a citation\nidentifier for my blog posts that's more durable than a bare URL but doesn't\nrequire the institutional overhead of a DOI---and because the\n[standard.site](https://standard.site) spec gave me a clean way to do it.\n\nIf you write something on the web and want people to be able to cite it\nreliably, you've got three options with varying trade-offs, and the identifier\ntrilemma[^dilemma] is real. The first option is bare URLs: free and\nimmediate. `https://benswift.me/blog/2026/02/19/...` works right now and will\nkeep working for as long as I keep this domain pointed at this content. The\nproblem, of course, is that \"as long as I keep this domain\" is doing a lot of\nwork in that sentence. Domains lapse and hosting providers go away; site\nredesigns break paths. Tim Berners-Lee\n[argued in 1998](https://www.w3.org/Provider/Style/URI) that cool URIs don't\nchange, but on the real web, they change all the time.\n\n[^dilemma]:\n    Technically \"trilemma\" because there are three options; I briefly\n    called it a \"dilemma\" for the rhythm of the sentence, but who's that\n    pedantic?\n\nThe second option is DOIs, which solve the persistence problem through\ninstitutional infrastructure. CrossRef and DataCite maintain resolver\nservices, and the academic citation ecosystem understands DOIs natively. 
I'm\nan academic working at a university, so this is very much the water I swim in.\nBut getting a DOI means going through a registrar, and that typically means\neither publishing through a journal or paying for one yourself. For a rambling\npersonal blog post about yak-shaving your email setup, that's a bit\nmuch[^zenodo].\n\n[^zenodo]:\n    You _can_ get free DOIs through [Zenodo](https://zenodo.org/) by uploading\n    your work there, and that's a reasonable option for some things---I've\n    [done it myself via the GitHub integration](https://github.com/ANUcybernetics/llms-unplugged).\n    But it still means your canonical content lives in two places, and you need\n    to manually deposit each post.\n\nThe third option, AT-URIs, sits somewhere in the middle.\n[AT Protocol](https://atproto.com/specs/record-key) defines a URI scheme\n(`at://did:plc:abc123/collection/rkey`) where the authority is a\ncryptographically verifiable DID rather than a domain name. Your content lives\nin a [Personal Data Server](https://atproto.com/guides/glossary#pds) that you\ncontrol, and the DID follows you even if you move between PDS providers. The\nresolution doesn't depend on any single company keeping the lights on; it's\nfederated infrastructure rather than a domain registrar's renewal cycle.\n\nNone of these are perfect. But for the specific case of \"I write a blog and I\nwant a persistent, self-issued, machine-readable identifier for each post,\" the\nAT-URI approach hits a sweet spot.\n\nFor this site, the integration uses [standard.site](https://standard.site)---a shared set of\n[AT Protocol lexicons](https://atproto.com/guides/lexicon) for long-form\npublishing. 
There are two record types that matter:\n\n`site.standard.publication` describes the blog itself (name, URL, description).\nThere's one of these, stored with the rkey `self`:\n\n```\nat://did:plc:tevykrhi4kibtsipzci76d76/site.standard.publication/self\n```\n\n`site.standard.document` stores each blog post's content and metadata. Each post\ngets its own record with a deterministic rkey derived from the post's URL path:\n\n```\n/blog/2026/02/18/ben-s-dev-setup-2026-edition\n→ rkey: 2026-02-18-ben-s-dev-setup-2026-edition\n→ at://did:plc:tevykrhi4kibtsipzci76d76/site.standard.document/2026-02-18-ben-s-dev-setup-2026-edition\n```\n\nThe whole pipeline runs as part of the\n[GitHub Actions deploy workflow](https://github.com/benswift/benswift.github.io/blob/main/.github/workflows/deploy.yml).\nAfter tests pass, a publish script authenticates with the PDS, diffs content\nhashes against a state file to find new or changed posts, and calls `putRecord`\nfor each one. The state file gets committed back to main[^skipci], and VitePress\npicks up the AT-URIs at build time to inject\n`<link rel=\"site.standard.document\">` tags and citation metadata into the HTML.\n\n[^skipci]:\n    With `[skip ci]` in the commit message, naturally, to avoid an infinite\n    deploy loop.\n\nVerification works in both directions: the site serves a\n`/.well-known/site.standard.publication` file pointing to the publication's\nAT-URI, and each built page includes the document AT-URI in its `<head>`. Any\nindexer can match the web content to the protocol records and confirm they\nbelong together.\n\nWhy roll my own rather than use [Sequoia](https://sequoia.pub)? Sequoia is a\nperfectly good CLI for publishing standard.site records, handling\nauthentication, record creation, and the well-known file out of the box. 
I\nhand-rolled the integration anyway, for one specific reason: deterministic\nrecord keys.\n\nThe core idea comes straight from Berners-Lee's\n[cool URIs](https://www.w3.org/Provider/Style/URI) principle. If you're creating\npersistent identifiers, the mapping from content to identifier should be\n_computable_, not stored. My post at `/blog/2026/02/18/my-post` will always get\nthe rkey `2026-02-18-my-post`, which means its AT-URI is computable from the URL\nalone:\n\n```ts\nexport function pathToRkey(postPath: string): string {\n  const match = postPath.match(/\\/blog\\/(\\d{4})\\/(\\d{2})\\/(\\d{2})\\/(.+)/);\n  if (!match) throw new Error(`Invalid post path: ${postPath}`);\n  const [, year, month, day, slug] = match;\n  return `${year}-${month}-${day}-${slug}`;\n}\n```\n\nThis matters because it means the identifiers survive state file loss. If my\n`atproto-state.json` vanishes tomorrow, I can reconstruct every AT-URI from the\npost paths alone. The state file is just an optimisation cache for skipping\nunchanged posts; it's not the source of truth for identifiers.\n\nSequoia, like most atproto tooling, generates\n[TID-based](https://atproto.com/specs/record-key) rkeys: opaque\ntimestamp-derived strings like `3jzfcijpj2z2a`. They're unique, but they're not\ndeterministic. If you ever needed to republish your records (new PDS, corrupted\nrepo, whatever), you'd get different rkeys and different AT-URIs. Any citations\npointing to the old URIs would break. The whole point of persistent identifiers\nis that they don't do that.\n\nAnyway, I've had this blog online for over a decade now and I _think_ it's got\nat least some Google-juice (whether that stuff even matters anymore). 
Changing\nall the URLs and throwing all those direct links away just seems like a bad idea.\n\nThe other half of the work is making the identifiers useful for citation\ntools.\nEvery blog post now includes\n[Google Scholar / Zotero compatible](https://scholar.google.com/intl/en/scholar/inclusion.html#indexing)\nmeta tags:\n\n```html\n<meta name=\"citation_title\" content=\"Ben's dev setup 2026 edition\" />\n<meta name=\"citation_author\" content=\"Ben Swift\" />\n<meta name=\"citation_date\" content=\"2026-02-18\" />\n<meta\n  name=\"citation_public_url\"\n  content=\"https://benswift.me/blog/2026/02/18/ben-s-dev-setup-2026-edition\"\n/>\n```\n\nAnd the \"Cite this post\" component at the bottom of each post generates BibTeX\nwith the AT-URI in the `note` field:\n\n```bibtex\n@online{swift2026benSDevSetup2026Edition,\n  author = {Ben Swift},\n  title = {Ben's dev setup 2026 edition},\n  url = {https://benswift.me/blog/2026/02/18/ben-s-dev-setup-2026-edition},\n  year = {2026},\n  month = {02},\n  note = {AT-URI: at://did:plc:tevykrhi4kibtsipzci76d76/site.standard.document/2026-02-18-ben-s-dev-setup-2026-edition},\n}\n```\n\nIt's not a DOI, and no reference manager will resolve it automatically (yet).\nMaybe I should try and land a PR in Zotero or something. But it _is_ a\nverifiable, self-issued identifier that lives on federated infrastructure. If\nsomeone cites a blog post of mine in a paper and includes the AT-URI, that\nidentifier will resolve as long as the AT Protocol network exists, independent\nof whether `benswift.me` is still pointing at the right server.\n\nAT-URIs don't have the institutional weight of DOIs. No journal, funder, or\nuniversity recognises them as \"proper\" persistent identifiers. And who even\nknows if my next promotion case is going to get any benefit from links to my\nstupid blog. The resolution infrastructure is young, with no equivalent of\n`doi.org` that an AT-URI cleanly resolves through. 
And the `standard.site`\nlexicons are still finding their shape; the spec could evolve in ways that\nrequire migration.\n\nThere's also a philosophical tension: I'm relying on the AT Protocol network\nbeing around long-term, which is a bet on a specific federation protocol\nsurviving. That said, it's a more distributed bet than trusting a single domain\nregistrar. And because the DID layer is separable from any particular PDS, the\nidentifiers have a plausible path to outliving any individual service provider.\n\nFor now, this is an experiment in treating blog posts as first-class scholarly\nartefacts, with real identifiers and a real citation workflow. If the AT\nProtocol ecosystem grows the way its proponents hope, these identifiers might\nactually matter. And if it doesn't, well, the citation meta tags and BibTeX\nstill work without them. Cite me and prove me right.\n\n## Update: discoverability in practice (2026-04-10)\n\nA few weeks after writing this, I went digging into a question the original\npost mostly hand-waved: the records are on the network, sure, but are they\nactually _discoverable_? Is anyone out there building \"Bluesky for blogs\"?\n\nTurns out: yes, though without much fanfare. As of today there are 147\n`site.standard.document` records in my repo, queryable directly from the PDS\nvia `com.atproto.repo.listRecords`, with the oldest going back to 2020 and the\nnewest from yesterday. Every write also flies past on the ATproto firehose,\nand at least one indexer is listening.\n[Standard Search](https://standard-search.octet-stream.net) is a firehose-fed\nsearch engine for standard.site records that uses the relay's collection\nlisting to backfill history without any crawling. It launched in January 2026\nwith around 3,900 documents indexed and has been growing steadily since.\n\nOn the reader side, [Leaflet.pub](https://leaflet.pub), Pckt.blog, and\nOffprint.app are publishing platforms that _also_ act as readers for each\nother's content. 
Leaflet can render a preview of a document authored in Pckt,\nand my Astro-published posts land in the same pool, no migration required.\nSurrounding tooling is starting to show up too: Sequoia for publishing from\nthe command line, astro-standard-site for Astro blogs like this one, and a\nproposed markpub markdown sub-lexicon.\n\nThe framing I missed first time round: this isn't going to be one monolithic\n\"Bluesky for blogs\" AppView. It's going to be N readers, search engines, and\nbookmarking tools all pointed at the same record pool via the shared\n[standard.site](https://standard.site) lexicons. Which is arguably a _more_\ninteresting outcome than Bluesky itself managed: publishing and reading\ndecoupled at the protocol layer, rather than bundled into one vertically\nintegrated app.\n\nStill small, obviously. Standard Search is one person's project and the\nreader ecosystem is a handful of apps rather than a flourishing market. But\nthe loop is closed (write → firehose → indexer → reader), and it's\npresent-tense infrastructure, not a bet on future stuff. I'll take it.\n",
  "createdAt": "2026-05-13T23:14:40.310Z",
  "description": "Every post on this blog now has a persistent AT-URI via the standard.site spec---more durable than bare URLs, less overhead than DOIs.",
  "path": "/blog/2026/02/19/at-uris-as-persistent-identifiers-for-scholarly-blogging",
  "publishedAt": "2026-02-19T00:00:00.000Z",
  "site": "at://did:plc:tevykrhi4kibtsipzci76d76/site.standard.publication/self",
  "tags": [
    "atproto",
    "research",
    "web"
  ],
  "textContent": "Every post on this blog now has a persistent AT-URI via the standard.site spec---more durable than bare URLs, less overhead than DOIs.",
  "title": "AT-URIs as persistent identifiers for scholarly blogging"
}