{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiffpe2q64ed4shgmquoycd5btmjq24hkyie2ueb7aomgs7deascty",
"uri": "at://did:plc:vrrdgcidwpvn4omvn7uuufoo/app.bsky.feed.post/3lbk6wc43f22n"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreic75elfs42n6beloi5omh4kg32p7lpkndehzawqyufux5mifevnrm"
},
"mimeType": "image/png",
"size": 39734
},
"description": "I recently built a Bluesky bot. To build it, I had to dig into the Bluesky firehose. Here's what I learned.",
"path": "/words/drinking-from-the-bluesky-firehose/",
"publishedAt": "2024-11-22T00:00:00Z",
"site": "at://did:plc:vrrdgcidwpvn4omvn7uuufoo/site.standard.publication/3mmyfl3pxzi2a",
"tags": [
"atproto",
"bluesky",
"javascript"
],
"textContent": "I recently built a Bluesky bot called Link Notifier that sends you a DM whenever someone posts a link to your website.\nTo build it, I had to dig into the Bluesky firehose.\nThat seems like a pretty common entry point for people looking to build on top of Bluesky, so I figured I'd share what I learned.\n\nThere are a couple of ways to get at the Bluesky firehose:\n\nConsume it directly.\nThis is pretty complex, involving binary WebSocket messages containing CBOR-encoded Merkle Search Tree blocks.\nIf reading that makes you feel adrift in a sea of jargon, you're not alone!\n\nUse Jetstream, a first-party Bluesky service that converts the firehose into normal JSON.\nYou can self-host it if you want, but Bluesky provides official instances that you can connect to.\n\nYou might be tempted — as I was at first — to avoid all the gory details and just reach for a library like @skyware/jetstream.\n\nDon't be intimidated!\nThe Jetstream API is actually remarkably simple, and you can easily consume it without adding a dependency to your project.\n\nHere's a small example running in the browser that consumes the Bluesky Jetstream: a web component that shows the latest post every second.\n(This is totally unfiltered; I'm sorry if anything unsavory shows up here.)\n\nThe full code of this component is fewer than 40 lines — including the templating and all the web component boilerplate!\nThe code that reads from the Jetstream takes up about six.\nThere are no dependencies outside of the browser's standard library.\n\nBefore we look at any code, though, let's take a quick detour through the AT Protocol and Jetstream API.\n\nThe Jetstream API\n\nJetstream is a WebSocket server: we connect to it over a WebSocket, and it sends events as JSON-encoded WebSocket messages.\nYou can host a Jetstream instance yourself, but as of today Bluesky hosts official instances that you can use without authentication.\n\nThe connection string for a Jetstream instance looks like this:\n\nOnce you're connected, Jetstream will start sending events.\nThey look like this:\n\nThat's the full event of a post being liked.\n\nIt's pretty dense!\nThere are a bunch of terms like \"collection\" and \"did\" that are idiosyncratic to AT Protocol.\nMost of them can be found in the glossary, but I'll try to define them in my own words as they come up as well.\n\nIn AT Protocol, everything a user does is found in a repo.\nEach repo has a DID: a Decentralized ID that uniquely identifies it.\nThe did property at the root of the event object is a reference to the repo of the user who took the action (in this case, the user who liked the post).\n\nThe kind property disambiguates between three types of events:\n\ncommit for events that create, update or delete something in a repo.\n\nidentity for events that describe some change to the repo itself (not quite sure which — I assume changing a handle would be one example).\n\naccount for events that describe a change in account status (e.g. from \"active\" to \"deactivated\").\n\nFor our purposes, we're only worried about commit events.\nThose events all have a nested commit object with an operation property: create, update or delete.\nI'll let you guess what those mean.\n\nEach commit object also has a collection property.\nThis is a way to \"group\" events across repos.\nFor example, to listen to all new posts, we'd ignore all events in collections other than app.bsky.feed.post.\n\nIf we do all the filtering in the client, we'd be receiving a ton of data we don't need.\nJetstream provides a way to avoid this: append a wantedCollections query string parameter to the connection string.\n\nSay we're only interested in new posts and likes.\nWe'd connect to this long URL:\n\nThat wouldn't absolve us of the need to filter on the client — we'd still need to branch between new posts and likes within our app — but it would prevent us from sifting through a ton of other events we don't care about.\n\nWe can also use asterisks as \"wildcards\" to filter through multiple collections at once.\nFor example, to get events in all feed collections, we'd set wantedCollections to app.bsky.feed.*.\n\nOn create and update commits, the record is the \"contents\" of the commit — either the thing that was just created, or the thing that replaces the previous record.\nAs a reminder, here's what the record looked like in the example event:\n\nAnd here's what a record might look like for posts:\n\nThis is a minimal example; Bluesky's documentation details how to handle links, quotes and so forth.\nNotice that in some cases (such as the text property of a post record) the record itself contains information, while in others (such as the subject property of a like record) it contains references to other records.\n\nThe Client\n\nThe simplest possible Jetstream client looks something like this:\n\nVoilà: two lines of code and every event from the Bluesky firehose gets logged to the console!\n\nWith a little elbow grease, we can come up with something a little more ergonomic.\nLet's write a client that mimics the @skyware/jetstream API:\n\nThis is still a pretty simple client that doesn't cover everything we might ever want to do with Jetstream, but it's more than enough to get us started.\n\nWe'll start by writing a Jetstream class:\n\nBy default, our client connects to the Bluesky-hosted Jetstream instance at jetstream1.us-east.bsky.network.\nThe user can override that by passing an endpoint into the constructor.\n\nWe also see two additional members:\n\nemitters, which holds a map of EventTargets keyed by the collection names.\n\nws, which will hold the WebSocket client when we connect to the Jetstream instance.\n\nFirst, we'll write a private #listen method that calls an event listener when the client receives an event in a given collection with a specific operation:\n\nIt gets an EventTarget from the map at the given collection key — creating one if it doesn't exist — and attaches an event listener for events matching the given commit operation.\n\nWhen we're dispatching the events later, we'll use CustomEvents, which allow you to include arbitrary data in their detail property.\nSince the use of CustomEvents is an implementation detail, we'll just pass that property to the listener, rather than the whole event.\n\nFrom here, we can make public wrapper methods for each of those commit operations:\n\nThese don't really do much other than make that #listen method slightly more convenient to use.\n\nNext, let's take a look at the start method:\n\nThis looks pretty familiar: it's a thin abstraction over the barebones client we saw earlier.\n\nFirst, if there's already an open WebSocket connection, close it.\n\nNext, set up a new WebSocket connection at the appropriate URL.\n\nWhen we receive a message, parse the data into JSON.\n\nDiscard any non-commit events.\n\nGet the event emitter corresponding to the event's collection.\n\nDispatch the event using the commit operation as a key.\n\nSharp-eyed readers might notice that we haven't defined the class's url member yet:\n\nIt's a getter that constructs the WebSocket URL, adding wantedCollections query string parameters for any collections in which we're listening for events.\nThat way, we'll only receive the slice of the Jetstream containing the collections we care about.\n\nFor posterity, here's the full code:\n\n40 lines of code and we've replicated a significant portion of @skyware/jetstream!\nUse it, modify it and make something cool.",
"title": "Drinking from the Bluesky Firehose"
}