Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreif4pmw6scmhuymqg6vh3cw2ak443odwf6lyiqqzue6it22pu4oyg4",
    "commit": {
      "cid": "bafyreicyfr6y56oogwyfk36pduqzw7wpgqrtebegdc7eoiiisgokmezqqq",
      "rev": "3mngmjqg4z22o"
    },
    "uri": "at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3mngmjqdf4k2u",
    "validationStatus": "valid"
  },
  "content": {
    "$type": "pub.leaflet.content",
    "pages": [
      {
        "$type": "pub.leaflet.pages.linearDocument",
        "blocks": [
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "alignment": "lex:pub.leaflet.pages.linearDocument#textAlignCenter",
            "block": {
              "$type": "pub.leaflet.blocks.image",
              "alt": "pizza pie with bubbly cheese",
              "aspectRatio": {
                "height": 1412,
                "width": 1422
              },
              "image": {
                "$type": "blob",
                "ref": {
                  "$link": "bafkreig2czih4m2kq3hnk4y6y7ywkuxnxwj7dqqq437rxr7uzu75y67lcm"
                },
                "mimeType": "image/webp",
                "size": 719976
              }
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "The firehose mechanism in atproto has been working well enough for a couple years now, with several server and client implementations and hundreds of active deployments in the wild. But there are some operational frictions with the cursor semantics that keep cropping up. I think we could improve things without too much disruption, and this (sabbatical!) pizza dispatch outlines some ideas."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "To orient things, the atproto firehose makes use of the XRPC event stream (or \"subscription\") primitive, which generally runs over WebSocket as a transport. Label streams use the same mechanism, and in theory folks can define new stream endpoints using the lexicon schema system, though this hasn't been very common. The cursor/sequence semantics are not actually part of XRPC event streams, but they are a pretty common pattern. Not all WebSocket stream tools in the ecosystem build on XRPC event streams: the jetstream and tap tools do their own separate things."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.header",
              "level": 2,
              "plaintext": "The Issues"
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "Stream consumers (clients) don't get any context about stream position when they connect: either their current position in the stream, or the current \"end\" of the stream (the highest sequence number). They can provide a cursor in their request, but they don't know where they actually are until they receive messages. This makes it hard to know if they are receiving \"backfill\" events or have caught up to the current stream, or how far they have to go before they are \"caught up\". Downstream of that, it means that the consumer doesn't know if they are relatively synchronized or still missing tons of messages."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "When initially connecting to a service with a low event rate, consumers do not learn what the \"current\" stream sequence is until a new event is emitted. Consumers will often drop a connection if they remain idle (no new messages) for a while. Combined, this can lead to a pattern where a consumer never receives messages from the service: they never establish a sequence/cursor position, so they always reconnect at the \"current stream position\", and miss any recently emitted messages. This doesn't happen with big PDS hosts, which are emitting many events per minute. But it does happen with small brand-new hosts with only a couple accounts emitting fewer than one message an hour."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "Consumers cannot easily switch between instances or providers of an event stream, unless they are mirroring the exact sequence numbers. For example, there are multiple full-network relay instances in the ecosystem. But switching between them is a delicate manual process of juggling sequence numbers and hostnames, to try and minimize the number of dropped messages. This could be worked around by connecting to multiple instances in parallel, but this adds a bunch of complexity to consumer implementations (eg, to maintain multiple cursors, and de-duplicate messages)."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "Related to this last issue, producer services cannot easily undertake operational changes like resetting their sequence, migrating between hosts, or failing over between backend instances. Any of these operations could impact message sequencing and require manual intervention by all consumers."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.header",
              "level": 2,
              "plaintext": "The Ideas"
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "facets": [
                {
                  "features": [
                    {
                      "$type": "pub.leaflet.richtext.facet#code"
                    }
                  ],
                  "index": {
                    "byteEnd": 84,
                    "byteStart": 77
                  }
                },
                {
                  "features": [
                    {
                      "$type": "pub.leaflet.richtext.facet#code"
                    }
                  ],
                  "index": {
                    "byteEnd": 101,
                    "byteStart": 96
                  }
                }
              ],
              "plaintext": "When connecting to an event stream, the very first message could be of a new #cursor type. Like #info messages, these would not be sequenced or persisted; they are just extra structured metadata about the connection. They would always include the current connection's cursor value (from the perspective of the server), and the current \"highest\" sequence number. This immediately situates the consumer and resolves the \"what is the current stream state actually\" questions. Note that the delta between the consumer's position and the highest sequence isn't necessarily the number of messages to be received: arbitrary gaps in sequences are allowed. But it does give some indication, and clients could at least guess/infer their progress."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "facets": [
                {
                  "features": [
                    {
                      "$type": "pub.leaflet.richtext.facet#code"
                    }
                  ],
                  "index": {
                    "byteEnd": 233,
                    "byteStart": 226
                  }
                }
              ],
              "plaintext": "If a consumer connected with a cursor, the server usually does a period of \"backfill\" (reading from a persisted data store) before doing a \"cut-over\" to the current/live stream of messages. The server could send an additional #cursor message when that cutover happens, and a boolean flag in the message would indicate that the consumer is now in \"live\" mode versus \"backfill\" mode. That would be a stronger signal to the consumer that they are actively caught up with the stream (even if there is still a small buffer of queued messages to receive before they are at the true tip of the stream)."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "More elaborate extensions to this could tell the consumer more specifically how many messages behind they are (a count), or how many bytes of data they will need to process. Or the current smoothed event rate (messages per minute). Those are all more complex/expensive for the producer to compute or keep track of, and I'm not sure how valuable they are in most cases. But they might help with situations like large label stream backfills."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "An idea that has been bouncing around the Bluesky infrastructure team for a long time is to switch sequence numbers from \"counting up\" semantics to \"time-like\" semantics. Similar to TIDs, the value would be roughly a timestamp, though ensured to always increase over time. The big benefit of this is that you can compute a cursor value for any service when connecting: \"start me 5 minutes back\" or \"send me the past hour\". You could also inspect a cursor (or message sequence number) and infer how old it is. This would make it much easier to informally switch between firehose server instances or service providers. It also makes it easier to reset PDS host sequences."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "There would still be a bit of slop in the system: different full-network relays would not have exactly the same timing or sequencing, so if you failed over you would want to roll back at least 30 or 60 seconds, and even then you aren't guaranteed to not have dropped some tiny fraction of events. Consumers wouldn't know they needed to do this rollback if a load-balancer or DNS is simply updated to switch over. The firehose semantics make it possible to detect missed events and re-sync, but that detection might not come for a long time with \"quiet\" accounts. Still, I think this would be a solid improvement over the current setup."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "One of the stumbling blocks with this is deciding exactly what time-like integer scheme to use. Milliseconds since UNIX epoch? Microseconds since some more recent time? We also don't have that many bits to play with if trying to stay within the 2^53 range of safe integers in 64-bit floats, and even fewer if we wanted to pack in a few bits for sharding (see below). I'm sure we can work something out though."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.header",
              "level": 2,
              "plaintext": "What Else?"
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "These aren't the only open issues with event streams!"
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "The current system uses CBOR to encode messages, which is efficient for higher throughput use cases, but makes casual implementations more difficult. Having a way to use JSON encoding for messages would be great. I'm not sure if that should be negotiated at connection time, declared in lexicon schemas, or what. I don't think JSON makes sense in all situations: it would be relatively expensive for relays to re-encode the full-network firehose into JSON, for example. Others have been pushing on this for a while."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "Some applications could use bi-directional messaging, not just uni-directional. For example, tap and jetstream, as mentioned in the intro."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "plaintext": "The current CBOR framing uses two objects (a header and a body) concatenated back-to-back in a single WebSocket frame. This pattern is uncommon, and IIRC the CBOR standard explicitly recommends against it. As a result, a lot of CBOR libraries don't support decoding partial buffers, which makes it harder to implement firehose consumers in new programming languages. The original motivation was deserialization performance, but I suspect that with a bit of benchmarking and tweaking we could get single-object decoding running with the same (or near enough) overhead in terms of allocations and CPU cycles per message. This one would be disruptive to roll out to the ecosystem."
            }
          },
          {
            "$type": "pub.leaflet.pages.linearDocument#block",
            "block": {
              "$type": "pub.leaflet.blocks.text",
              "facets": [
                {
                  "features": [
                    {
                      "$type": "pub.leaflet.richtext.facet#italic"
                    }
                  ],
                  "index": {
                    "byteEnd": 53,
                    "byteStart": 45
                  }
                }
              ],
              "plaintext": "At some point we'll probably want to support sharding of event streams, especially for the full-network firehose. This means running multiple parallel WebSockets, with messages flowing down specific streams in a defined way, to scale overall throughput. There are some open questions around how to discover or negotiate the number of shards for a given stream. We might want to reserve some space in the cursor/seq values to indicate which shard a message was routed down."
            }
          }
        ],
        "id": "019e9084-b13b-777a-a9dc-944e6756a0c8"
      }
    ]
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreig2czih4m2kq3hnk4y6y7ywkuxnxwj7dqqq437rxr7uzu75y67lcm"
    },
    "mimeType": "image/webp",
    "size": 719976
  },
  "description": "Some very in-the-weeds papercuts with event stream sequence numbers, and some ideas to address them",
  "path": "/3mngmjniwlk2l",
  "publishedAt": "2026-06-04T02:51:31.274Z",
  "site": "at://did:plc:44ybard66vv44zksje25o7dz/site.standard.publication/3m2x76zrtrs23",
  "tags": [
    "atproto"
  ],
  "title": "Firehose cursors could be improved somewhat"
}