Raw Record Source

{
  "path": "//stratos-community-privacy",
  "site": "at://did:plc:lrphxvv25aibthe7xoc2eeyy/site.standard.publication/3mimqtwsxep2r",
  "tags": "post",
  "$type": "site.standard.document",
  "title": "A Model for addressing privacy on ATproto",
  "description": "A proposal on how we can offer private data",
  "publishedAt": "2026-01-13T00:00:00.000Z",
  "textContent": "When I first got involved with Northsky social, we knew we would eventually have to address the elephant in the room. How do we _proactively_ protect a community from outside influences without relying purely on moderation?\nAnd more importantly, how do we keep bad actors from simply pulling their data straight from a relay?\n\n<Insert diagram showing how a firehose contains all events in the Atmosphere>\n\nThe beauty of ATproto is that developers can build an app quickly because they don't have to think about how to handle users and their data: it's stored on the user's PDS, and they're free to communicate across the protocol. This requires quite a bit of trust in the inverted data model that Bluesky designed ATproto around. For all users to communicate, and for any app to be an intermediary for these communications, the data must be in the open. With this openness comes an inherent risk, not necessarily in the protocol but in how it is used. \n\nhttps://bsky.app/profile/danielvanstrien.bsky.social/post/3lbvih4luvk23\n\nWe make a lot of assumptions about the safety of our data when we interact on Bluesky. Every single message and reaction is available to anyone, and it takes only one person to ruin the whole party. While some people are creating data sets for training/research purposes, others are ingesting posts for other reasons, which should make users careful about posting sensitive information. But even when users take all these precautions, vulnerable communities are regularly targeted just because they have found a space for themselves. While we enjoy posting about whatever comes to mind, people will find those posts and use them for their own purposes. Or worse, if you are politically active or raising awareness, you will be targeted.\n\n{% imagesRow %}\n  {% image \"assets/images/atproto-privacy-model/IMG_5157.jpg\", \"A Bluesky post: I'm discontinuing my social media use for the foreseeable future and I want to explain why. 
I woke up in the new year to discover that some guy on here had saved photos I took on New Year's Eve and used Grok to remove my clothes. I know this because he showed me.\", {\n    align: \"center\",\n    width: \"40%\",\n    height: \"auto\"\n  } %}\n  \n  {% image \"assets/images/atproto-privacy-model/IMG_5185.jpg\", \"Bluesky user in Venezuela posting: Had to nuke everything because they have started persecuting people and going through their social media so they have an excuse to mark them as traitors or dissidents. Just letting you all know, but I will also delete this one tomorrow morning. Thank you all for your support and attention\", {\n    align: \"center\",\n    width: \"40%\",\n    height: \"auto\"\n  } %}\n{% endimagesRow %}\n\nFor these reasons and many more, privacy is incredibly important. This takes us to the topic of this post: how can we start implementing some measure of privacy on ATproto? After looking at the current discussions, it boils down to two options:\n\n1. Encrypted records\n2. Isolated records\n\nThe first option requires encrypting each record so it's only visible to the desired recipients. With MLS (Messaging Layer Security), you typically have a group of users who have all exchanged keys, and messages are encrypted with the group's key. Any time the group changes, there is a fresh exchange of keys to keep future conversations safe from the departed member. Records can be treated the same way: each record is encrypted for all current users, and a new user joining requires a rotation, so any new joiner doesn't have access to prior records. In a small setup this is somewhat doable, but it doesn't scale well as the number of keys for the records grows. Liz, who knows more about this than I do, explains it quite effectively in this thread: it does not scale, and any new followers would not be able to view prior messages (not to mention, how do we handle search/indexing?). 
You can shift the encryption to the PDS to reduce the number of keys, but that only puts the issue off until later.\n\nhttps://bsky.app/profile/lizthegrey.com/post/3mayi6iq6g22k\n\nThere are solutions that account for this, such as Peergos, which raises the question: why not implement E2EE? The challenge with implementing such strong encryption on a microblogging platform (or simply on ATproto records) is that we must be able to moderate the content and serve it to users, which requires multiple \"inspection points\":\n\n{% image \"assets/images/atproto-privacy-model/end-to-sieve.png\", \"A diagram showing a client sending encrypted data to another client with each inspection point resulting in a break of the encryption since it must view it\", {\n  align: \"center\"\n} %}\n\n1. Automated moderation - need to scan and label content, and identify CSAM or other content that is illegal in the jurisdiction of the data host.\n2. Manual moderation - moderators need to review reported content and act on it, including labeling or removal.\n3. Indexing / Search - need to be able to index content in order to serve it to users and allow users to search for it.\n\nManual moderation can theoretically be handled by generating metadata with each report that points to the record, but it still requires that moderators be able to view the record and take action, so moderators must always be a party to the records.\n\nIndexing can be simplified to creating plaintext metadata at record creation. That metadata is indexed, so hydration simply points to the records and users decrypt them, but then we lose the ability to cache effectively, which creates scaling issues, and we lose search.\n\nBecause of the number of inspection points, we can at best call it end to sieve. If we were to implement E2SE, it would only serve to give users a false sense of security, since if any inspection point is compromised, all data is too. 
This risk also exists in a system without E2EE, but there we can instead rely on strong encryption for data at rest and in transit.\n\nhttps://bsky.app/profile/germnetwork.com/post/3mckcn35pt72v\n\nWhich brings us to the second option, isolated records.\n\nModeling shared access to an isolated space\n\nPaul spoiled the fun by already capturing the essence of what I believe is the right approach for providing this level of privacy. We don't need to blast out all the records into the firehose for everyone to see; we just need a data layer that is intelligent enough to know whether a user has access to the records. Over the past year at Northsky we've been pondering how to pull this off because, while we're building a platform, the data side still suffers from the same privacy issues. And thus began a deep dive into appviews, relays, and clients. \n\nWhen we log in to Bluesky, we do so via a \"social app\" and are then presented with the followers feed, with the option to switch to others. Behind the scenes, the social app is connected to an Appview which is \"hydrating\" all these feeds with posts it has indexed. \n\n{% image \"assets/images/atproto-privacy-model/appview-feed-hydration.png\", \"A diagram showing how a bluesky social app gets its posts via an appview to show in feeds\", {\n  align: \"center\"\n} %}\n\nThe appview itself knows who you are and is able to identify the relevant details about your identity. These feeds are all pulled from data that has been made available to the appview via the indexing process. Feeds offer tremendous flexibility: a feed creator is able to act on a user's identity to serve them relevant content, whether through a simple regex against the firehose or by analysing their recent like activity to serve related posts. 
If we're able to design these feeds, why can't we embed a new feed within the appview itself that serves content exclusively to users with a shared relationship?\n\n{% image \"assets/images/atproto-privacy-model/appview-feed-community.png\", \"A diagram showing how a bluesky social app accesses a community feed containing posts from users on the same PDS\", {\n  align: \"center\"\n} %}\n\nConsider a scenario where the user belongs to a PDS with other users, and when any of them access a Community Feed served via the Appview, they get posts exclusively from that PDS. We've now defined a shared boundary these users are all part of! And now we're able to create a dedicated feed that serves as a small community hub where they can interact and discover things together.\n\nTo address the privacy aspect, we also need to look at the PDS where all the records/blobs are stored. As it stands today, when you create a record it is publicly available via the relays that subscribe to the PDS hosting your user. Any new record is immediately ingested and made available for others to see and interact with. We want to allow users to make both public _and_ private posts, which is where things get a bit squirrely. \n\nIntroducing Stratos\n\nWe want to remain on protocol while supporting data that is _kind of_ off protocol. During the initial planning, there was a hope that we could piggyback off the Bluesky lexicon for some level of compatibility with existing social apps and simply rely on embedding specific data that could serve as the isolation logic. Unfortunately, this wasn't feasible: while parts of the lexicon are open, the metadata we would embed would be an abuse of the specification. My proposal is therefore to define a separate lexicon that closely mirrors app.bsky in all manners, since the goal isn't a new way of making posts but a new way of interacting with them. 
And thus, app.stratos was born.\n\n{% image \"assets/images/atproto-privacy-model/pds-straos-overview.png\", \"A diagram showing how a social app creates records that are routed based on lexicon. When it is app.bsky, it ends up in PDS General Data. When it is app.stratos, it ends up in PDS Private Data\", {\n  align: \"center\"\n} %}\n\nWhen a social app creates a record with the app.stratos lexicon, it is stored privately along with any associated blobs. While the app.stratos lexicon mirrors Bluesky's, we need to attach additional metadata that informs Stratos (and the appview) how to handle the data. For this, we propose adding a boundary property:\n\nThis defines the boundary of the data exposure allowing us to not only co",
  "canonicalUrl": "https://chipnick.com//stratos-community-privacy"
}