Nel Ramblings

atproto and IPFS

Nelind June 4, 2026

atproto and IPFS share a lot of core similarities! They're both fundamentally about publishing data to a distributed, content addressed network, and they both base their content addressing on various parts of the IPLD formats. This makes the two a strikingly interesting set of technologies to pair up! In many ways they "speak the same language", so to say, and they each have their own strengths: atproto is good at small, contained, well structured, inter-linked data, whereas IPFS focuses on, well ... FS, file systems, storing bigger, blob sized pieces of data, which atproto has comparatively rudimentary support for in the form of single file atproto blobs. Naturally you might want to combine strengths here and build apps that utilise both to create a greater whole! ... but depending on how you go about that you might run into a few places where the two structurally diverge, causing issues. So let's talk about what I'd say are the two main ways to go about doing this and their tradeoffs!

Not so old ol' reliable CID

If you want to use IPFS it's probably because atproto blobs aren't cutting it. Maybe you have some structure, like a full file system, that doesn't do that well as an atproto blob, or you want to take advantage of IPFS's easy mirroring and pinning for a flexible blob cache and backup layer. Whatever the reason, you're in luck: both IPFS and atproto use a common core in IPLD, and specifically both make use of CIDs to content address data. So it's easy to just refer to an IPFS object by its CID from an atproto record and boom! IPFS and atproto hand in hand ... except now you have a new problem: pinning.
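To make that a bit more concrete, here's a rough sketch of a record that points at an IPFS object. In atproto's JSON data model a CID link is written as an object with a single `$link` key. Everything else here is made up for illustration: the `com.example.gallery.album` collection, the `title` and `root` field names, and the placeholder CID string.

```python
import json

PLACEHOLDER_CID = "bafy-placeholder-cid"  # stand-in, not a real CID


def cid_link(cid: str) -> dict:
    """atproto's JSON representation of a CID link."""
    return {"$link": cid}


def make_album_record(title: str, ipfs_root_cid: str) -> dict:
    # "$type" names a hypothetical lexicon; "root" is an illustrative field
    # holding the CID of the IPFS object (e.g. a directory root).
    return {
        "$type": "com.example.gallery.album",
        "title": title,
        "root": cid_link(ipfs_root_cid),
    }


record = make_album_record("Holiday photos", PLACEHOLDER_CID)
print(json.dumps(record, indent=2))
```

The PDS will happily store a record like this, but it knows nothing about the object behind `root`, which is exactly where the pinning problem comes from.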

Almost every content addressed system makes use of garbage collection to some degree: get rid of all the nodes in the Merkle DAG that aren't referenced by any other nodes and save space. For this to not just delete absolutely everything we need a mechanism to say which nodes should never be deleted, which then keeps everything they refer to alive. IPFS calls this "pinning" and I'm going to use that terminology here. Making sure an atproto blob doesn't go away is easy, it's built into the blob lifecycle. You upload a blob to your PDS, make a record that references the blob, and now the blob will stay around as long as the record exists. The PDS makes sure of that, and it can do that because it's in charge of both the record and the blob. When we instead point to an IPFS object the PDS can't do that anymore. We need an IPFS node to pin the object and keep it around.
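If the garbage collection bit feels abstract, here's a tiny toy version of it. This is not how any particular IPFS node implements it, just the general idea: pins are the roots, everything reachable from a pin survives, everything else gets swept. The node names are made up and stand in for CIDs.

```python
def reachable(dag: dict[str, list[str]], pins: set[str]) -> set[str]:
    """Every node kept alive by the pinned roots (the 'mark' phase)."""
    keep: set[str] = set()
    stack = list(pins)
    while stack:
        node = stack.pop()
        if node in keep or node not in dag:
            continue
        keep.add(node)
        stack.extend(dag[node])  # follow the links out of this node
    return keep


def garbage_collect(dag: dict[str, list[str]], pins: set[str]) -> dict[str, list[str]]:
    """Return the DAG with every unpinned, unreachable node removed."""
    keep = reachable(dag, pins)
    return {cid: links for cid, links in dag.items() if cid in keep}


# Toy block store: node name -> names of the nodes it links to.
dag = {
    "root": ["chunk-1", "chunk-2"],
    "chunk-1": [],
    "chunk-2": [],
    "orphan": ["chunk-2"],
}

print(sorted(garbage_collect(dag, pins={"root"})))
# prints ['chunk-1', 'chunk-2', 'root']
```

With an empty set of pins the same call returns an empty store, which is the "delete absolutely everything" case.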

Pinning is conceptually simple, but it's an extra service to run and manage, and to have your frontend and potentially your users deal with. There are whole companies that exist just to sell IPFS pinning services, but unless you want to start paying once you've used up your ~5 free GB of pinned objects that's not a viable solution. Besides, you probably want something that's integrated with atproto at least a little. I don't know of any IPFS node implementations or pinning services that will pin based on atproto repos. Perhaps someone should make one, hint hint wink wink. Either way, while this is absolutely the cleanest and most "proper" IPFS solution, you will need to figure out pinning if you're going to go this route, and ideally figure it out in a way that keeps data ownership with the user like atproto rightly cares a lot about. On the other hand, it's not like going the atproto blob route means you don't have to deal with fetching and serving blobs anyway, so perhaps the tradeoff isn't too bad. Speaking of atproto blobs ...
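For what it's worth, the core of such an atproto aware pinner might not be very big. Here's a guess at it, under a pile of assumptions: it walks a record's JSON, collects every `$link` that isn't part of an atproto blob (the PDS already looks after those), and builds the request URL for the `pin/add` endpoint of a Kubo node's RPC API, which expects a POST. The local API address is an assumption, the sample record is made up, and fetching records from a repo and actually sending the requests is left out.

```python
from urllib.parse import urlencode


def linked_cids(value) -> set[str]:
    """Collect CID links in a record, skipping atproto blob references."""
    found: set[str] = set()
    if isinstance(value, dict):
        if value.get("$type") == "blob":
            return found  # blobs are kept alive by the PDS already
        if set(value) == {"$link"} and isinstance(value["$link"], str):
            return {value["$link"]}
        for child in value.values():
            found |= linked_cids(child)
    elif isinstance(value, list):
        for child in value:
            found |= linked_cids(child)
    return found


def pin_request_url(cid: str, api: str = "http://127.0.0.1:5001") -> str:
    """URL to POST to in order to ask a Kubo node to pin `cid`."""
    return f"{api}/api/v0/pin/add?{urlencode({'arg': cid})}"


# Made-up record with placeholder CIDs.
record = {
    "$type": "com.example.gallery.album",
    "cover": {"$type": "blob", "ref": {"$link": "blob-cid"}, "mimeType": "image/png", "size": 1234},
    "pages": [{"root": {"$link": "ipfs-cid-1"}}, {"root": {"$link": "ipfs-cid-2"}}],
}

for cid in sorted(linked_cids(record)):
    print(pin_request_url(cid))
```

A real service would also need to unpin things when records get deleted, and would need some way to decide whose repos it is willing to pin for.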

But like PDS pinning is right there?

So the main issue is pinning? Pinning blobs with a PDS is easy, so let's just use that as a backup! Make records that reference both an atproto blob and an IPFS object. If the IPFS object disappears from the IPFS network we can easily recreate it from the atproto blob. This ... sort of solves the issue? At least not pinning the object is less risky, but we'd probably still want to pin it somehow. Having to recreate the IPFS object a bunch is a bit inefficient, not to mention annoying.
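In code the fallback could look something like this. It's only a sketch: the `ipfs` and `blob` field names are made up, and the three callables are hypothetical stand-ins for "ask an IPFS node or gateway", "ask the PDS for the blob" and "add the bytes back to IPFS".

```python
from typing import Callable, Optional


def load_content(
    record: dict,
    fetch_ipfs: Callable[[str], Optional[bytes]],
    fetch_blob: Callable[[str], bytes],
    readd_to_ipfs: Callable[[bytes], None],
) -> bytes:
    """Prefer IPFS; fall back to the PDS blob and repopulate IPFS from it."""
    data = fetch_ipfs(record["ipfs"]["$link"])
    if data is not None:
        return data
    # The IPFS object is gone (nobody pinned it), so use the PDS copy ...
    data = fetch_blob(record["blob"]["ref"]["$link"])
    # ... and recreate the IPFS object so the next reader finds it again.
    readd_to_ipfs(data)
    return data


# Made-up record holding both references; note the two different CIDs.
record = {
    "ipfs": {"$link": "ipfs-root-cid"},
    "blob": {"$type": "blob", "ref": {"$link": "blob-cid"}, "mimeType": "image/png", "size": 5},
}

readded: list[bytes] = []
content = load_content(
    record,
    fetch_ipfs=lambda cid: None,      # pretend the object was garbage collected
    fetch_blob=lambda cid: b"hello",  # pretend the PDS still has the blob
    readd_to_ipfs=readded.append,
)
print(content, len(readded))
```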

There are also two big downsides to this model. The obvious one is that you regain some of the downsides of atproto blobs, namely blob size limits and the fact that they handle structured data ... annoyingly. The other downside is that while both atproto and IPFS use CIDs to refer to data, they do it differently. You might at first expect that you can just make an atproto blob, upload it to IPFS and refer to it on both systems with the same CID. However you can't! Because IPFS has block size limits. For a bunch of efficiency reasons IPFS implementations are highly recommended (and expected) to never make blocks bigger than 1 MiB. This means that even if we're dealing with a single file, atproto and IPFS will call it two different things once it's too big for one block: atproto will take the hash of the whole file, wrap it up in a CID and use that, but IPFS will first split the file into small enough chunks, build up a Merkle DAG with the chunks as leaf nodes, and refer to the whole file by the CID of the root node of that DAG. So you have to keep two different CIDs around!
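You can see the mismatch with nothing but the standard library. As I understand the specs, an atproto blob CID is a CIDv1 with the `raw` codec over the sha2-256 hash of the whole file, and that's what `raw_cid` below computes. The chunking half is only an approximation: the 256 KiB chunk size is an assumption, and the root node that IPFS would build on top of the leaves (a UnixFS node linking to them) isn't computed here at all.

```python
import base64
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed chunk size, well under the 1 MiB block limit


def raw_cid(data: bytes) -> str:
    """CIDv1 with the raw codec and a sha2-256 multihash, base32 encoded."""
    digest = hashlib.sha256(data).digest()
    # version 1, codec 0x55 (raw), hash 0x12 (sha2-256), digest length 32.
    # All four values are below 128, so each varint is a single byte.
    cid_bytes = bytes([0x01, 0x55, 0x12, len(digest)]) + digest
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # "b" is the multibase prefix for lowercase base32


def leaf_cids(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """CIDs of the raw leaf blocks a fixed-size chunker would produce."""
    return [raw_cid(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


big_file = bytes(i % 251 for i in range(600 * 1024))  # 600 KiB of sample data

print("whole-file CID:", raw_cid(big_file))
for cid in leaf_cids(big_file):
    print("leaf CID:      ", cid)
```

None of the leaf CIDs match the whole-file CID, and the root that links them would be yet another CID. For a file small enough to fit in a single chunk the one raw leaf does hash to the same CID as the blob, but whether IPFS then uses that leaf directly as the file's CID depends on how the file was added.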

Soooo yea! I think IPFS + atproto can lead to some very very nice app structures. But we need to figure out the pinning thing. If you somehow couldn't tell, I think the former option is the best one data structure wise, and that we should focus on how to properly provide easy to use, atproto integrated IPFS nodes that users can pin their stuff on. Pinning directly from atproto records is probably a big piece of the puzzle there. Time will tell.

