Building a Bluesky PDS on Kubernetes From Scratch
This guide starts with empty Git repositories and ends with a production-minded AT Protocol PDS at pds.dr0p.info. The target deployment shape is k3s, Argo CD, Traefik, cert-manager, SealedSecrets, Cloudflare DNS, Longhorn storage, Cloudflare R2 blob storage and backups, and GitOps-managed manifests.
This is my pds. There are many like it, but this one is mine. My pds is my... you get it. The path is intentionally narrow. Alternatives are called out only where they help adapt the guide without changing the main design.
This is not the shortest way to run a PDS. The official Docker installer is much shorter. This guide is for the case where you already run Kubernetes, want a GitOps-shaped deployment, and care about understanding the storage, ingress, DNS, TLS, backup, and identity pieces.
By the end, these checks should pass:
- DNS: pds.dr0p.info and *.dr0p.info resolve to the public edge node.
- TLS: cert-manager has issued pds-tls for *.dr0p.info.
- PDS health: https://pds.dr0p.info/xrpc/_health returns the running PDS version.
- Server description: describeServer reports .dr0p.info as an available user domain and invite codes as required.
- Account creation: goat can create the first account on the PDS.
- Handle resolution: https://conrad.dr0p.info/.well-known/atproto-did returns the account DID.
- Public identity: the Bluesky public API resolves conrad.dr0p.info to the same DID.
- Firehose: com.atproto.sync.subscribeRepos accepts a WebSocket connection.
- Backups: Litestream logs show successful SQLite replication, blobs are written directly to R2, and at least one SQLite restore test has been run.
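Most of these checks are plain HTTP requests, so they are easy to script. Below is a minimal sketch that uses only the Python standard library. It assumes the following response shapes:

- /xrpc/_health returns JSON with a version field.
- describeServer returns availableUserDomains and inviteCodeRequired.
- /.well-known/atproto-did returns the DID as plain text.
- The public AppView's com.atproto.identity.resolveHandle returns JSON with a did field.

The validators are pure functions, and fetch is injectable, so the logic can be tested without network access.

```python
import json
import urllib.request

PDS_HOST = "pds.dr0p.info"
HANDLE = "conrad.dr0p.info"
USER_DOMAIN = ".dr0p.info"
PUBLIC_API = "https://public.api.bsky.app"


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL over HTTPS and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def check_health(body: str) -> str:
    """Return the version reported by /xrpc/_health, or raise ValueError."""
    version = json.loads(body).get("version")
    if not version:
        raise ValueError("health response has no version field")
    return version


def check_describe_server(body: str, user_domain: str) -> None:
    """Require the handle suffix to be offered and invite codes to be required."""
    info = json.loads(body)
    if user_domain not in info.get("availableUserDomains", []):
        raise ValueError(f"{user_domain} is not an available user domain")
    if info.get("inviteCodeRequired") is not True:
        raise ValueError("invite codes are not required")


def check_atproto_did(body: str) -> str:
    """Return the DID served at /.well-known/atproto-did, or raise ValueError."""
    did = body.strip()
    if not did.startswith("did:"):
        raise ValueError(f"not a DID: {did!r}")
    return did


def run_checks(fetch=fetch_text) -> dict:
    """Run the HTTP checks; fetch is injectable so tests need no network."""
    version = check_health(fetch(f"https://{PDS_HOST}/xrpc/_health"))
    check_describe_server(
        fetch(f"https://{PDS_HOST}/xrpc/com.atproto.server.describeServer"),
        USER_DOMAIN,
    )
    did = check_atproto_did(fetch(f"https://{HANDLE}/.well-known/atproto-did"))
    # The public API must resolve the handle to the same DID the PDS serves.
    public = json.loads(fetch(
        f"{PUBLIC_API}/xrpc/com.atproto.identity.resolveHandle?handle={HANDLE}"
    ))
    if public.get("did") != did:
        raise ValueError(f"public API resolved {public.get('did')!r}, expected {did!r}")
    return {"version": version, "did": did}
```

Once DNS and TLS are up, run print(run_checks()) from a workstation. The firehose WebSocket and Litestream checks are left out of this sketch.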
Before copying commands, choose your values.
| Value | Example in this guide | Used for |
|---|---|---|
| Base domain | dr0p.info | DNS zone, handle suffix, ACME DNS-01 |
| PDS hostname | pds.dr0p.info | PDS API, Bluesky custom hosting provider |
| Handle suffix | .dr0p.info | User handles like conrad.dr0p.info |
| Public edge IP | 76.154.145.38 | pds and wildcard DNS records |
| WireGuard subnet | 10.50.0.0/24 | k3s node identity and cluster transport |
| Internal LoadBalancer pool | 192.168.1.240-192.168.1.250 | MetalLB for private ingress |
| Core repo | https://tangled.org/c0nr.ad/k8s-core | Argo CD platform source |
| Apps repo | https://tangled.org/c0nr.ad/k8s-apps | Argo CD workload source |
| R2 bucket | pds-dr0p-info | SQLite replicas and PDS blobs |
| SMTP sender | pds@dr0p.info | Email confirmation and account operations |
The guide assumes the machines already exist and can reach each other over WireGuard. It does not teach WireGuard peer setup, router port forwarding, or cloud VM provisioning. It also assumes Argo CD can read your Git repositories; if your repos are private, configure Argo repository credentials before applying the root apps.
The live validation for this post ended with pds.dr0p.info serving PDS 0.4.219, conrad.dr0p.info resolving through the public ATProto identity API, email confirmation delivered through Resend, and Litestream replicating SQLite state to R2. Treat the pinned chart and image versions as a known-good snapshot, not as permanent recommendations.
What We Are Building
- Public PDS hostname: pds.dr0p.info
- User handles: user.dr0p.info, not user.pds.dr0p.info
- Kubernetes distribution: k3s
- Deployment model: GitOps with Argo CD
- Core repo: k8s-core
- App repo: k8s-apps
- Public ingress: Traefik on an external/public node
- Internal ingress: Traefik on internal nodes
- TLS: cert-manager with Cloudflare DNS-01
- Secrets: Bitnami SealedSecrets
- PDS database: SQLite on block storage
- PDS blobs: Cloudflare R2 from day one
- Backups: Litestream to Cloudflare R2 from day one
- Account posture: public-ish, with invites/approval required
Architecture
Before touching YAML, get the shape of the system straight. The PDS itself is a small service, but it sits at the intersection of public ingress, TLS, persistent storage, email, DNS, identity, and backups. The boring parts around it matter more than the container spec.
The cluster has two kinds of nodes.
- Internal nodes run the Kubernetes control plane, normal workloads, Longhorn, and the PDS.
- External nodes sit on the public internet and terminate public HTTP and HTTPS through Traefik.
The minimum lab is one internal k3s server plus one external k3s agent. The better production shape is three internal storage-capable nodes plus one external edge node. I am going to write the guide for the production shape, but keep notes where a two-node lab needs different settings.
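One concrete place where the lab and production shapes differ is Longhorn's replica count. By default Longhorn keeps each replica of a volume on a different node. Asking for more replicas than there are storage nodes therefore leaves the volume degraded.

The sketch below assumes Longhorn's driver.longhorn.io provisioner and its string-valued numberOfReplicas parameter. The class name longhorn-pds, the cap of three, and the Retain reclaim policy are illustrative choices, not values taken from this guide.

```python
import json


def longhorn_storage_class(storage_nodes: int, name: str = "longhorn-pds") -> dict:
    """Build a Longhorn StorageClass whose replica count fits the storage node count."""
    if storage_nodes < 1:
        raise ValueError("need at least one Longhorn storage node")
    replicas = min(storage_nodes, 3)  # lab: 1 internal node -> 1; production: 3 -> 3
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},  # hypothetical class name
        "provisioner": "driver.longhorn.io",
        "allowVolumeExpansion": True,
        "reclaimPolicy": "Retain",  # keep PDS data if the PVC is deleted by mistake
        "parameters": {
            # Longhorn reads StorageClass parameters as strings.
            "numberOfReplicas": str(replicas),
            "staleReplicaTimeout": "30",
        },
    }


print(json.dumps(longhorn_storage_class(3), indent=2))
```

The external edge node is not a Longhorn storage node, so the two-node lab passes 1 here.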
Node labels are the simple trick that keeps this understandable.
- Internal nodes get node.kubernetes.io/network-location=internal.
- External nodes get node.kubernetes.io/network-location=external.
- Longhorn storage nodes also get node.longhorn.io/create-default-disk=true.
- Stateful applications, including the PDS, should prefer or require internal.
- Public Traefik should require external.
- Internal Traefik should require internal.
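In pod specs, "prefer internal" and "require internal" map directly onto the two forms of node affinity. The sketch below builds manifests as Python dicts and serializes them to JSON, which kubectl apply accepts. The weight of 100 is an arbitrary choice.

```python
import json

LOCATION_KEY = "node.kubernetes.io/network-location"


def location_affinity(location: str, required: bool) -> dict:
    """Return a pod-spec affinity stanza that pins (or nudges) pods to a location."""
    term = {"matchExpressions": [{"key": LOCATION_KEY, "operator": "In", "values": [location]}]}
    if required:
        # Hard rule: the scheduler will not place the pod on any other node.
        node_affinity = {
            "requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [term]}
        }
    else:
        # Soft rule: prefer matching nodes, but schedule elsewhere if none fit.
        node_affinity = {
            "preferredDuringSchedulingIgnoredDuringExecution": [{"weight": 100, "preference": term}]
        }
    return {"nodeAffinity": node_affinity}


pds_affinity = location_affinity("internal", required=True)
public_traefik_affinity = location_affinity("external", required=True)
print(json.dumps(pds_affinity, indent=2))
```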
WireGuard is the cluster transport between internal and external nodes. The public edge node joins the same k3s cluster, but Kubernetes should talk to it over WireGuard rather than over the raw public internet. In the k3s install section, we will make the node IPs line up with the WireGuard addresses so flannel, kubelet, and service routing all use the private tunnel.
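Concretely, each node's k3s config pins node-ip to its WireGuard address and tells flannel to use the WireGuard interface. The sketch below renders /etc/rancher/k3s/config.yaml. It makes two assumptions: the interface is named wg0, and you already know each node's WireGuard address.

node-ip, flannel-iface, and node-label are real k3s options, but the helper itself is hypothetical. It refuses addresses outside 10.50.0.0/24, so a LAN or public IP cannot slip in by mistake.

```python
import ipaddress

WG_SUBNET = ipaddress.ip_network("10.50.0.0/24")


def render_k3s_config(wg_ip: str, location: str, longhorn_disk: bool = False,
                      wg_iface: str = "wg0") -> str:
    """Render a k3s config.yaml that keeps node traffic on the WireGuard tunnel."""
    addr = ipaddress.ip_address(wg_ip)
    if addr not in WG_SUBNET:
        raise ValueError(f"{wg_ip} is not inside {WG_SUBNET}")
    if location not in ("internal", "external"):
        raise ValueError("location must be 'internal' or 'external'")
    labels = [f"node.kubernetes.io/network-location={location}"]
    if longhorn_disk:
        labels.append("node.longhorn.io/create-default-disk=true")
    lines = [
        f"node-ip: {addr}",            # kubelet advertises the WireGuard address
        f"flannel-iface: {wg_iface}",  # flannel traffic rides over the tunnel (wg0 is assumed)
        "node-label:",
    ]
    lines += [f'  - "{label}"' for label in labels]
    return "\n".join(lines) + "\n"


print(render_k3s_config("10.50.0.10", "external"))
```

k3s applies node-label values only when a node first registers. Changing a label later means running kubectl label rather than editing this file.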
There are two ingress planes.
- traefik-internal serves private names like argocd.local.dr0p.info and is reachable only on the private network.
- traefik-external-public serves public names like pds.dr0p.info from the external edge node on host ports 80 and 443.
MetalLB belongs on the internal side. It gives private LoadBalancer IPs to internal services. The external edge should not participate in MetalLB L2 announcements for the home LAN.
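Keeping the edge out of L2 announcements can be expressed with an L2Advertisement nodeSelector. The sketch below assumes MetalLB's metallb.io/v1beta1 IPAddressPool and L2Advertisement resources in the metallb-system namespace. The object names internal-pool and internal-l2 are hypothetical.

```python
import ipaddress
import json

POOL_START, POOL_END = "192.168.1.240", "192.168.1.250"
LOCATION_KEY = "node.kubernetes.io/network-location"


def pool_size(start: str, end: str) -> int:
    """Count the addresses in an inclusive start-end range."""
    return int(ipaddress.ip_address(end)) - int(ipaddress.ip_address(start)) + 1


def metallb_manifests(namespace: str = "metallb-system") -> list:
    pool = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "IPAddressPool",
        "metadata": {"name": "internal-pool", "namespace": namespace},  # hypothetical name
        "spec": {"addresses": [f"{POOL_START}-{POOL_END}"]},
    }
    advert = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "L2Advertisement",
        "metadata": {"name": "internal-l2", "namespace": namespace},  # hypothetical name
        "spec": {
            "ipAddressPools": ["internal-pool"],
            # Only internal nodes answer ARP for pool addresses; the edge stays silent.
            "nodeSelectors": [{"matchLabels": {LOCATION_KEY: "internal"}}],
        },
    }
    return [pool, advert]


print("pool addresses:", pool_size(POOL_START, POOL_END))
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": metallb_manifests()}, indent=2))
```

The pool in this guide holds 11 addresses, which is plenty for internal Traefik plus a few private services.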
Cloudflare is responsible for public DNS and ACME DNS-01 validation.
- pds.dr0p.info points at the public edge node.
- User handles will be shaped like alice.dr0p.info.
- For public-ish account creation, use wildcard handle routing for *.dr0p.info to reach the PDS for ATProto handle verification.
- Exact app hostnames must win over wildcard handle routing, so Traefik route priority matters.
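The intended routing policy can be written as a lookup order. This is a toy model of the policy, not Traefik's matcher. Traefik picks between overlapping routers by priority, which defaults to the length of the rule, so the real manifests should give the wildcard handle router an explicitly low priority.

The route table and backend names below are hypothetical. The single-label check mirrors the fact that a *.dr0p.info wildcard certificate covers exactly one label.

```python
from fnmatch import fnmatch
from typing import Optional

# Hypothetical route table: exact hostnames map to their own backends.
EXACT_ROUTES = {
    "pds.dr0p.info": "pds",
    "blog.dr0p.info": "blog",
}
WILDCARD_PATTERN = "*.dr0p.info"
HANDLE_PATH = "/.well-known/atproto-did"


def route(host: str, path: str) -> Optional[str]:
    """Return the backend for a request, or None for a 404."""
    host = host.lower().rstrip(".")
    # 1. Exact hostnames always win over the wildcard.
    if host in EXACT_ROUTES:
        return EXACT_ROUTES[host]
    # 2. Other single-label subdomains reach the PDS only for handle resolution.
    if fnmatch(host, WILDCARD_PATTERN) and host.count(".") == 2 and path == HANDLE_PATH:
        return "pds"
    return None
```

One consequence is visible in the model: a user whose handle collides with an exact app hostname (blog.dr0p.info here) cannot be verified over HTTPS.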
There are two workable handle strategies.
- Wildcard handle routing: send /.well-known/atproto-did requests for *.dr0p.info to the PDS and let the PDS answer handle resolution requests.
- Per-user DNS TXT: create _atproto.user.dr0p.info records for each account and only route pds.dr0p.info to the PDS.
For a public-ish PDS, wildcard handle routing is the better default. It needs wildcard DNS and wildcard TLS for *.dr0p.info, not *.pds.dr0p.info, because the handles are alice.dr0p.info, not alice.pds.dr0p.info.
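The two strategies correspond to the two handle-resolution methods in the AT Protocol:

- a DNS TXT record at _atproto.<handle> whose value is did=<DID>
- an HTTPS GET of https://<handle>/.well-known/atproto-did that returns the DID as plain text

The sketch below shows what a resolver looks for in each case, with no network I/O. Treating several distinct DIDs in TXT records as a failure follows the spec's guidance on ambiguous records. The DIDs in the tests are made up.

```python
from typing import List, Optional


def txt_record_name(handle: str) -> str:
    """DNS name queried by the TXT method, e.g. _atproto.alice.dr0p.info."""
    return f"_atproto.{handle.lower()}"


def did_from_txt(records: List[str]) -> Optional[str]:
    """Pick the DID out of _atproto TXT values; missing or ambiguous -> None."""
    dids = {r.strip()[len("did="):] for r in records if r.strip().startswith("did=")}
    return dids.pop() if len(dids) == 1 else None


def well_known_url(handle: str) -> str:
    """URL fetched by the HTTPS method; wildcard routing sends it to the PDS."""
    return f"https://{handle.lower()}/.well-known/atproto-did"


def did_from_well_known(body: str) -> Optional[str]:
    """The body is the bare DID as text; anything else is a failed lookup."""
    did = body.strip()
    return did if did.startswith("did:") else None
```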
Storage is deliberately not NFS for the PDS database. The official PDS uses SQLite. SQLite wants local/block-device semantics, especially once write-ahead logging and backup tooling enter the picture. Longhorn gives us a Kubernetes-native ReadWriteOnce block volume for /pds. NFS can still be useful for media libraries, shared caches, or backup staging, but it is not the blessed path for PDS SQLite.
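The WAL concern is easy to see with Python's built-in sqlite3 module. Switching a database to WAL mode creates -wal and -shm files next to it. The -shm file is a shared-memory index that every connection to the database maps. SQLite's own documentation states that WAL does not work over a network filesystem.

The sketch below uses a throwaway temp directory, not the PDS's real database files.

```python
import os
import sqlite3
import tempfile


def wal_sidecar_files(db_path: str) -> list:
    """Put a database in WAL mode, write a row, and list the files SQLite created."""
    conn = sqlite3.connect(db_path)
    try:
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        if mode != "wal":
            raise RuntimeError(f"WAL not enabled, journal_mode={mode}")
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # While the connection is open, the -wal and -shm files sit beside the database.
        return sorted(os.listdir(os.path.dirname(db_path)))
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    print(wal_sidecar_files(os.path.join(tmp, "demo.sqlite")))
```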
Field note from my first live deploy: I temporarily used an existing NFS storage class to get the PDS online while fixing Longhorn host prerequisites. It worked well enough to validate DNS, TLS, account creation, email, and federation, but it is still not the storage shape I would recommend as the durable design for SQLite. If you make the same temporary tradeoff, keep one replica, keep Recreate, and do not treat the result as highly available.
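In manifest form, the single-replica Recreate posture looks like the sketch below. It uses placeholders that are not values from this guide: the PVC name pds-data, the pds namespace, and a 10Gi request. The longhorn storage class is Longhorn's default class name.

Recreate matters because a RollingUpdate starts the new pod before the old one stops. Two PDS processes could then briefly contend for the same ReadWriteOnce volume and SQLite files.

```python
import json

PVC_NAME = "pds-data"  # hypothetical name


def pds_pvc(storage_class: str = "longhorn", size: str = "10Gi", namespace: str = "pds") -> dict:
    """ReadWriteOnce claim for /pds; size and namespace are placeholders."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": PVC_NAME, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def pds_deployment_spec_fragment() -> dict:
    """Deployment spec fields for a single SQLite owner; merge into the full Deployment."""
    return {
        "replicas": 1,                     # exactly one PDS process owns the SQLite files
        "strategy": {"type": "Recreate"},  # stop the old pod before starting the new one
        "template": {"spec": {
            # The PDS container mounts this volume at /pds through its volumeMounts.
            "volumes": [{"name": "pds-data", "persistentVolumeClaim": {"claimName": PVC_NAME}}],
        }},
    }


print(json.dumps(pds_pvc(), indent=2))
```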
The PDS keeps SQLite state on the /pds volume, but writes blobs directly to Cloudflare R2. For a near-single-user PDS, the R2 cost is small enough that this is the cleaner production default: the PVC does not grow with media, disaster restore does not need a separate blob sync step, and there is no hourly backup sidecar to monitor. A local disk blobstore is still useful for a lab, but it is not the default path in this guide. Backups are not deferred: Litestream replicates SQLite state to Cloudflare R2 from day one.
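Pointing the PDS at R2 is configuration rather than code. The sketch below assembles the non-secret part of the environment. It assumes the PDS_BLOBSTORE_S3_* variable names read by the official PDS image, so check them against the version you pin. It also assumes R2's S3 endpoint format, https://<account-id>.r2.cloudflarestorage.com, with region auto.

YOUR_ACCOUNT_ID is a placeholder. Forcing path-style addressing is a conservative choice, not a documented requirement.

```python
from typing import Dict, List


def pds_r2_blob_env(account_id: str, bucket: str = "pds-dr0p-info") -> Dict[str, str]:
    """Non-secret PDS blobstore settings for R2 (env var names assumed from the PDS image)."""
    if not account_id:
        raise ValueError("Cloudflare account ID is required")
    return {
        "PDS_BLOBSTORE_S3_BUCKET": bucket,
        "PDS_BLOBSTORE_S3_REGION": "auto",  # R2 accepts "auto" as the region
        "PDS_BLOBSTORE_S3_ENDPOINT": f"https://{account_id}.r2.cloudflarestorage.com",
        "PDS_BLOBSTORE_S3_FORCE_PATH_STYLE": "true",
    }


def to_container_env(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a dict into the name/value list a container spec expects."""
    return [{"name": k, "value": v} for k, v in sorted(values.items())]


# PDS_BLOBSTORE_S3_ACCESS_KEY_ID and PDS_BLOBSTORE_S3_SECRET_ACCESS_KEY belong
# in a SealedSecret referenced from the Deployment, not in this function.
env = to_container_env(pds_r2_blob_env("YOUR_ACCOUNT_ID"))
```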
The GitOps split is simple.
- k8s-core owns cluster foundations: Argo CD entrypoints, cert-manager, SealedSecrets, Traefik, MetalLB, Longhorn, issuers, and storage classes.
- k8s-apps owns workloads: the PDS app, its PVC, config, sealed secrets, ingress, certificate, backup config, and post-deploy notes.
The bootstrap phase should be short. Install k3s, install Argo CD once, point Argo at k8s-core, then let Argo converge the rest. After that, normal changes are Git commits.
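Pointing Argo at k8s-core takes one Application object applied by hand. After that, everything else is reconciled from Git. The sketch below makes three assumptions: the root app lives in the argocd namespace, it tracks main, and it reads a directory named bootstrap in k8s-core. The path and branch are hypothetical placeholders for your own repo layout.

```python
import json


def root_application(name: str, repo_url: str, path: str = "bootstrap",
                     revision: str = "main") -> dict:
    """Argo CD Application that syncs a repo directory; path and revision are placeholders."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            "destination": {"server": "https://kubernetes.default.svc", "namespace": "argocd"},
            # Automated sync: delete resources removed from Git and undo manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


core = root_application("k8s-core", "https://tangled.org/c0nr.ad/k8s-core")
print(json.dumps(core, indent=2))
```

Piping that output into kubectl apply -f - once is the entire manual step. From then on, changes land as commits.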
The verification pattern for the rest of the guide is: test repository operations locally with fake remotes when possible, test Kubernetes manifests in a temporary k3d cluster when possible, and call out what those tests do not prove. k3d is not a substitute for real nodes, WireGuard, disks, or public ingress, but it catches a lot of broken YAML and wrong assumptions before they reach the actual cluster.
For a local end-to-end k3d run, use deliberate substitutions rather than pretending the lab is production: sslip.io hostnames instead of public DNS, a self-signed pds-tls Secret instead of ACME, a local PV instead of Longhorn, and MinIO instead of Cloudflare R2. The PDS manifests in this guide are plain Kubernetes resources and work well with a small Kustomize overlay for those local-only replacements.
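The hostname substitution is mechanical. sslip.io resolves any name with an embedded, dash-separated IPv4 address to that address, so pds.dr0p.info becomes something like pds.192-168-1-50.sslip.io.

The sketch below uses 192.168.1.50 as a hypothetical lab address that both your workstation and the k3d cluster can reach. 127.0.0.1 would only work from the host itself, not from inside a pod.

```python
import ipaddress

LAB_IP = "192.168.1.50"  # hypothetical lab address


def sslip_host(name: str, base_domain: str, ip: str) -> str:
    """Map a production hostname under base_domain to an sslip.io name for ip."""
    addr = ipaddress.IPv4Address(ip)  # raises for anything that is not IPv4
    suffix = "." + base_domain
    if not name.endswith(suffix):
        raise ValueError(f"{name} is not under {base_domain}")
    label = name[: -len(suffix)]
    return f"{label}.{str(addr).replace('.', '-')}.sslip.io"


for prod_name in ("pds.dr0p.info", "conrad.dr0p.info"):
    print(prod_name, "->", sslip_host(prod_name, "dr0p.info", LAB_IP))
```

In this layout the handle suffix becomes .192-168-1-50.sslip.io, so the lab still exercises wildcard-style handle routing.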
Prerequisites
This guide assumes very little on the workstation. I use a small Nix flake to pin the CLI tools, because debugging a cluster is annoying enough without also wondering which kubectl, k3d, or kubeseal happened to be first on PATH.
- Domain: dr0p.info in Cloudflare
- Cloudflare API token with DNS edit permissions for dr0p.info
- Machines or VMs for k3s
- A public node or VPS that can receive TCP 80 and 443
- A WireGuard network between internal and external nodes
- Cloudflare R2 bucket for Litestream backups and PDS blobs
- SMTP provider for PDS email, such as Resend
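If you are not using the flake, a quick sanity check is whether the CLIs this guide calls are on PATH at all. The tool list below is drawn from the commands named in this guide (kubectl, k3d, kubeseal, goat). Extend it to match your own workflow.

```python
import shutil
from typing import List, Optional, Sequence

REQUIRED_TOOLS = ("kubectl", "k3d", "kubeseal", "goat")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS, path: Optional[str] = None) -> List[str]:
    """Return the tools shutil.which cannot find on PATH (or on path, if given)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


missing = missing_tools()
print("missing:", ", ".join(missing) if missing else "none")
```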