Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihfpxvlrmys5lynuzqtwdrtyn63gymmcdugndoev6mzocc2ms6gpy",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moqfq5wawmg2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreicb5ix5vnpezxzgyknxlef4bsk7ncsfodezy5lha3cdvggumpjrhi"
    },
    "mimeType": "image/webp",
    "size": 103396
  },
  "path": "/uthman_dev/i-built-a-distributed-job-queue-in-go-to-understand-how-they-actually-work-fhk",
  "publishedAt": "2026-06-20T17:29:19.000Z",
  "site": "https://dev.to",
  "tags": [
    "architecture",
    "distributedsystems",
    "go",
    "showdev"
  ],
  "textContent": "I have used job queues my whole developer life without knowing what was inside them.\n\nSo I built one.\n\nNot a wrapper around an existing queue. A full implementation from scratch\nwith Redis, PostgreSQL, goroutines, and real failure handling.\n\nHere is everything I learned.\n\n##  Why Dual Storage\n\nMost job queues use one store. Redis is fast. PostgreSQL is durable. I wanted both.\n\nRedis handles dispatch via a sorted set priority queue. Fast enqueue, fast dequeue.\n\nPostgreSQL is the source of truth. Every job lives there permanently.\n\nThe rule: no critical state lives only in Redis. If Redis wipes completely,\nno job is lost. PostgreSQL has everything.\n\n##  Three Things Running Concurrently\n\n  * A **worker pool** that executes jobs\n  * A **scheduler** that promotes jobs from PostgreSQL into Redis when their time arrives\n  * A **stale reaper** that detects crashed workers and requeues their jobs automatically\n\n\n\nAll three run as goroutines. All three coordinate without stepping on each other.\n\n##  What Happens When a Worker Crashes\n\nThis is the part most tutorials skip.\n\nWhen a worker picks up a job it marks it as in-progress. If that worker crashes\nmid-execution the job stays marked in-progress forever unless something intervenes.\n\nThe stale reaper scans for jobs that have been in-progress longer than their timeout.\nIt requeues them automatically with exponential backoff.\n\nNo manual intervention. 
No lost jobs.\n\n## The Numbers\n\nMetric | Result\n---|---\nJob registration | 52ns/op, 0 allocations\nJob execution | 950ns/op\n\nBenchmarked with Go's built-in benchmark tooling.\n\nShips with a Prometheus metrics endpoint and a pre-built Grafana dashboard\ncovering queue depth, throughput, and failure rates by job type.\n\n## What I Actually Understand Now\n\n  * Why Redis alone is not enough for a job queue\n  * Why crashed-worker recovery needs to be a first-class feature, not an afterthought\n  * Why exponential backoff matters more than immediate retries\n\nThe project is open source with one external contributor already.\n\n  * GitHub: github.com/codetesla51/kyu\n  * Landing page: kyu-job-queue.vercel.app\n",
  "title": "I built a distributed job queue in Go to understand how they actually work"
}