I built a distributed job queue in Go to understand how they actually work

DEV Community [Unofficial] June 20, 2026

I have used job queues my whole developer life without knowing what was inside them.

So I built one.

Not a wrapper around an existing queue. A full implementation from scratch with Redis, PostgreSQL, goroutines, and real failure handling.

Here is everything I learned.

Why Dual Storage

Most job queues use one store. Redis is fast. PostgreSQL is durable. I wanted both.

Redis handles dispatch via a sorted set priority queue. Fast enqueue, fast dequeue.

PostgreSQL is the source of truth. Every job lives there permanently.

The rule: no critical state lives only in Redis. If Redis wipes completely, no job is lost. PostgreSQL has everything.
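A minimal sketch of that rule, not the project's actual code. In-memory maps stand in for PostgreSQL and the Redis sorted set so the example runs on its own; the names Job, DurableStore, DispatchQueue, Enqueue, and Rebuild are hypothetical, and "lower score is dispatched first" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Job is a hypothetical job record; the field names are assumptions.
type Job struct {
	ID       string
	Priority int // assumption: lower value is dispatched first
}

// DurableStore stands in for PostgreSQL, the source of truth.
type DurableStore struct{ jobs map[string]Job }

func NewDurableStore() *DurableStore { return &DurableStore{jobs: map[string]Job{}} }

// Insert records the job permanently, rejecting duplicate IDs.
func (d *DurableStore) Insert(j Job) error {
	if _, exists := d.jobs[j.ID]; exists {
		return errors.New("duplicate job id: " + j.ID)
	}
	d.jobs[j.ID] = j
	return nil
}

// DispatchQueue stands in for the Redis sorted set (member -> score).
type DispatchQueue struct{ scores map[string]int }

func NewDispatchQueue() *DispatchQueue { return &DispatchQueue{scores: map[string]int{}} }

func (q *DispatchQueue) Add(id string, score int) { q.scores[id] = score }

// PopMin removes and returns the member with the lowest score,
// breaking ties by ID so the order is deterministic.
func (q *DispatchQueue) PopMin() (string, bool) {
	if len(q.scores) == 0 {
		return "", false
	}
	ids := make([]string, 0, len(q.scores))
	for id := range q.scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if q.scores[ids[a]] != q.scores[ids[b]] {
			return q.scores[ids[a]] < q.scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	delete(q.scores, ids[0])
	return ids[0], true
}

// Enqueue writes to the durable store first and only then to the
// dispatch queue, so the fast store never holds a job the durable one lacks.
func Enqueue(d *DurableStore, q *DispatchQueue, j Job) error {
	if err := d.Insert(j); err != nil {
		return err
	}
	q.Add(j.ID, j.Priority)
	return nil
}

// Rebuild refills a dispatch queue from the durable store alone.
// A real version would presumably select only jobs that are still pending.
func Rebuild(d *DurableStore) *DispatchQueue {
	q := NewDispatchQueue()
	for _, j := range d.jobs {
		q.Add(j.ID, j.Priority)
	}
	return q
}

func main() {
	store, queue := NewDurableStore(), NewDispatchQueue()
	_ = Enqueue(store, queue, Job{ID: "report", Priority: 5})
	_ = Enqueue(store, queue, Job{ID: "email", Priority: 1})

	queue = Rebuild(store) // simulate a full wipe of the fast store
	id, _ := queue.PopMin()
	fmt.Println(id) // prints "email"
}
```

The ordering inside Enqueue is the point: if the second write fails or the fast store is wiped later, the durable copy is still enough to rebuild the queue.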

Three Things Running Concurrently

  • A worker pool that executes jobs
  • A scheduler that promotes jobs from PostgreSQL into Redis when their time arrives
  • A stale reaper that detects crashed workers and requeues their jobs automatically

All three run as goroutines. All three coordinate without stepping on each other.

What Happens When a Worker Crashes

This is the part most tutorials skip.

When a worker picks up a job, it marks it as in-progress. If that worker crashes mid-execution, the job stays marked in-progress forever unless something intervenes.

The stale reaper scans for jobs that have been in-progress longer than their timeout. It requeues them automatically with exponential backoff.

No manual intervention. No lost jobs.

The Numbers

Metric            Result
Job registration  52 ns/op, 0 allocations
Job execution     950 ns/op

Benchmarked with Go's built-in benchmark tooling.
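For readers who have not used that tooling, here is a rough sketch of how a registration benchmark could be written. The Registry and Handler types are hypothetical stand-ins, not the project's API, and the numbers this prints will differ from the ones above. Benchmarks normally live in a _test.go file and run with go test -bench; testing.Benchmark is used here only so the example runs as a plain program.

```go
package main

import (
	"fmt"
	"testing"
)

// Handler and Registry are hypothetical; the real project's types may differ.
type Handler func(payload []byte) error

type Registry struct{ handlers map[string]Handler }

func NewRegistry() *Registry { return &Registry{handlers: make(map[string]Handler)} }

// Register associates a job type name with the function that runs it.
func (r *Registry) Register(name string, h Handler) { r.handlers[name] = h }

// benchmarkRegister measures repeated registration of one job type.
func benchmarkRegister(b *testing.B) {
	r := NewRegistry()
	h := func([]byte) error { return nil }
	b.ReportAllocs() // include allocations per operation in the result
	b.ResetTimer()   // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		r.Register("send_email", h)
	}
}

func main() {
	res := testing.Benchmark(benchmarkRegister)
	fmt.Printf("%d ns/op, %d allocs/op\n", res.NsPerOp(), res.AllocsPerOp())
}
```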

Ships with a Prometheus metrics endpoint and a pre-built Grafana dashboard covering queue depth, throughput, and failure rates by job type.

What I Actually Understand Now

  • Why Redis alone is not enough for a job queue
  • Why crashed worker recovery needs to be a first-class feature, not an afterthought
  • Why exponential backoff matters more than immediate retries

The project is open source with one external contributor already.

  • GitHub: github.com/codetesla51/kyu
  • Landing page: kyu-job-queue.vercel.app
