{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreig6nuogzpreefenc5br6dga63u6p4mmuh26zhqdcskt7qnxrqr7yq",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mowozfc3thw2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreihnuklg2gelcho3k4iv7ifsldmybjygislnetxqmk6tjsxhtl7x2y"
    },
    "mimeType": "image/webp",
    "size": 77482
  },
  "path": "/pheonix_mk_e0ecc0233ababe/building-a-reliable-webhook-delivery-system-what-actually-broke-and-how-i-fixed-it-l74",
  "publishedAt": "2026-06-23T05:24:48.000Z",
  "site": "https://dev.to",
  "tags": [
    "api",
    "backend",
    "python",
    "systemdesign"
  ],
  "textContent": "Webhooks seem simple until a worker crashes mid-delivery, a subscriber goes down for an hour, or a payload gets tampered with in transit.\n\nHere's what I actually built to handle that, using FastAPI + PostgreSQL + Redis.\n\n**The core problems I solved:**\n\n**1. Synchronous delivery blocks everything**\nThe naive approach calls the subscriber URL inline, so one slow endpoint stalls your whole ingest path. Fix: return `202 Accepted` immediately, persist the event, and deliver asynchronously.\n\n**2. Workers crash and jobs disappear**\nIf a worker dies mid-delivery, that job is stuck `IN_FLIGHT` forever. Fix: a watchdog that sweeps every 30s and requeues anything stale.\n\n**3. Retries without backoff make things worse**\nHammering a struggling subscriber after a failure makes recovery harder. Fix: exponential backoff (2s → 32s, max 5 attempts) using a Redis sorted set as a delay queue, where each score is the next attempt timestamp.\n\n**4. One dead subscriber degrades the whole system**\nFix: a circuit breaker per subscription. 5 consecutive failures trip it OPEN. After a 60s cooldown, a single probe request tests recovery before normal delivery resumes.\n\n**5. No payload integrity**\nFix: a per-subscription HMAC-SHA256 signature on every payload, verified with `hmac.compare_digest` so the comparison is constant-time and resists timing attacks.\n\n**Result:** 99.9% delivery reliability across 10,000+ daily webhooks, with full visibility via Prometheus + Grafana.\n\nFull deep-dive coming soon.",
  "title": "Building a Reliable Webhook Delivery System: What Actually Broke and How I Fixed It"
}