{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreigcup3egihaqjntwkldrdszmlcg25d27uryenxdrmpnw5j4ikwqbe",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mprxpxabrh62"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreiatrd6go55ixdscqxyuuelkk64y7c7oj7flqbr5y6i3wwzi4feqsa"
},
"mimeType": "image/webp",
"size": 254144
},
"path": "/samson_tanimawo/slos-that-product-managers-actually-understand-5ap2",
"publishedAt": "2026-07-04T01:16:55.000Z",
"site": "https://dev.to",
"tags": [
"sre",
"slo",
"product",
"reliability",
"Nova AI Ops",
"https://novaaiops.com",
"@sarah",
"@mike",
"@lisa"
],
"textContent": "## The SLO Translation Problem\n\nYou define an SLO: 99.95% availability with p99 latency under 200ms. Engineering loves it. Product managers glaze over.\n\nThe problem isn't the SLO. It's how we communicate it.\n\n## Speaking Product Language\n\nTranslate technical SLOs into business impact:\n\n\n\n Technical SLO: Product translation:\n ─────────────── ──────────────────────\n 99.95% availability \"22 minutes of downtime per month max\"\n p99 latency < 200ms \"The slowest 1% of users wait under 0.2s\"\n 99.9% error-free transactions \"For every 1000 purchases, at most 1 fails\"\n\n\nSuddenly, the product manager can make informed tradeoffs.\n\n## The SLO Negotiation Framework\n\nSLOs should be negotiated between engineering and product. Here's my framework:\n\n### Step 1: Measure Current Performance\n\n\n def current_performance(service, window_days=30):\n metrics = query_prometheus(f'''\n avg_over_time(\n (1 - rate(http_errors_total{{service=\"{service}\"}}[5m])\n / rate(http_requests_total{{service=\"{service}\"}}[5m]))\n [{window_days}d:1h]\n )\n ''')\n return {\n 'availability': f\"{metrics * 100:.3f}%\",\n 'monthly_downtime_minutes': round((1 - metrics) * 30 * 24 * 60, 1)\n }\n\n # Example output:\n # {'availability': '99.847%', 'monthly_downtime_minutes': 66.1}\n\n\n### Step 2: Present the Cost-Reliability Tradeoff\n\n\n Reliability Level | Monthly Downtime | Eng Investment | Feature Impact\n ────────────────-─┼─────────────────┼────────────────-┼──────────────\n 99.5% (current) | 3.6 hours | Baseline | None\n 99.9% (good) | 43 minutes | +1 SRE | -10% velocity\n 99.95% (great) | 22 minutes | +2 SREs | -20% velocity\n 99.99% (amazing) | 4.3 minutes | +4 SREs | -40% velocity\n\n\nThis makes the cost explicit. Most product teams choose 99.9-99.95%.\n\n### Step 3: Define SLIs That Map to User Journeys\n\nDon't define SLOs per service. 
Define them per user journey:\n\n\n\n slo_definitions:\n   - name: \"Checkout Success\"\n     description: \"Users can complete a purchase\"\n     sli: |\n       successful_checkouts / total_checkout_attempts\n     target: 99.9%\n     window: 30 days\n     owner: payments-team\n     product_owner: \"@sarah\"\n\n   - name: \"Search Responsiveness\"\n     description: \"Search results appear quickly\"\n     sli: |\n       search_requests{latency < 500ms} / total_search_requests\n     target: 99.5%\n     window: 30 days\n     owner: search-team\n     product_owner: \"@mike\"\n\n   - name: \"Login Reliability\"\n     description: \"Users can log into their accounts\"\n     sli: |\n       successful_logins / total_login_attempts\n     target: 99.99% # Higher because login blocks everything\n     window: 30 days\n     owner: identity-team\n     product_owner: \"@lisa\"\n\n\n### Step 4: The Monthly SLO Review\n\nWe run a 30-minute monthly meeting with engineering leads AND product managers:\n\n\n\n Agenda:\n 1. SLO status dashboard review (5 min)\n    - Which SLOs are healthy? (green)\n    - Which are at risk? (yellow)\n    - Which were breached? (red)\n\n 2. Budget impact (10 min)\n    - Error budget consumed per SLO\n    - Projected budget at current burn rate\n    - Feature freeze triggers\n\n 3. Tradeoff decisions (15 min)\n    - Feature X requires relaxing SLO Y: approve?\n    - Incident Z consumed 40% of budget: invest in fix?\n    - New service launching: what SLO target?\n\n\n## The Dashboard That Changed Everything\n\nWe built a single-page SLO dashboard with three views:\n\n 1. **Executive view**: Traffic lights per user journey. Green/Yellow/Red.\n 2. **Product view**: Error budget remaining + projected depletion date.\n 3. **Engineering view**: Burn rate charts + contributing incidents.\n\n\n\nSame data, different lens. Everyone gets what they need.\n\n## Key Insight\n\nSLOs are a communication tool first, a technical tool second. 
If only engineers understand your SLOs, they're not working.\n\nIf you want SLOs that automatically track, alert, and report in plain language, check out what we're building at Nova AI Ops.\n\n**Written by Dr. Samson Tanimawo**\nBSc · MSc · MBA · PhD\nFounder & CEO, Nova AI Ops. https://novaaiops.com",
"title": "SLOs That Product Managers Actually Understand"
}