{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibzqivb27he2lwrsuvq7enfrxnv4ezv2sj4dlbbqcg7e73lnwgewi",
    "uri": "at://did:plc:i2ne3m5q6oq4jcnvn4k55skm/app.bsky.feed.post/3molyspgk5c32"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreid2dkng4ubyillrngvlx3n63om2vkdq6gqmi4fbtbe5jt7hwdzisq"
    },
    "mimeType": "image/png",
    "size": 901429
  },
  "description": "Observability & Telemetry :: SLO Definition Additions",
  "path": "/inference-sector-kpi-slo/",
  "publishedAt": "2026-06-18T23:39:50.000Z",
  "site": "https://prose.winterschon.com",
  "textContent": "## Why should anyone care about KPIs and SLOs?\n\nGenerally, unless one gets paid to deal with OBS/Tel and SLA conformance.. ehh. Except that the modern world operates within the bounds of a diverse and often not-fascinating array of performance indicators and services' operating metrics.\n\n#### So.. Here are two for today.\n\n  * **KPI** == Key Performance Indicator\n  * **SLO** == Service Level Objective\n\n\n\n### Minor Backstory - Today\n\nWhile aggregating directory content from several workstations into a unified NFS remote mount,_(can't simply have one workstation, right? right)_ , another encounter with SLO defs appears upon the terminal.\n\nThis time it's for LLM Inference customers and their typical requirements. Perhaps our standards repository would be helpful as an adjunct resource:\n\n  * https://github.com/yukon-systems/YukonSYS-Standard-Definitions\n\n\n\n## Inference Service KPIs by Sector\n\nAn extension of distributed cloud services architecture with heavy focus on baremetal with VMs and Containers throughout. Many ways to solve the problems, scale the infra, etc.\n\n#### SLO Additions to the Classifier Repo\n\n  * This file is a compact companion to the observability + telemetry glossary.\n  * Each threshold is a starting point and should be tightened to real-world workload SLOs after baselining standard operations.\n\nSector | High-signal KPIs | Default alert style\n---|---|---\nLLM Inference (GP-GPU Service Infra) | TTFT, ITL/TPOT, E2E, QTS | SLO-derived or baseline-relative; baseline-relative\nLLM Inference (API Service Infra) | P95 LAT, ERR, 429R, UST | SLO-derived\nLLM Inference (Network Hardware + Protocol Infra) | OWD, PDV, LOSS, ECN | baseline-relative; budget-relative\nLLM Inference (Prompt Caching, Compute + Re-Compute) | PCHR, CTR, TTFTR, RCR | absolute for cache-eligible traffic\nLLM Inference (Prompt Caching, Storage Infra) | CHR, CLAT, COCC, EVR | absolute; absolute for cache-eligible traffic\nLLM Inference (Prompt Caching API + Load-Balancers) | RCHR, LQ, URT, RTR | absolute\nLLM Training (bulk initial datasets) | ITPS, DHR, UDR, CONT | absolute; capacity-envelope\nLLM Training (pre-training MoE) | MFU, STEP, A2AS, EIR | absolute starting point; baseline-relative\nLLM Training (post-training MoE) | TPS, MFU, RACC, RMAR | absolute starting point; baseline-relative\nHPC - High Frequency Trading - VM Clusters | RDY, CSTP, NUMA MISS, T2D | absolute starting point\nHPC - High Frequency Trading - Baremetal | W2W, JITR, CPM, MPKI | baseline-relative; budget-relative\nHPC - High Frequency Trading - CDN | CHR, OOR, TTFB, OTTFB | absolute\nHPC - High Frequency Trading - Low-Latency Exec | OWD, JIT, FLAT, OLAT | baseline-relative; budget-relative\nHPC - Dark Fiber Regional Network + Infra | AVAIL, OWD, Pre-FEC BER, OSNRM | SLO-derived; budget-relative\nHPC - Quantitative Research + Machine Learning | BTT, FLAT, TSS, FFL | absolute starting point; baseline-relative\nHPC - Big-Data & Multivariate Pattern Analysis | JDUR, SKR, SPILL, CLAG | SLA-derived; absolute\nSLA + SLO Monitoring, Telemetry, Alerting Infra | SLI-AV, SLI-LAT, EBR, UP | SLO-derived\n\n* * *\n\n### Global Policy Adjustments\n\n  * **baseline_window** :\n    * `7d median or p50/p95 baseline unless otherwise stated`\n  * **slo_policy** :\n    * `page: error budget burn rate > 14.4 over 1h and 5m, or > 6 over 6h and 30m`\n    * `ticket: error budget burn rate > 1 over 3d and 6h`\n  * **capacity_policy** :\n    * `warn: sustained > 80% of validated steady-state capacity`\n    * `critical: sustained > 90% of validated steady-state capacity or latency inflects above SLO`\n  * **regression_policy** :\n    * `warn: > 1.10x to 1.25x baseline depending on sector sensitivity`\n    * `critical: > 1.20x to 1.50x baseline depending on sector sensitivity`\n\n\n\n* * *\n\n### Reference Considerations\n\n  * Thresholds are starting operational thresholds, not universal laws.\n  * Where a standards or vendor reference provides an acceptable range, that range is used.\n  * Where no universal value exists, thresholds are either SLO-derived or baseline-relative (typically against a 7d median or a validated capacity envelope).\n  * For latency-sensitive sectors, alert on percentile regressions and budget exhaustion, not averages.\n\n\n\n* * *",
  "title": "Inference Sector KPI SLO",
  "updatedAt": "2026-06-18T23:39:51.158Z"
}