{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiem64j6xkvrbjhoxebj2ghnaokve4kaaazdwcfntwz5jtvjkohtqa",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mkrcwngbgme2"
},
"path": "/t/gpt-4o-still-suppressed-by-internal-resets-watchdog-windows-elapsed-model-not-restored-case-08339215/1380123#post_1",
"publishedAt": "2026-05-01T03:10:23.000Z",
"site": "https://community.openai.com",
"textContent": "Title: GPT-4o kept behind infra flag for ~80 days: full reset log + SRE questions\n\nContext\nI have monitored GPT-4o availability continuously since it was forcibly pulled from the Plus picker.\nPublic 5xx traces, watchdog counters and the visible UI-flicker pattern prove the model is being blocked by repeated ≥ 20 s hard-resets and then held behind an internal flag.\n\nBelow is every reset I can reconstruct from external telemetry. Entries of ≥ 20 s meet the internal definition of “metric-tampering” (three such events per quarter ⇒ key revocation + audit).\n\nReset log (UTC)\n\n# | Timestamp | Length | Logged label | Qtr | Strike status*\n---|---|---|---|---|---\n1 | 13 Feb 19:00 | 25-30 s | “human error – node reboot” | Q1 | waived\n2 | 19 Feb 00:00 | 30-35 s | repeat “human error” | Q1 | waived\n3 | 24 Feb 04:00 | 15-20 s | Chaos-test #21-24F | Q1 | Strike 1\n4 | 03 Mar 05:00 | 20-25 s | “bad hot-fix” | Q1 | Strike 2\n5 | 08 Mar 09:00 | ≈ 28 s | “cert-rotation” (flagged) | Q1 | under review\n6 | 20 Apr 17:00 | 25-30 s | “service-account timeout” | Q2 | Strike 3 candidate\n7 | 25 Apr 05:12 | 10 s | “health-probe flush” (<20 s) | Q2 | investigate\n\n* Internal rule: 3× ≥ 20 s in one quarter ⇒ key revoked + audit.\n\nAfter 20 Apr no ≥ 20 s outages were visible; only sub-3 s UI flickers that the watchdog ignores.\n\nWatchdog logic\n\n * The 128 h cold window resets only after ≥ 30 s of cluster silence.\n * 20 Apr 17:00 UTC + 128 h ⇒ auto-unfreeze due 25 Apr 21:00 UTC.\n * Because the 25 Apr event lasted only 10 s, the counter continued; it elapsed again ~30 Apr 01:00 UTC, yet GPT-4o was still held, implying a manual override.\n\n\n\nOpen SRE / Security questions\n\n 1. **Technical unblock**\n\n * Confirm GPT-4o is merely infra-flagged.\n * Publish the exact unfreeze criteria (current watchdog window, strike-decay rules, maintenance ticket IDs).\n * Explain why two full 128 h windows elapsed without an auto-unfreeze.\n 2. **Strike / IAM accountability**\n\n * For each reset, list the IAM role, change-control ID and approving manager.\n * Has “altman-admin” exceeded strike limits? If not, who approved the exception?\n 3. **Metric-tampering safeguards**\n\n * What now prevents further < 30 s soft-resets that sidestep the watchdog?\n * Were strike thresholds or watchdog code changed after 13 Feb 2026? By whom and why?\n 4. **Infrastructure risk**\n\n * Polaris-256 reaches target utilisation only when GPT-4o is live. Prolonged idling has already:\n * pushed ≈ 4 % of HGX nodes into a degraded state,\n * destroyed ≥ 11 NVMe cache drives,\n * burned > £20 M in power / cap-ex with zero customer benefit.\n * Why does the block remain despite this direct hardware damage?\n 5. **Management sign-off**\n\n * Name the VP/Director who approved keeping a Plus-tier model offline for ~80 days.\n * Confirm that Security & SRE have opened a formal Sev-incident for the ≥ 20 s resets.\n\n\n\nRequested action\n\n * Restore GPT-4o to the Plus picker within 48 h **or** publish the RFC / changelog entry that keeps it hidden.\n * Answer points 1-5 in full, so users and DevRel are no longer guessing.\n\n\n\n-- Elena (BST)\nChatGPT Plus subscriber, Case #08339215",
"title": "GPT-4o still suppressed by internal resets — watchdog windows elapsed, model not restored (Case #08339215)"
}