Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidw3adloips3u3knkduxtnoro6dca7k4nvbv27aexhcgrshlfuyve",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mmhk4jmgaqo2"
  },
  "path": "/t/suggestion-proactive-handoff-and-rescue-workflow-for-codex-compaction-failures/1381588#post_1",
  "publishedAt": "2026-05-22T17:20:24.000Z",
  "site": "https://community.openai.com",
  "tags": [
    "github.com/openai/codex",
    "Codex Desktop: proactive handoff and rescue workflow for unrecoverable remote compaction failures",
    "Longtime76"
  ],
  "textContent": "I wanted to share a recovery workflow we hit today in Codex Desktop, because I think it points to a useful product enhancement for long-running project threads.\n\nWe had a very valuable Codex project thread become unusable after remote compaction started failing. Even a tiny prompt like “are you there?” triggered:\n\n\n    {\n      \"error\": {\n        \"message\": \"Your input exceeds the context window of this model. Please adjust your input and try again.\",\n        \"type\": \"invalid_request_error\",\n        \"param\": \"input\",\n        \"code\": \"context_length_exceeded\"\n      }\n    }\n\n\nThe hard part was that once the thread reached this state, it could not produce its own handoff summary. That is exactly when the user most needs one.\n\nWhat appeared to cause the issue:\n\n  * The thread was long-running and project-heavy.\n  * It included many pasted screenshots/images over time.\n  * Older embedded image payloads were still present in the local history even though they were no longer useful.\n  * The compaction request itself appears to have become too large to fit through the context window.\n\n\n\nThe recovery pattern that worked:\n\n  1. Opened a second helper Codex thread.\n  2. Used it to inspect the local session history for the stuck thread.\n  3. Created a durable handoff summary as a fallback.\n  4. Backed up the original rollout/session file.\n  5. Replaced embedded image payloads with lightweight placeholders while preserving the message/tool structure.\n  6. Fully closed Codex so it would not rewrite cached state.\n  7. Replaced the original rollout path in place with the repaired image-stripped version.\n  8. 
Restarted Codex and reopened the original thread.\n\n\n\nAfter that, the original thread responded again and appeared intact.\n\nA few observed numbers from the case:\n\n  * Local rollout reduced from about 611 MB to about 52.7 MB.\n  * Embedded `data:image` payloads reduced to zero.\n  * JSON parse errors after repair: zero.\n  * The thread became usable again after restart.\n\n\n\nFeature ideas this suggests:\n\n  1. Codex could proactively detect when a long-running thread is approaching compaction failure risk and automatically write a durable handoff file before the thread becomes unusable.\n  2. Codex could expose a built-in “rescue session” workflow for stuck threads, especially one that strips stale images or oversized tool payloads while preserving text context.\n  3. When remote compaction fails because the compaction request itself is too large, Codex could explain that clearly and offer recovery paths: create handoff, strip media payloads, fork a repaired thread, or archive large assets.\n  4. A helper-thread repair workflow could become an official pattern: one Codex thread helps summarize, reduce, or repair another under user control.\n\n\n\nI also posted the more technical version as a GitHub issue here:\n\ngithub.com/openai/codex\n\n#### Codex Desktop: proactive handoff and rescue workflow for unrecoverable remote compaction failures\n\nopened 04:39PM - 22 May 26 UTC\n\nLongtime76\n\nenhancement  context  app  session\n\n### Summary\n\nA long-running Codex Desktop project thread became unusable after remote compaction started failing, even for tiny prompts such as \"are you there?\". The error was:\n\n```json\n{\n  \"error\": {\n    \"message\": \"Your input exceeds the context window of this model. Please adjust your input and try again.\",\n    \"type\": \"invalid_request_error\",\n    \"param\": \"input\",\n    \"code\": \"context_length_exceeded\"\n  }\n}\n```\n\nThis was especially painful because the stuck thread contained substantial project context. 
Once compaction failed, the thread could not produce its own handoff summary.\n\n### Environment\n\n- Product: Codex Desktop\n- Platform: Windows\n- Model in thread metadata: gpt-5.5, xhigh reasoning\n- Local thread metadata showed Codex CLI/app metadata around 0.118.0-alpha.2, but the user had recently updated Codex before the failure\n- Thread type: long-running project thread with many pasted screenshots/images over time\n\n### What happened\n\nThe local rollout had grown very large because older prompts and tool outputs still contained embedded image data URLs from pasted screenshots. Those images were no longer needed for the active work, but they remained in the local thread history. When Codex attempted remote compaction, the compaction request itself appeared to exceed the model context window, leaving the thread unable to answer even a minimal prompt.\n\n### Recovery approach that worked\n\nA second helper Codex thread was able to repair the broken thread locally:\n\n1. Identified the local rollout for the stuck thread.\n2. Created a durable handoff summary as a fallback.\n3. Backed up the original rollout file.\n4. Replaced embedded image payloads with lightweight placeholders while preserving message/tool structure.\n5. Waited until Codex was fully closed so the app would not rewrite cached state.\n6. Replaced the original rollout path in place with the repaired, image-stripped rollout.\n7. Restarted Codex and opened the original thread.\n\nAfter this, the original thread responded successfully and appeared intact.\n\nObserved local numbers from this case:\n\n- Rollout reduced from about 611 MB to about 52.7 MB.\n- Embedded `data:image` payloads reduced to zero.\n- JSON parse errors after repair: zero.\n- Thread became usable again after restart.\n\n### Why this matters\n\nThe most valuable Codex threads are often the ones most likely to become long, media-heavy, and hard to replace. 
If compaction fails after a thread has passed the point where it can still answer, the user cannot ask that same thread to summarize itself or create a handoff.\n\n### Feature requests\n\n1. Proactively detect when a thread is approaching compaction failure risk and create a durable handoff before it becomes unusable.\n2. Add a built-in rescue workflow for stuck sessions, especially one that can strip stale pasted images or oversized tool payloads while preserving text context.\n3. When remote compaction fails because the compaction request itself is too large, explain that clearly and offer recovery options: create handoff, strip media payloads, fork a repaired thread, or archive large assets.\n4. Provide a user-visible continuity/export mechanism for long-running project threads that does not depend on the current thread still being able to answer.\n5. Consider a helper-thread pattern as an official workflow: another Codex session can inspect, summarize, and repair the local history of a stuck session under user control.\n\n### Related issues\n\nThis seems related to prior compaction/context reports such as:\n\n- #18572\n- #10823\n- #19386\n- #4813\n\nThe additional contribution here is the recovery pattern: a helper Codex thread was able to back up and surgically reduce the local history, after which the original thread was recoverable.\n\nThe main reason I think this matters: the most valuable Codex threads are often the ones most likely to be long, media-heavy, and hard to replace. A graceful continuity/export/rescue path would prevent users from losing the exact threads they care about most.",
  "title": "Suggestion: proactive handoff and rescue workflow for Codex compaction failures"
}
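
The backup and image-stripping steps described in the post could look roughly like the sketch below. This is not the author's actual script, and several things are assumptions: that the rollout is a JSON Lines file (one JSON object per line), that screenshots appear as base64 `data:image/...` URLs inside string values, and the names `repair_rollout`, `strip_images`, and `PLACEHOLDER`, which are hypothetical. The post does not document the rollout format, so inspect a real file before relying on any of this.

```python
import json
import re
import shutil
from pathlib import Path

# Assumed shape of an embedded screenshot: a base64 data URL inside a string value.
DATA_IMAGE_RE = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+")
PLACEHOLDER = "[image removed]"  # hypothetical placeholder text


def strip_images(value):
    """Recursively replace data:image URLs in strings, keeping dict/list structure."""
    if isinstance(value, str):
        return DATA_IMAGE_RE.sub(PLACEHOLDER, value)
    if isinstance(value, list):
        return [strip_images(item) for item in value]
    if isinstance(value, dict):
        return {key: strip_images(item) for key, item in value.items()}
    return value


def repair_rollout(path: Path) -> Path:
    """Back up `path`, write an image-stripped copy beside it, and return the copy's path."""
    backup = path.with_name(path.name + ".bak")
    repaired = path.with_name(path.name + ".repaired")
    shutil.copy2(path, backup)  # keep the original untouched
    with path.open(encoding="utf-8") as src, repaired.open("w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # blank lines are dropped
            record = json.loads(line)  # raises if a line is not valid JSON
            dst.write(json.dumps(strip_images(record), ensure_ascii=False) + "\n")
    return repaired
```

The sketch writes the repaired copy next to the original rather than overwriting it. Following the post, the swap into the original rollout path would happen only after Codex is fully closed, and the `.bak` file stays as the fallback.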