Unexpected Codex 5h Quota Exhaustion on Pro 5X + GPT-5.3 Codex Spark Context Window Failure
Environment
- ChatGPT Pro 5X
- Codex Desktop (macOS)
- Same account/workspace across Desktop, Web, and CLI
- Default service tier
- Primary model during the affected work: GPT-5.5 (xhigh reasoning)
- Later diagnostic run: GPT-5.3 Codex Spark (xhigh reasoning)
Issue A: Unexpected 5h Quota Exhaustion
Today I unexpectedly exhausted my Codex 5-hour quota after what appeared to be a very small amount of work.
From the user perspective:
- Only 3 active threads were used.
- Total active task runtime was under 20 minutes.
- Codex analytics showed very low daily thread activity.
- No unusually large code generation jobs were performed.
I have never hit the limit before, even with much more intensive work. I did not expect this workload to come anywhere close to exhausting the quota window.
After investigating local session logs, I found that one GPT-5.5 thread accumulated extremely large token counts.
Thread:
019ea500-1d8b-7c90-881a-bded967f5aa9
Run 1
- GPT-5.5 (xhigh)
- Duration: 157 seconds
- Reported primary usage: 15%
- Total tokens by end of run: 777,268
Run 2
- GPT-5.5 (xhigh)
- Duration: 497 seconds
- Reported primary usage: 38%
- Total tokens by end of run: 5,290,376
At this point the thread had accumulated more than 5 million total tokens.
However, by the time I started the next task, the account had already reached the 5-hour quota limit.
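The run figures above can be turned into per-run increments with a few lines of arithmetic. This is a minimal sketch that only restates numbers already in this report. It assumes that both readings are cumulative values captured at the end of each run, and that no other thread consumed quota while Run 2 was active.

```python
# Readings copied from this report for thread 019ea500-1d8b-7c90-881a-bded967f5aa9.
# Assumption: both values are cumulative readings taken at the end of each run
# (thread token total and 5h-window "primary" percentage).
run1 = {"seconds": 157, "primary_pct": 15, "total_tokens": 777_268}
run2 = {"seconds": 497, "primary_pct": 38, "total_tokens": 5_290_376}


def increment(before, after):
    """Tokens and primary-% points attributable to the later run."""
    tokens = after["total_tokens"] - before["total_tokens"]
    points = after["primary_pct"] - before["primary_pct"]
    return {
        "tokens": tokens,
        "points": points,
        "tokens_per_second": tokens / after["seconds"],
        # Assumption: no other thread used quota while the later run was active.
        "tokens_per_point": tokens / points,
    }


delta = increment(run1, run2)
runtime = run1["seconds"] + run2["seconds"]
headroom = 100 - run2["primary_pct"]

print(f"Run 2 added {delta['tokens']:,} tokens and {delta['points']} primary-% points")
print(f"~{delta['tokens_per_second']:,.0f} tokens/s, ~{delta['tokens_per_point']:,.0f} tokens per point")
print(f"Combined runtime of both runs: {runtime} s ({runtime / 60:.1f} min)")
print(f"Primary-% still unused after Run 2: {headroom} points")
```

Under those assumptions, Run 2 alone added 4,513,108 tokens and 23 points of primary usage in roughly 8 minutes. The remaining 62 points were used up before the next task started, and these figures alone do not explain that.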
Why This Is Confusing
The visible workload appeared very small:
- 3 threads
- Less than 20 minutes of active runtime
- No large-scale generation tasks
Yet the token totals recorded in the local session logs reached multi-million-token levels.
What is not clear is how the following relate to one another:
- Total tokens
- Cached input tokens
- Primary usage %
- 5-hour quota consumption
- Pro 5X allowance
In particular:
- One run ended at approximately 5.29M total tokens while reporting only 38% primary usage.
- Before the next investigation run, the account was already at 100%.
I would appreciate clarification on:
- How is “primary %” calculated?
- Does “primary %” scale according to subscription tier?
- Does 38% mean 38% of the Pro 5X allowance?
- How much do cached input tokens contribute to quota consumption?
- Is total token accumulation directly related to the 5-hour quota?
- Are there known discrepancies between visible usage indicators and backend quota accounting?
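To answer the cached-token question from local data, a script can extract the last cumulative token breakdown and every primary-% reading from a thread's session log. This sketch rests on several assumptions. The record shape and field names below (`payload.type == "token_count"`, `total_token_usage`, `cached_input_tokens`, `rate_limits.primary.used_percent`) are assumed rather than taken from documentation. The script also assumes that cached input tokens are counted as a subset of input tokens. `summarize_token_counts` is a hypothetical helper name.

```python
import json
from typing import Iterable

# Assumption: each session log line is a JSON object, and token accounting
# appears in records shaped roughly like
#   {"type": "event_msg", "payload": {"type": "token_count",
#     "info": {"total_token_usage": {"input_tokens": ..., "cached_input_tokens": ...,
#                                    "output_tokens": ..., "total_tokens": ...}},
#     "rate_limits": {"primary": {"used_percent": ...}}}}
# Field names are assumptions; adjust them to match your own log files.


def summarize_token_counts(lines: Iterable[str]) -> dict:
    """Return the last cumulative token breakdown plus all primary-% readings."""
    last_usage = None
    primary_readings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        payload = record.get("payload") or {}
        if payload.get("type") != "token_count":
            continue
        usage = (payload.get("info") or {}).get("total_token_usage")
        if usage:
            last_usage = usage  # cumulative, so the latest record wins
        primary = (payload.get("rate_limits") or {}).get("primary") or {}
        if "used_percent" in primary:
            primary_readings.append(primary["used_percent"])
    if last_usage is None:
        return {}
    input_tokens = last_usage.get("input_tokens", 0)
    cached = last_usage.get("cached_input_tokens", 0)
    return {
        "input_tokens": input_tokens,
        "cached_input_tokens": cached,
        # Assumption: cached tokens are a subset of input tokens.
        "uncached_input_tokens": input_tokens - cached,
        "output_tokens": last_usage.get("output_tokens", 0),
        "total_tokens": last_usage.get("total_tokens", 0),
        "cached_share_of_input": cached / input_tokens if input_tokens else 0.0,
        "primary_used_percent": primary_readings,
    }
```

To use it, call `summarize_token_counts(open(path))`, where `path` is one of the affected thread's session log files. The cached share only describes what the logs record. Whether, and how, cached tokens are weighted in quota accounting remains the open question above.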
Issue B: GPT-5.3 Codex Spark Context Window Failure
After the quota issue occurred, I switched to GPT-5.3 Codex Spark to investigate the problem.
The task was extremely simple:
“Can you figure out why we suddenly hit the 5h usage limit of Codex?”
Spark performed a few searches and inspections, then produced:
Context automatically compacted
followed immediately by:
Your input exceeds the context window of this model. Please adjust your input and try again.
No meaningful analysis was completed before the context window was exhausted.
Notably, this happened while attempting to diagnose the quota issue itself.
Why This Seems Strange
The sequence was roughly:
- Open a repository/workspace.
- Ask Spark to investigate a quota issue.
- Spark performs a handful of searches.
- Context compaction triggers.
- Context window is exceeded.
- Task aborts without completing.
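The toy model below shows one way "Context automatically compacted" can be followed immediately by a context-window error. It is not Codex's compaction algorithm, which this report does not know. The model assumes that compaction replaces the accumulated history with a fixed-size summary and that each new input, such as one large file or log read, is added whole. All sizes are arbitrary, hypothetical units.

```python
# Toy model only: NOT Codex's real compaction logic.
# Assumptions: compaction shrinks history to a fixed-size summary, and each
# new input is appended whole (it cannot be split or truncated).
def run_session(window: int, summary_size: int, inputs: list[int]) -> list[str]:
    history = 0
    events = []
    for size in inputs:
        if history + size > window:
            # Compaction: replace the accumulated history with a summary.
            history = summary_size
            events.append("compacted")
            if history + size > window:
                # Even after compaction, the new input does not fit.
                events.append("exceeds context window")
                return events
        history += size
        events.append(f"ok ({history}/{window})")
    return events


print(run_session(window=100, summary_size=30, inputs=[20, 25, 40, 80]))
```

In this model, compaction can only shrink what is already in context. If a single new input is larger than the space left after the summary, the next step fails right after compaction. This report cannot confirm that Spark behaves this way.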
This was not a large coding task.
It was primarily repository inspection and log analysis.
Questions About Spark
- Is GPT-5.3 Codex Spark intended for repository-scale investigations?
- Is Spark expected to automatically compact context successfully during repo analysis?
- Are there recommended limits for:
  - AGENTS.md size
  - memory files
  - operational notes
  - workspace documentation
  - session history
- Are there known issues where Spark repeatedly re-reads large workspace documents and rapidly consumes context?
- Is there a recommended workflow for using Spark as a troubleshooting or repository-investigation agent?
Additional Context
This workspace contains:
- agent skills
- operational memory files
- workspace documentation
- automation notes
- agent-generated reports
It is possible that Spark is encountering a context-management edge case in repositories that contain large amounts of operational memory and documentation.
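A rough size estimate of the text files Spark might read can help check whether the workspace documentation alone could crowd out its context. This sketch assumes about 4 characters per token, a common rule of thumb for English text rather than a model-specific tokenizer. The set of file suffixes is also an assumption. `estimate_doc_tokens` and `share_of_window` are hypothetical, illustrative names and are not part of any Codex tooling.

```python
from pathlib import Path

# Rough heuristic (assumption): ~4 characters per token for English prose.
# Real tokenization differs by model and content; treat results as estimates.
CHARS_PER_TOKEN = 4
# Assumption: the kinds of files that hold skills, memory, notes, and reports.
TEXT_SUFFIXES = {".md", ".txt", ".json", ".jsonl", ".yaml", ".yml"}


def estimate_doc_tokens(root: str, suffixes=TEXT_SUFFIXES) -> list[tuple[str, int]]:
    """Return (relative path, estimated tokens) for text files, largest first."""
    root_path = Path(root)
    results = []
    for path in root_path.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        rel = path.relative_to(root_path)
        if ".git" in rel.parts:
            continue  # ignore version-control internals
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue
        results.append((str(rel), len(text) // CHARS_PER_TOKEN))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


def share_of_window(estimates: list[tuple[str, int]], context_window: int) -> float:
    """Fraction of a context window the listed files would fill if read in full."""
    return sum(tokens for _, tokens in estimates) / context_window
```

Pass the model's documented context window as `context_window`. This report does not state Spark's limit, so it is left as a parameter. If the largest few files alone come close to or exceed a share of 1.0, that would be consistent with the compaction failure described above, though it would not prove it.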
If Spark is not intended for this type of investigation, guidance on its expected scope would be very helpful.