Unexpected Codex 5h Quota Exhaustion on Pro 5X + GPT-5.3 Codex Spark Context Window Failure

OpenAI Developer Community June 8, 2026

Environment

ChatGPT Pro 5X
Codex Desktop (macOS)
Same account/workspace across Desktop, Web, and CLI
Default service tier
Primary model during the affected work: GPT-5.5 (xhigh reasoning)
Later diagnostic run: GPT-5.3 Codex Spark (xhigh reasoning)

Issue A: Unexpected 5h Quota Exhaustion

Today I unexpectedly exhausted my Codex 5-hour quota after what appeared to be a very small amount of work.

From the user perspective:

Only 3 active threads were used.
Total active task runtime was under 20 minutes.
Codex analytics showed very low daily thread activity.
No unusually large code generation jobs were performed.

I have never hit this limit before, even with much more intensive work, and I did not expect this workload to come anywhere close to exhausting the quota window.

After investigating local session logs, I found that one GPT-5.5 thread accumulated extremely large token counts.

Thread:

019ea500-1d8b-7c90-881a-bded967f5aa9

Run 1

GPT-5.5 (xhigh)
Duration: 157 seconds
Reported primary usage: 15%
Total tokens by end of run: 777,268

Run 2

GPT-5.5 (xhigh)
Duration: 497 seconds
Reported primary usage: 38%
Total tokens by end of run: 5,290,376

At this point the thread had accumulated more than 5 million total tokens.

However, by the time I started the next task, the account had already reached the 5-hour quota limit.
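For anyone who wants to run a similar check on their own logs, here is a minimal sketch of that kind of scan. It assumes the session logs are JSON Lines files and that token counts sit under a key named `total_tokens`. Both are assumptions, since the log schema is not documented in this post, so the key is a parameter. The helper names (`find_values`, `peak_token_count`, `rank_sessions`) are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def peak_token_count(log_path: Path, key: str = "total_tokens") -> int:
    """Largest integer found under `key` in a JSON Lines file (0 if none)."""
    peak = 0
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or non-JSON lines
            for value in find_values(record, key):
                # bool is a subclass of int, so exclude it explicitly
                if isinstance(value, int) and not isinstance(value, bool):
                    peak = max(peak, value)
    return peak


def rank_sessions(log_dir: Path, key: str = "total_tokens") -> list[tuple[str, int]]:
    """Return (file name, peak count) pairs, largest first."""
    peaks = [(p.name, peak_token_count(p, key)) for p in log_dir.rglob("*.jsonl")]
    return sorted(peaks, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_sessions(Path("path/to/session/logs"))` with the real log directory substituted in should put the heaviest thread at the top, provided the key name matches what the logs actually use.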

Why This Is Confusing

The visible workload appeared very small:

3 threads
Less than 20 minutes of active runtime
No large-scale generation tasks

Yet backend token accumulation appears to have reached multi-million-token levels.

What is not clear is how the following relate to one another:

Total tokens
Cached input tokens
Primary usage %
5-hour quota consumption
Pro 5X allowance

In particular:

One run ended at approximately 5.29M total tokens while reporting only 38% primary usage.
Before the next investigation run, the account was already at 100%.
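To make the mismatch concrete, the two runs can be compared by tokens per percentage point of primary usage. This is only a rough sketch. It assumes, without confirmation, that both "total tokens" and "primary %" are cumulative figures and that this one thread accounts for all of the reported usage. The other two threads would shift the numbers. The function name `tokens_per_point` is hypothetical.

```python
# Figures copied from the two runs reported for the thread.
runs = [
    {"name": "Run 1", "total_tokens": 777_268, "primary_pct": 15},
    {"name": "Run 2", "total_tokens": 5_290_376, "primary_pct": 38},
]


def tokens_per_point(tokens: int, pct_points: float) -> float:
    """Tokens consumed per percentage point of reported primary usage."""
    return tokens / pct_points


# ASSUMPTION: both figures are cumulative, so the second run's own
# cost is the difference between the two readings.
delta_tokens = runs[1]["total_tokens"] - runs[0]["total_tokens"]
delta_pct = runs[1]["primary_pct"] - runs[0]["primary_pct"]

rate_run1 = tokens_per_point(runs[0]["total_tokens"], runs[0]["primary_pct"])
rate_run2 = tokens_per_point(delta_tokens, delta_pct)

print(f"Run 1: {rate_run1:,.0f} tokens per percentage point")
print(f"Run 2: {rate_run2:,.0f} tokens per percentage point")
print(f"Ratio: {rate_run2 / rate_run1:.2f}x")
```

Under those assumptions, the second run consumed several times more tokens per percentage point than the first. That suggests primary % is not a simple linear function of total tokens. Cached input tokens being weighted differently is one possible explanation, but that is a guess on my part.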

I would appreciate clarification on:

How is “primary %” calculated?
Does “primary %” scale according to subscription tier?
Does 38% mean 38% of the Pro 5X allowance?
How much do cached input tokens contribute to quota consumption?
Is total token accumulation directly related to the 5-hour quota?
Are there known discrepancies between visible usage indicators and backend quota accounting?

Issue B: GPT-5.3 Codex Spark Context Window Failure

After the quota issue occurred, I switched to GPT-5.3 Codex Spark to investigate the problem.

The task was extremely simple:

“Can you figure out why we suddenly hit the 5h usage limit of Codex?”

Spark performed a few searches and inspections, then produced:

Context automatically compacted

followed immediately by:

Your input exceeds the context window of this model. Please adjust your input and try again.

No meaningful analysis was completed before the context window was exhausted.

Notably, this happened while attempting to diagnose the quota issue itself.

Why This Seems Strange

The sequence was roughly:

Open a repository/workspace.
Ask Spark to investigate a quota issue.
Spark performs a handful of searches.
Context compaction triggers.
Context window is exceeded.
Task aborts without completing.

This was not a large coding task.

It was primarily repository inspection and log analysis.

Questions About Spark

Is GPT-5.3 Codex Spark intended for repository-scale investigations?
Is Spark expected to automatically compact context successfully during repo analysis?
Are there recommended limits for:
- AGENTS.md size
- memory files
- operational notes
- workspace documentation
- session history
Are there known issues where Spark repeatedly re-reads large workspace documents and rapidly consumes context?
Is there a recommended workflow for using Spark as a troubleshooting or repository-investigation agent?

Additional Context

This workspace contains:

agent skills
operational memory files
workspace documentation
automation notes
agent-generated reports

It is possible that Spark is encountering a context-management edge case in repositories that contain large amounts of operational memory and documentation.
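To check whether documentation volume is a plausible cause, the workspace's text footprint can be estimated and compared against a context window. This is a rough sketch: the four-characters-per-token ratio is a common heuristic rather than a real tokenizer, the set of file suffixes is my assumption about what counts as documentation, and the window size must be supplied by the reader because Spark's actual limit is not stated here. All function names are hypothetical.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # ASSUMPTION: rough heuristic for English text, not a tokenizer
DOC_SUFFIXES = {".md", ".txt"}  # ASSUMPTION: which files count as documentation


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def doc_footprint(root: Path) -> list[tuple[str, int]]:
    """Return (relative path, estimated tokens) per doc file, largest first."""
    sizes = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in DOC_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            sizes.append((path.relative_to(root).as_posix(), estimate_tokens(text)))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


def share_of_window(total_tokens: int, window_tokens: int) -> float:
    """Fraction of a context window the given token count would occupy."""
    return total_tokens / window_tokens
```

If the summed estimate for AGENTS.md, memory files, and notes is already a large share of the window, a handful of searches on top of that could plausibly exhaust it, and compaction would not help if those files are re-read afterwards.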

If Spark is not intended for this type of investigation, guidance on its expected scope would be very helpful.