Best practices for cost-efficient, high-quality context management in long AI chats
OpenAI Developer Community
February 13, 2026
I would also look at Codex; it has gotten particularly good of late.
Here’s what I did:
- I git cloned the repo
- I launched Codex inside the cloned repo … of Codex (oooooh, meta)
- Asked it: “summarise how context is managed by this app for large code bases.”
And it gave me this response:
- It stores conversation history as ResponseItems and builds each model prompt from normalized history + fixed session context (developer
instructions, AGENTS/user instructions, environment context). See codex-rs/core/src/codex.rs:2493, codex-rs/core/src/codex.rs:2446,
codex-rs/core/src/context_manager/history.rs:86.
- It aggressively bounds noisy content: tool outputs are truncated (token/byte policy), long text is middle-truncated, and function output content
items are budgeted. See codex-rs/core/src/context_manager/history.rs:327, codex-rs/core/src/truncate.rs:88, codex-rs/core/src/truncate.rs:100.
- It tracks token usage and context window continuously (server usage + local estimates), then emits TokenCount updates to UI. See
codex-rs/core/src/codex.rs:2603, codex-rs/protocol/src/protocol.rs:1443.
- Auto-compaction triggers when usage crosses model threshold; threshold defaults to ~90% of model window (clamped) unless overridden. See
codex-rs/core/src/codex.rs:4435, codex-rs/core/src/codex.rs:4491, codex-rs/protocol/src/openai_models.rs:259, codex-rs/core/src/config/mod.rs:158.
- Compaction rewrites history into: initial context + recent user messages (capped) + a handoff summary, then resumes. Inline path:
codex-rs/core/src/compact.rs:67, codex-rs/core/src/compact.rs:295. Remote path: codex-rs/core/src/compact_remote.rs:60.
- UI context meter reserves baseline overhead (BASELINE_TOKENS=12000) so “% remaining” reflects user-controllable space, not fixed prompt/tool
overhead. See codex-rs/protocol/src/protocol.rs:1478, codex-rs/protocol/src/protocol.rs:1512.
- Repo exploration tools are intentionally scoped/limited (instead of full-repo ingest): read_file uses offset/limit and line-length caps,
grep_files has result/time limits, fuzzy file search caps results. See codex-rs/core/src/tools/handlers/read_file.rs:31,
codex-rs/core/src/tools/handlers/read_file.rs:19, codex-rs/core/src/tools/handlers/grep_files.rs:20, codex-rs/app-server/src/fuzzy_file_search.rs:16.
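To make the "long text is middle-truncated" point concrete, here is a minimal Python sketch of the idea. It is not Codex's code: the function name, the marker text, and the character-based budget are my own assumptions (the response above says Codex uses a token/byte policy), but the shape is the same: keep the start and end of a noisy tool output and drop the middle.

```python
def truncate_middle(text: str, max_chars: int) -> str:
    """Keep the head and tail of `text` and replace the middle with a marker.

    Hypothetical helper for illustration; a character budget stands in for
    the token/byte policy a real implementation would use.
    """
    if len(text) <= max_chars:
        return text  # already within budget, nothing to drop

    removed = len(text) - max_chars
    marker = f"\n...[{removed} chars truncated]...\n"

    head = max_chars // 2        # characters kept from the start
    tail = max_chars - head      # characters kept from the end
    tail_text = text[-tail:] if tail else ""
    return text[:head] + marker + tail_text


if __name__ == "__main__":
    noisy_output = "BEGIN " + "x" * 5000 + " END"
    print(truncate_middle(noisy_output, 40))
```

The reason to cut the middle rather than the end is that command output tends to put the useful bits (the invocation, then the final error or summary) at the two extremes.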
Inference from code search: I don’t see a built-in embedding/vector “index entire codebase” pipeline in core; context is assembled from history +
targeted tool reads.
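The auto-compaction bullets above can be sketched the same way. This is a rough Python illustration under stated assumptions, not the actual implementation: `Item`, `estimate_tokens`, `should_compact`, and `compact` are hypothetical names, the 4-characters-per-token estimate is a crude stand-in, and the only figures taken from the response are the ~90% default threshold and the "initial context + recent user messages (capped) + handoff summary" layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    role: str  # e.g. "developer", "user", "assistant", "tool", "summary"
    text: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def should_compact(used_tokens: int, context_window: int, ratio: float = 0.9) -> bool:
    # Trigger once usage crosses the threshold (~90% of the window by default).
    return used_tokens >= int(context_window * ratio)


def compact(initial_context: List[Item], history: List[Item],
            summary: str, user_budget_tokens: int) -> List[Item]:
    """Rebuild history as: initial context + recent user messages + summary."""
    kept: List[Item] = []
    budget = user_budget_tokens
    # Walk backwards so the most recent user messages are kept first.
    for item in reversed(history):
        if item.role != "user":
            continue  # assistant and tool items are represented by the summary
        cost = estimate_tokens(item.text)
        if cost > budget:
            break  # cap reached; older user messages are dropped
        kept.append(item)
        budget -= cost
    kept.reverse()  # restore chronological order
    return list(initial_context) + kept + [Item("summary", summary)]


if __name__ == "__main__":
    initial = [Item("developer", "Follow the repo conventions.")]
    history = [
        Item("user", "Explain the build."),
        Item("tool", "log line\n" * 2000),
        Item("user", "Now fix the failing test."),
    ]
    used = sum(estimate_tokens(i.text) for i in initial + history)
    if should_compact(used, context_window=4000):
        history = compact(initial, history, "Handoff: build explained; test fix pending.", 200)
    print([i.role for i in history])
```

In a real agent the summary would come from a model call over the old history; here it is passed in as a plain string to keep the sketch self-contained.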
Given its performance, I’d wager this is state of the art …