Best practices for cost-efficient, high-quality context management in long AI chats

OpenAI Developer Community February 13, 2026

I would also look at Codex, which has got particularly good of late.

Here’s what I did:

1. I git cloned the Codex repo.
2. I launched Codex inside the cloned repo … of Codex (oooooh, meta).
3. I asked it: “summarise how context is managed by this app for large code bases.”

And it gave me this response:

  - It stores conversation history as ResponseItems and builds each model prompt from normalized history + fixed session context (developer
    instructions, AGENTS/user instructions, environment context). See codex-rs/core/src/codex.rs:2493, codex-rs/core/src/codex.rs:2446,
    codex-rs/core/src/context_manager/history.rs:86.
  - It aggressively bounds noisy content: tool outputs are truncated (token/byte policy), long text is middle-truncated, and function output content
    items are budgeted. See codex-rs/core/src/context_manager/history.rs:327, codex-rs/core/src/truncate.rs:88, codex-rs/core/src/truncate.rs:100.
  - It tracks token usage and context window continuously (server usage + local estimates), then emits TokenCount updates to UI. See
    codex-rs/core/src/codex.rs:2603, codex-rs/protocol/src/protocol.rs:1443.
  - Auto-compaction triggers when usage crosses model threshold; threshold defaults to ~90% of model window (clamped) unless overridden. See
    codex-rs/core/src/codex.rs:4435, codex-rs/core/src/codex.rs:4491, codex-rs/protocol/src/openai_models.rs:259, codex-rs/core/src/config/mod.rs:158.
  - Compaction rewrites history into: initial context + recent user messages (capped) + a handoff summary, then resumes. Inline path:
    codex-rs/core/src/compact.rs:67, codex-rs/core/src/compact.rs:295. Remote path: codex-rs/core/src/compact_remote.rs:60.
  - UI context meter reserves baseline overhead (BASELINE_TOKENS=12000) so “% remaining” reflects user-controllable space, not fixed prompt/tool
    overhead. See codex-rs/protocol/src/protocol.rs:1478, codex-rs/protocol/src/protocol.rs:1512.
  - Repo exploration tools are intentionally scoped/limited (instead of full-repo ingest): read_file uses offset/limit and line-length caps,
    grep_files has result/time limits, fuzzy file search caps results. See codex-rs/core/src/tools/handlers/read_file.rs:31,
    codex-rs/core/src/tools/handlers/read_file.rs:19, codex-rs/core/src/tools/handlers/grep_files.rs:20,
    codex-rs/app-server/src/fuzzy_file_search.rs:16.
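
To make the truncation, compaction and context-meter bullets above a bit more concrete, here is a rough Python sketch of the ideas. It is not code from the Codex repo (which is Rust). The function names, the character-based truncation, the `max_recent` cap and the exact percentage arithmetic are my own assumptions, and the "(clamped)" part of the threshold is not modelled. Only the ~90% threshold and BASELINE_TOKENS=12000 come from the response above.

```python
BASELINE_TOKENS = 12_000  # value quoted in the response above


def truncate_middle(text: str, max_chars: int) -> str:
    """Keep the head and tail of `text` and replace the middle with a marker.

    Assumption: budget counted in characters; the real policy is token/byte based.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    removed = len(text) - max_chars
    marker = f"...[{removed} chars truncated]..."
    return text[:head] + marker + text[len(text) - tail:]


def should_compact(used_tokens: int, context_window: int,
                   threshold_percent: int = 90) -> bool:
    """True once usage reaches the (assumed) ~90% share of the model window."""
    return used_tokens * 100 >= context_window * threshold_percent


def compact(initial_context: list[str], user_messages: list[str],
            summary: str, max_recent: int = 3) -> list[str]:
    """Rebuild history as: initial context + capped recent user messages + summary.

    `max_recent` is a hypothetical cap counted in messages, not tokens.
    """
    recent = user_messages[-max_recent:] if max_recent > 0 else []
    return [*initial_context, *recent, summary]


def percent_remaining(used_tokens: int, context_window: int,
                      baseline: int = BASELINE_TOKENS) -> int:
    """Share of user-controllable space left, ignoring the fixed baseline overhead."""
    usable = context_window - baseline
    if usable <= 0:
        return 0
    used = max(0, used_tokens - baseline)
    remaining = max(0, usable - used)
    return round(remaining * 100 / usable)
```

In a chat loop you would call `should_compact` after each turn and, when it fires, swap the stored history for the output of `compact`; the summary itself would come from a separate model call that is not shown here.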

  Inference from code search: I don’t see a built-in embedding/vector “index entire codebase” pipeline in core; context is assembled from history +
  targeted tool reads.
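
For a feel of what a "targeted tool read" might look like, here is a minimal Python sketch of a bounded file reader in the spirit of the read_file description above. The function name `read_file_slice`, its parameters and the default limits are all hypothetical; the response only says that read_file uses offset/limit and line-length caps, not what the values are.

```python
from itertools import islice
from pathlib import Path


def read_file_slice(path: str, offset: int = 1, limit: int = 200,
                    max_line_len: int = 500) -> list[str]:
    """Return at most `limit` lines starting at 1-based line `offset`.

    Each line is prefixed with its line number and cut to `max_line_len`
    characters. All defaults are illustrative assumptions, not Codex's values.
    """
    if offset < 1 or limit < 1:
        raise ValueError("offset and limit must be >= 1")
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        # islice streams the file, so lines outside the window are never kept.
        window = islice(fh, offset - 1, offset - 1 + limit)
        return [
            f"L{number}: {line.rstrip(chr(10))[:max_line_len]}"
            for number, line in enumerate(window, start=offset)
        ]
```

The point of the design is that the model pays only for the slice it asked for, so exploring a large repo costs a series of small reads rather than one full ingest.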

Given its performance, I’d wager this is state of the art …
