[Tool] Open-source prompt compressor for LLMs – 22% avg savings with spaCy + rules
Hugging Face Forums [Unofficial]
May 19, 2026
Hi metawake,
Yes, I’m fighting the token-reduction fight, but coming at it from a
different angle. I just published a preprint measuring information loss
when LLMs summarize their own conversation history (curative compaction):
Preprint on LLM context compaction Research
> Hi HuggingFace community! Sharing a preprint that some of you might find interesting, on what LLMs forget when they compact their conversation history. Paper: “Lost in Compaction: Measuring Information Loss in LLM Context Summaries” DOI: lost in compaction Code, data, human-judge calibration: GitHub - profff/lost-in-compaction · GitHub Three findings that surprised me: In the compacted zone of a context, LLM recall drops to 0-7% even …
Your approach (preventive, prompt-level, rule-based) is orthogonal to mine
(curative, history-level, LLM-based). The two compose nicely: your tool
compresses individual messages on the way in, mine could compact the
accumulated history later. Worth chaining and measuring.
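
To make "chaining and measuring" concrete, here is a minimal sketch of how the two stages could be wired together with token accounting at each step. Everything in it is an assumption for illustration: `compress_message` is a toy stand-in with three made-up rules (not your tool's actual rules or API), `summarize` is whatever callable does the history compaction, and `count_tokens` is a crude whitespace proxy rather than a real tokenizer.

```python
import re
from typing import Callable, Dict, List

# Hypothetical rules, applied in order. These are NOT the real tool's rules.
RULES = [
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),
    (re.compile(r"\bplease note that\s*", re.IGNORECASE), ""),
    (re.compile(r"\s+"), " "),
]


def compress_message(text: str) -> str:
    """Toy input-side (preventive) compressor: deterministic regex rewrites."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


def count_tokens(text: str) -> int:
    """Crude token proxy (whitespace split); swap in a real tokenizer."""
    return len(text.split())


def compact_history(history: List[str], keep_last: int,
                    summarize: Callable[[List[str]], str]) -> List[str]:
    """History-side (curative) step: summarize all but the last keep_last messages."""
    split = max(len(history) - keep_last, 0)
    if split == 0:
        return list(history)
    return [summarize(history[:split])] + history[split:]


def chain_stats(messages: List[str], keep_last: int,
                summarize: Callable[[List[str]], str]) -> Dict[str, int]:
    """Run both stages and report the token count after each one."""
    raw = sum(count_tokens(m) for m in messages)
    compressed = [compress_message(m) for m in messages]
    after_input = sum(count_tokens(m) for m in compressed)
    compacted = compact_history(compressed, keep_last, summarize)
    after_compaction = sum(count_tokens(m) for m in compacted)
    return {"raw": raw, "after_input_compression": after_input,
            "after_compaction": after_compaction}
```

Reporting the count after each stage separately is the point: it shows how much of the total saving comes from the input side before any summarization happens.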
Two things in your design that I think are underappreciated and that I’d
love to discuss:
1. Compressing on the way in vs on the way out is a more important
distinction than the literature gives it credit for. Tool results,
chain-of-thought, and search outputs typically account for 70-80% of
the verbose noise in a coding agent’s context. Compressing them before
they enter the context probably has more leverage than any sophisticated
history compaction strategy. Your tool is well-positioned for this.
2. Rule-based vs LLM-based compaction is a methodological lever I hadn’t
seriously considered until reading you. A rule-based compactor is
deterministic, which directly addresses a finding I made in my paper:
LLM-based compaction is non-deterministic even at temperature zero,
producing run-to-run recall variance of up to 14x on identical
conversations. A rule-based variant would remove that source of
variance entirely and make benchmarks much cleaner. If your tool
gives modest compression but stable behavior, that may be precisely
what you want for parts of an agent’s context (typed/structured
content especially).
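
A small sketch of how the determinism point could be checked in a benchmark harness. The names are all hypothetical: `rule_based_compactor` is a toy that keeps only `key=value` lines, `make_noisy_compactor` is a random stand-in that merely mimics run-to-run drift (it is not an LLM), and the max/min `recall_spread` ratio is just one illustrative way to express spread, not a restatement of the metric used in the paper.

```python
import random
from typing import Callable, List, Sequence

Compactor = Callable[[Sequence[str]], str]


def rule_based_compactor(conversation: Sequence[str]) -> str:
    """Toy deterministic compactor: keep only lines that look like key=value facts."""
    return "\n".join(line for line in conversation if "=" in line)


def make_noisy_compactor(seed: int) -> Compactor:
    """Stand-in for a non-deterministic compactor: randomly drops lines on each call."""
    rng = random.Random(seed)

    def compact(conversation: Sequence[str]) -> str:
        return "\n".join(line for line in conversation if rng.random() < 0.5)

    return compact


def recall(summary: str, facts: Sequence[str]) -> float:
    """Fraction of fact strings that survive verbatim in the summary."""
    if not facts:
        return 1.0
    return sum(fact in summary for fact in facts) / len(facts)


def distinct_outputs(compactor: Compactor, conversation: Sequence[str],
                     runs: int = 5) -> int:
    """Number of different summaries over repeated runs (1 means deterministic)."""
    return len({compactor(conversation) for _ in range(runs)})


def recall_spread(recalls: List[float]) -> float:
    """Max/min ratio of recall across runs (illustrative metric, assumed here)."""
    if not recalls:
        raise ValueError("need at least one recall value")
    lo, hi = min(recalls), max(recalls)
    if hi == 0:
        return 1.0  # every run recalled nothing: no spread
    if lo == 0:
        return float("inf")
    return hi / lo
```

With a rule-based compactor, `distinct_outputs` is 1 by construction, so any remaining variance in a benchmark can be attributed to the downstream model rather than to the compaction step.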
Question for you: have you measured downstream task performance with vs
without your compressor? LLMLingua reports a ~1.5% performance drop at 20x
compression; yours at 22% savings (roughly 1.3x) should land much lower,
which would be a strong selling point if measured.
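
For what it's worth, the measurement I have in mind is a simple paired run: same model, same items, once with the raw prompt and once with the compressed one. A minimal sketch follows, where `model` and `compress` are hypothetical callables you would supply and token counts use a whitespace proxy. The two small helpers also put "20x" and "22%" on the same scale, since one is a compression ratio and the other a savings fraction.

```python
from typing import Callable, Dict, List, Tuple


def ratio_to_savings(ratio: float) -> float:
    """Compression ratio (e.g. 20 for 20x) to fraction of tokens removed."""
    return 1.0 - 1.0 / ratio


def savings_to_ratio(savings: float) -> float:
    """Fraction of tokens removed (e.g. 0.22) to compression ratio."""
    return 1.0 / (1.0 - savings)


def paired_eval(model: Callable[[str], str],
                compress: Callable[[str], str],
                dataset: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score the same items with and without compression and report the delta."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct_raw = correct_cmp = tokens_raw = tokens_cmp = 0
    for prompt, expected in dataset:
        short = compress(prompt)
        correct_raw += model(prompt) == expected
        correct_cmp += model(short) == expected
        tokens_raw += len(prompt.split())   # whitespace proxy for tokens
        tokens_cmp += len(short.split())
    n = len(dataset)
    return {
        "accuracy_raw": correct_raw / n,
        "accuracy_compressed": correct_cmp / n,
        "accuracy_delta": (correct_cmp - correct_raw) / n,
        "token_savings": 1.0 - tokens_cmp / tokens_raw if tokens_raw else 0.0,
    }
```

By this conversion 20x corresponds to 95% of tokens removed, while 22% savings is about 1.28x, so the two operating points are far apart and the accuracy deltas are not directly comparable without measuring.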
In any case I’m including a follow-up section on input-side and rule-based
compression in my future work draft, partly inspired by your tool. Happy
to compare notes if you’re interested.
Cheers,
Olivier