
[Tool] Open-source prompt compressor for LLMs – 22% avg savings with spaCy + rules

Hugging Face Forums [Unofficial] May 19, 2026
Hi metawake,

Yes, I’m fighting the token-reduction fight, but coming at it from a different angle. I just published a preprint measuring information loss when LLMs summarize their own conversation history (curative compaction):

Preprint on LLM context compaction (Research)
> Hi HuggingFace community! Sharing a preprint that some of you might find interesting, on what LLMs forget when they compact their conversation history.
> Paper: “Lost in Compaction: Measuring Information Loss in LLM Context Summaries”
> DOI: lost in compaction
> Code, data, human-judge calibration: GitHub - profff/lost-in-compaction · GitHub
> Three findings that surprised me: In the compacted zone of a context, LLM recall drops to 0-7% even …

Your approach (preventive, prompt-level, rule-based) is orthogonal to mine (curative, history-level, LLM-based). The two compose nicely: your tool compresses individual messages on the way in, mine could compact the accumulated history later. Worth chaining and measuring.

Two things in your design that I think are underappreciated and that I’d love to discuss:

1. Compressing on the way in vs on the way out is a more important distinction than the literature gives it credit for. Tool results, chain-of-thought, and search outputs are typically 70-80% of the verbose noise in a coding agent’s context. Compressing them before they enter the context probably has more leverage than any sophisticated history compaction strategy. Your tool is well-positioned for this.

2. Rule-based vs LLM-based compaction is a methodological lever I hadn’t seriously considered until reading you. A rule-based compactor is deterministic, which directly addresses a finding I made in my paper: LLM-based compaction is non-deterministic even at temperature zero, producing run-to-run recall variance of up to a factor of 14 on identical conversations. A rule-based variant would remove that source of variance entirely and make benchmarks much cleaner.
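To make the determinism point concrete, here is a minimal sketch under stated assumptions. The `rule_compress` function and its three rules are hypothetical stand-ins, not the rules of the tool discussed in this thread, and `spread_factor` assumes that a "factor" of run-to-run variance means the max/min ratio of per-run recall scores, which may differ from the paper's exact definition. The numbers in the usage lines are illustrative, not measured results.

```python
import re
from typing import Callable, Sequence

# Hypothetical rule set (assumption): a real compressor would use many more rules.
RULES = [
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),
    (re.compile(r"\b(?:please note that|basically)\s+", re.IGNORECASE), ""),
    (re.compile(r"\s+"), " "),
]


def rule_compress(text: str) -> str:
    """Toy rule-based compressor: apply each regex rule in a fixed order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


def is_deterministic(compress: Callable[[str], str], text: str, runs: int = 5) -> bool:
    """True if repeated calls on the same input produce one distinct output."""
    return len({compress(text) for _ in range(runs)}) == 1


def spread_factor(recalls: Sequence[float]) -> float:
    """Max/min ratio of per-run recall scores; 1.0 means no run-to-run spread."""
    lowest, highest = min(recalls), max(recalls)
    if lowest <= 0:
        raise ValueError("spread factor is undefined when a run scores zero recall")
    return highest / lowest


message = "Please note that   we run this in order to  save tokens."
compressed = rule_compress(message)
stable = is_deterministic(rule_compress, message)
# Illustrative per-run recall scores for one conversation (made-up values).
factor = spread_factor([0.1, 0.2, 0.4])
```

Because the rules are pure functions of the input, `is_deterministic` holds by construction; the same check applied to an LLM-based compactor would need the summaries from repeated API calls instead.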
If your tool gives modest compression but stable behavior, that may be precisely what you want for parts of an agent’s context (typed/structured content especially).

Question for you: have you measured downstream task performance with vs without your compressor? LLMLingua reports ~1.5% drop at 20x compression; yours at 22% should land much lower, which would be a strong selling point if measured.

In any case I’m including a follow-up section on input-side and rule-based compression in my future work draft, partly inspired by your tool. Happy to compare notes if you’re interested.

Cheers,
Olivier
