Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidirkppyrfsrkxdw7gcyak6ktlyl2tawwqalotagqe2nnsd6egopu",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mm7bem67ml42"
  },
  "path": "/t/tool-open-source-prompt-compressor-for-llms-22-avg-savings-with-spacy-rules/150483#post_3",
  "publishedAt": "2026-05-19T08:54:37.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Preprint on LLM context compaction",
    "Research",
    "lost in compaction",
    "GitHub - profff/lost-in-compaction · GitHub"
  ],
  "textContent": "Hi metawake,\n\nYes, I’m fighting the token-reduction fight, but coming at it from a\ndifferent angle. I just published a preprint measuring information loss\nwhen LLMs summarize their own conversation history (curative compaction):\n\nPreprint on LLM context compaction Research\n\n> Hi HuggingFace community!  Sharing a preprint that some of you might find interesting, on what LLMs forget when they compact their conversation history.  Paper: “Lost in Compaction: Measuring Information Loss in LLM Context Summaries”  DOI: lost in compaction  Code, data, human-judge calibration: GitHub - profff/lost-in-compaction · GitHub  Three findings that surprised me: In the compacted zone of a context, LLM recall drops to 0-7% even …\n\nYour approach (preventive, prompt-level, rule-based) is orthogonal to mine\n(curative, history-level, LLM-based). The two compose nicely: your tool\ncompresses individual messages on the way in, mine could compact the\naccumulated history later. Worth chaining and measuring.\n\nTwo things in your design that I think are underappreciated and that I’d\nlove to discuss:\n\n  1. Compressing on the way in vs on the way out is a more important\ndistinction than the literature gives it credit for. Tool results,\nchain-of-thought, and search outputs are typically 70-80% of the\nverbose noise in a coding agent’s context. Compressing them before\nthey enter the context probably has more leverage than any sophisticated\nhistory compaction strategy. Your tool is well-positioned for this.\n\n  2. Rule-based vs LLM-based compaction is a methodological lever I hadn’t\nseriously considered until reading you. A rule-based compactor is\ndeterministic, which directly addresses a finding I made in my paper:\nLLM-based compaction is non-deterministic at temperature zero,\nproducing run-to-run recall variance up to factor 14x on identical\nconversations. A rule-based variant would remove that source of\nvariance entirely and make benchmarks much cleaner. If your tool\ngives modest compression but stable behavior, that may be precisely\nwhat you want for parts of an agent’s context (typed/structured\ncontent especially).\n\n\n\n\nQuestion for you: have you measured downstream task performance with vs\nwithout your compressor? LLMLingua reports ~1.5% drop at 20x compression;\nyours at 22% should land much lower, which would be a strong selling\npoint if measured.\n\nIn any case I’m including a follow-up section on input-side and rule-based\ncompression in my future work draft, partly inspired by your tool. Happy\nto compare notes if you’re interested.\n\nCheers,\nOlivier",
  "title": "[Tool] Open-source prompt compressor for LLMs – 22% avg savings with spaCy + rules"
}