{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiaq2iy33com4phshruxw23727kgsecvfhcyyg4h3r6fpgapqcehgy",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjl2ci4hklf2"
  },
  "path": "/t/kv-cache-for-llama-and-comfui/175293#post_1",
  "publishedAt": "2026-04-15T22:35:58.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "github.com",
    "GitHub - nihilistau/shannon-prime-comfyui",
    "GitHub - nihilistau/shannon-prime-llama"
  ],
  "textContent": "**Here you go: I am bored! Working KV-Cache system with 0 ppl loss I can’t remember how far it goes. But it can go much further! explore, use, share, enjoy. for CUDA,Vulkan,NEON,Adreno,Pytorch, I can’t remember what else. intergration with comfui and llama. The only reason for the license is because it is the Shannon limit and describes the transformer itself. I won’t be pursuing any money or stopping it’s use. I am seriously bored. I don’t normally complete projects. I guess i didnt complete this one either** … But it works, there is a model weights compressor included.. but I haven’t tested it yet.. But I won’t be doing any further work on this. So.. it should be pretty easy to see the quick wins and if you just follow the math it will lead to a 10D Torus. So expand it out to 10 bands and run a “spinor” transform and you hit 8x-10x umm.. whatelse… you can reduce the skeleton to 2bit.. ummm it’s univeral… you could reduce to 1bit… you caould create a predictor of where not to hit. I mean… there is lots you can do… so DO IT! ohh yeah.. you can reduce it and run entirely in cache.. AMD has 64mb… that can fit a decent model..\n\ngithub.com\n\n### GitHub - nihilistau/shannon-prime-comfyui\n\nContribute to nihilistau/shannon-prime-comfyui development by creating an account on GitHub.\n\ngithub.com\n\n### GitHub - nihilistau/shannon-prime-llama\n\nContribute to nihilistau/shannon-prime-llama development by creating an account on GitHub.\n\nnihilistau/shannon-prime",
  "title": "KV cache for llama and comfui"
}