{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiaq2iy33com4phshruxw23727kgsecvfhcyyg4h3r6fpgapqcehgy",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjl2ci4hklf2"
},
"path": "/t/kv-cache-for-llama-and-comfui/175293#post_1",
"publishedAt": "2026-04-15T22:35:58.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"github.com",
"GitHub - nihilistau/shannon-prime-comfyui",
"GitHub - nihilistau/shannon-prime-llama"
],
"textContent": "**Here you go: I am bored! Working KV-Cache system with 0 ppl loss I can’t remember how far it goes. But it can go much further! explore, use, share, enjoy. for CUDA,Vulkan,NEON,Adreno,Pytorch, I can’t remember what else. intergration with comfui and llama. The only reason for the license is because it is the Shannon limit and describes the transformer itself. I won’t be pursuing any money or stopping it’s use. I am seriously bored. I don’t normally complete projects. I guess i didnt complete this one either** … But it works, there is a model weights compressor included.. but I haven’t tested it yet.. But I won’t be doing any further work on this. So.. it should be pretty easy to see the quick wins and if you just follow the math it will lead to a 10D Torus. So expand it out to 10 bands and run a “spinor” transform and you hit 8x-10x umm.. whatelse… you can reduce the skeleton to 2bit.. ummm it’s univeral… you could reduce to 1bit… you caould create a predictor of where not to hit. I mean… there is lots you can do… so DO IT! ohh yeah.. you can reduce it and run entirely in cache.. AMD has 64mb… that can fit a decent model..\n\ngithub.com\n\n### GitHub - nihilistau/shannon-prime-comfyui\n\nContribute to nihilistau/shannon-prime-comfyui development by creating an account on GitHub.\n\ngithub.com\n\n### GitHub - nihilistau/shannon-prime-llama\n\nContribute to nihilistau/shannon-prime-llama development by creating an account on GitHub.\n\nnihilistau/shannon-prime",
"title": "KV cache for llama and comfui"
}