Would this concept model work?

Hugging Face Forums [Unofficial] April 7, 2026
It's an MDLM with ternary weights, hybrid Q8 and Q4 activations, and a 3-bit KV cache, used with block diffusion. The training code is really messy, so I don't really want to share the PyTorch training code, but I'm trying to train a 1B model on 40B training tokens.
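
Since the training code isn't shared, here is only a rough sketch of what the quantization pieces might look like, in plain Python rather than PyTorch. It assumes absmean scaling for the ternary weights and symmetric absmax scaling for the Q8/Q4 activations and the 3-bit KV cache; the post doesn't say which schemes are actually used, and the function names `ternary_quantize`, `symmetric_quantize`, and `dequantize` are made up for illustration.

```python
from typing import List, Tuple


def ternary_quantize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Map each weight to {-1, 0, +1} with one shared scale.

    Assumption: absmean scaling (scale = mean of |w|). The post does not
    specify the scheme, so this is one common choice, not the author's method.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clamp into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def symmetric_quantize(values: List[float], bits: int, eps: float = 1e-8) -> Tuple[List[int], float]:
    """Symmetric absmax quantization to signed `bits`-bit integer codes.

    Assumption: one scale per list of values. bits=8 or 4 would stand in for
    the Q8/Q4 activations, bits=3 for the KV cache.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits, 3 for 3 bits
    scale = max(abs(v) for v in values) / qmax + eps
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate real values from integer codes and their scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w_codes, w_scale = ternary_quantize([0.9, -0.05, -1.1, 0.4])
    print("ternary codes:", w_codes, "scale:", w_scale)

    kv_codes, kv_scale = symmetric_quantize([1.0, -0.6, 0.2], bits=3)
    print("3-bit KV codes:", kv_codes, "approx:", dequantize(kv_codes, kv_scale))
```

In real training the rounding step would need something like a straight-through estimator so gradients can pass through it; that part is left out here.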
