Would this concept model work?
Hugging Face Forums [Unofficial]
April 7, 2026
It’s an MDLM (masked diffusion language model) with ternary weights, hybrid Q8/Q4 activations, and a 3-bit KV cache, used with block diffusion. The training code is really messy, so I don’t really want to share the PyTorch training code, but I’m trying to train a 1B-parameter model on 40B training tokens.
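For anyone wondering what the ternary part could look like, here is a minimal sketch in plain Python. It is not the training code from this post. It assumes an absmean scheme (one shared scale per tensor, weights rounded to -1, 0, or +1), which is only one common way to do ternary quantization. The names `ternary_quantize`, `dequantize`, and `weight_memory_gb` are made up for illustration, and the memory figures cover weights only, ignoring activations, the KV cache, and optimizer state.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Assumed absmean scheme: scale by the mean |w|, round, clamp to {-1, 0, +1}."""
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map ternary codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


def weight_memory_gb(n_params, bits_per_weight):
    """Weights-only storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Toy weight vector (illustrative values, not from a real model).
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
codes, scale = ternary_quantize(weights)
print(codes)  # [1, 0, 1, -1, 0, -1]
print(dequantize(codes, scale))

# Back-of-envelope numbers for the sizes mentioned in the post.
n_params = 1_000_000_000   # 1B parameters
n_tokens = 40_000_000_000  # 40B training tokens
tokens_per_param = n_tokens / n_params

print(tokens_per_param)                          # 40.0
print(weight_memory_gb(n_params, 16))            # 2.0 (fp16 baseline)
print(weight_memory_gb(n_params, 2))             # 0.25 (ternary packed at 2 bits)
print(weight_memory_gb(n_params, math.log2(3)))  # information-theoretic lower bound
```

So at 1B parameters the ternary weights themselves would take roughly 0.2 to 0.25 GB depending on packing, versus 2 GB in fp16, and 40B tokens works out to 40 tokens per parameter.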