
🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now

Hugging Face Forums [Unofficial] June 28, 2026
All the examples you cite interest me, and they are some of the things I have been thinking about. One idea I have floating around is taking the Gemma 4 MTP heads and finetuning or training them to produce only Python, for example. I currently have a platform that grafts the heads onto the model but uses a custom driver and passes them through an interceptor. It basically lets you use them as small "individual" models that are then injected back into latent space. I have many, many things I plan to try with this setup, for example generating small Python snippets to do actual calculation or compute, or tool calls. The possibilities are endless, really.

My previous systems exclusively use small finetuned models for everything, combined with a small finetuned router model that sends each request off to the right model. I have used FunctionGemma 270M and Qwen3 0.5B. I finetune them for specific tasks such as grammar, conversational flavour, or looking for missing ] } ; characters (common mistakes that small local models make a lot during coding).

There is quite a lot you can do with finetuning/training tiny models, and they work extremely well and extremely fast. Anything I do in this area I will be posting on this board somewhere.
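To make the router-plus-specialists idea concrete, here is a minimal sketch of the dispatch pattern. It is not my actual system: `route`, `SPECIALISTS`, and `handle` are hypothetical names, the keyword rules stand in for the finetuned router model, and `find_unbalanced` is a plain deterministic stand-in for a finetuned bracket checker (it ignores brackets inside strings or comments and does not look for missing semicolons).

```python
from typing import Callable, Dict, List, Tuple

# Closing bracket -> the opener it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def find_unbalanced(code: str) -> List[str]:
    """Report unmatched brackets (stand-in for a finetuned checker model)."""
    stack: List[Tuple[str, int]] = []
    problems: List[str] = []
    for pos, ch in enumerate(code):
        if ch in OPENERS:
            stack.append((ch, pos))
        elif ch in PAIRS:
            if stack and stack[-1][0] == PAIRS[ch]:
                stack.pop()
            else:
                problems.append(f"unexpected '{ch}' at {pos}")
    # Anything still on the stack was opened but never closed.
    problems.extend(f"unclosed '{ch}' at {pos}" for ch, pos in stack)
    return problems


def route(request: str) -> str:
    """Hypothetical keyword router; a small finetuned model would go here."""
    if any(tok in request for tok in "()[]{}"):
        return "brackets"
    if "grammar" in request.lower():
        return "grammar"
    return "chat"


# Each specialist is a callable; the placeholders mark where a tiny model would run.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "brackets": lambda req: "; ".join(find_unbalanced(req)) or "balanced",
    "grammar": lambda req: f"[grammar model would handle: {req}]",
    "chat": lambda req: f"[chat model would handle: {req}]",
}


def handle(request: str) -> str:
    """Send the request to whichever specialist the router picks."""
    return SPECIALISTS[route(request)](request)


if __name__ == "__main__":
    print(handle("foo(bar[1]"))
    print(handle("please check my grammar"))
```

Keeping the specialists behind a plain dictionary of callables means any one of them can be swapped for a real model call without touching the router or the other entries.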
