Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihzw72ljsq2giqjoeffzh5alaqkpw3ghdbmwl5jyksfhnrd77mlmy",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mpehjsn76572"
  },
  "path": "/t/i-built-a-novel-triple-hybrid-llm-mamba-attention-32-expert-moe-from-scratch-for-50-titan-v1-complete-titan-v2-first-cycle-done-expanding-dataset-now/177063#post_8",
  "publishedAt": "2026-06-28T17:02:42.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "All the examples you cite interest me and are some of the things I have been thinking about. I have this idea in my head, floating around to, something along the lines of taking Gemma 4 MTP heads and finetuning or training to create only python for example.\n\nI have created a platform currently that grafts the heads on to the model but uses a custom driver and passes them through an interceptor. It basically allows you to use them as small “individual” models that are then injected back into latent space.\n\nI have many, many things I plan to try using this setup. for example creating small python to do actual real calculation or compute, tool calls, The possabilities are endless really.\n\nI have previous systems that I exclusively use small finetuned models for everything combined with a small finetuned router model that sends requests off to the right model. I have used function gemma 270m and qwen3 0.5b. I finetune them for specific tasks such as grammar, conversational flavour, or looking for missing ] } ; brackets, (common mistakes etc that small local models drop alot during coding)\n\nThere is quite alot you can do with finetung/training tiny models and they work extremly well and extremly fast. Anything that I do in this area I will be posting on this board somewhere.",
  "title": "🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now"
}