
🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now

Hugging Face Forums [Unofficial] June 29, 2026

@KnackAU I just realized we’re on the exact same wavelength! I’ve been meaning to circle back to something you mentioned in your earlier post — that idea about taking Gemma MTP heads and finetuning them for specific tasks like Python generation. That really stuck with me because I’ve actually been down a very similar rabbit hole, just from a slightly different angle.

A while back I took Google’s FunctionGemma-270M and fine-tuned it specifically for on-device mobile function calling: everyday phone actions like send_email(), create_contact(), create_calendar_event(), and show_map(). I used LoRA, trained it on a Colab T4 in about 30 minutes, and pushed accuracy from ~58% (base model) to nearly 85%. The whole thing quantizes down to 272 MB at INT8 and runs in 1-3 seconds on a phone with zero cloud calls.
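For anyone wondering how an accuracy figure for function calling is commonly computed, here is a minimal sketch using strict exact match: a prediction counts only if the function name and every argument agree with the reference. This is an assumption for illustration, not the evaluation script from the repo; the call format, the argument names, and the toy data are all hypothetical.

```python
from typing import Any

# Assumed call format (hypothetical): {"name": str, "args": dict}
Call = dict[str, Any]


def calls_match(predicted: Call, expected: Call) -> bool:
    """Return True only if the function name and all arguments agree exactly."""
    return (
        predicted.get("name") == expected.get("name")
        and predicted.get("args", {}) == expected.get("args", {})
    )


def exact_match_accuracy(predictions: list[Call], references: list[Call]) -> float:
    """Fraction of predictions that exactly match their reference call."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(calls_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


# Toy data, invented for illustration (not taken from the real dataset).
references = [
    {"name": "send_email", "args": {"to": "boss@example.com", "subject": "Report"}},
    {"name": "show_map", "args": {"query": "coffee"}},
    {"name": "create_contact", "args": {"name": "Ada"}},
    {"name": "create_calendar_event", "args": {"title": "Standup"}},
]
predictions = [
    {"name": "send_email", "args": {"to": "boss@example.com", "subject": "Report"}},
    {"name": "show_map", "args": {"query": "coffee"}},
    {"name": "create_contact", "args": {"name": "Ada L."}},  # wrong argument value
    {"name": "create_calendar_event", "args": {"title": "Standup"}},
]

accuracy = exact_match_accuracy(predictions, references)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Exact match is a strict metric: a call with the right function but a slightly different argument string counts as a miss. A looser metric (function name only, or normalized arguments) would give higher numbers for the same model, so it matters which one a reported figure uses.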

Here’s the repo if you’re curious: Mati83moni/functiongemma-270m-it-mobile-actions · Hugging Face

What caught my eye even more is your Aiden project, a physical device that watches the screen over HDMI and controls the phone via USB HID. That’s genuinely clever because it sidesteps all the jailbreak/root/ADB problems. It also made me think my FunctionGemma fine-tune could serve as a lightweight brain for exactly that kind of setup. You have the hardware layer that sees and controls the phone, and this model could be the part that takes a natural language command like “send email to my boss with today’s report” and translates it into the right function call, all running locally on the device itself.
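To make the "command in, function call out" idea concrete, here is a rough sketch of the glue that would sit between the model and whatever layer executes the action. It assumes the model's output has already been converted to a JSON object with "name" and "args" keys, which is a hypothetical format chosen for illustration and not necessarily what FunctionGemma emits. The handlers are stubs that return strings, and their parameter names are invented.

```python
import json
from typing import Callable

# Registry of allowed actions: only functions listed here can ever be called.
REGISTRY: dict[str, Callable[..., str]] = {}


def action(fn: Callable[..., str]) -> Callable[..., str]:
    """Decorator that registers a handler under its function name."""
    REGISTRY[fn.__name__] = fn
    return fn


@action
def send_email(to: str, subject: str, body: str = "") -> str:
    # Stub handler: a real setup would trigger the device-side action here.
    return f"email to {to}: {subject}"


@action
def show_map(query: str) -> str:
    # Stub handler, same caveat as above.
    return f"map search: {query}"


def dispatch(model_output: str) -> str:
    """Parse an assumed-JSON function call and route it to a registered handler."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("model output must be a JSON object")

    name = call.get("name")
    args = call.get("args", {})
    if name not in REGISTRY:
        raise ValueError(f"unknown function: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("'args' must be a JSON object")

    try:
        return REGISTRY[name](**args)
    except TypeError as exc:
        # Raised when arguments are missing or unexpected for the handler.
        raise ValueError(f"bad arguments for {name}: {exc}") from exc


result = dispatch('{"name": "show_map", "args": {"query": "cafe"}}')
print(result)
```

The allow-list is the point of the design: a small model will occasionally hallucinate a function name or an argument, and rejecting anything outside the registry keeps a bad generation from turning into an unintended action on the phone.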

I see a lot of overlap between what we’re both chasing — small specialized models that actually do useful things on real hardware instead of burning $50/day on API calls to frontier models. Your head-grafting platform sounds fascinating too, especially the interceptor concept. Would love to hear more about how that works in practice.

Also completely agree with what you said about engaging with people doing similar things rather than trying to push ideas — that’s exactly the spirit. Different approaches to the same problem space, and I think we’re both finding that these small Gemma-class models are way more capable than people give them credit for when you train them right for a specific job.

Would be great to collaborate on something down the line.
