🧠I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now
We do very much align in our thinking. I too used FunctionGemma for mobile, and I also finetuned it for grammar correction and as a small model router. It is extremely useful, and I love playing around with small, focused models. I used Termux and Shizuku to get around most things. I agree with you that the Aiden project is a great idea, but...
I wish I could say the Aiden project was mine, but it is actually NatalieY's project; I just discussed it with her on her post. I have considered similar projects myself, and I have plenty of MCUs to play with.
On the topic of the Gemma MTP heads: I have posted about them in my thread, but the post is flagged and will probably stay that way for days if things go as they have before. I'll post some of it here for you. In short, it now works fully, and I am moving on to the Telepathy side, which is latent-space injection from one model into a model with a different architecture; I have that fully working as well. All receipts, tests, and code are in my repo under the mtp-draft-transcode branch.
The Latent Interceptor framework:
Draft body = the shared latent processor: the finetuned 4-layer draft with its vocab head ripped off. It runs once per intercept and produces a 1024-d latent. Because there is no 262k-entry vocab projection, a pass takes on the order of milliseconds and can be pinned to the CPU or Hexagon (your <2 ms point holds: the body is tiny, and the vocab matrix was the whole cost).

A registry of specialized heads taps that latent; each head is finetuned for one task, and each stays in latent space:
- Action head (HID->A): the KAIROS NO_OP/KEEP/FORGET/E2B/ACTION gate.
- Memory head (HID->63-byte C2 Spinor): writes MEM-OKF directly from the latent — the curator’s ADMIT path, no tokenization.
- Tool head (HID->32-tool MCP logits): fires the harness decorator (E2B python for the strawberry-class problems) from a latent trigger.
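To make the "one body pass, many heads" shape concrete, here is a minimal Python sketch under loud assumptions: `toy_body` is a hashing stand-in for the finetuned 4-layer draft (16-d instead of 1024-d), the heads are random linear maps rather than finetuned ones, and `LatentInterceptor`, `LinearHead`, `random_head`, and `register` are hypothetical names, not the repo's API. The 63-byte memory head is left out. What it does show is that the body runs once per intercept and every registered head reads the same latent, with no vocab projection anywhere.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Latent = List[float]

# Action labels are from the post; everything else is a hypothetical toy stand-in.
ACTIONS = ["NO_OP", "KEEP", "FORGET", "E2B", "ACTION"]


def toy_body(event: str, dim: int = 16) -> Latent:
    """Hypothetical stand-in for the draft body: hashes characters into a unit vector.
    A real body would run the 4 transformer layers and emit a 1024-d latent."""
    vec = [0.0] * dim
    for i, ch in enumerate(event):
        vec[(ord(ch) + i) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


@dataclass
class LinearHead:
    """Maps the latent straight to task logits with one small matrix (no 262k vocab matrix)."""
    weights: List[List[float]]  # shape: [num_outputs][dim]

    def logits(self, latent: Latent) -> List[float]:
        return [sum(w * x for w, x in zip(row, latent)) for row in self.weights]

    def argmax(self, latent: Latent) -> int:
        scores = self.logits(latent)
        return max(range(len(scores)), key=scores.__getitem__)


def random_head(num_outputs: int, dim: int, seed: int) -> LinearHead:
    # Random weights purely for illustration; the post's heads are finetuned.
    rng = random.Random(seed)
    return LinearHead([[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_outputs)])


@dataclass
class LatentInterceptor:
    body: Callable[[str], Latent]
    heads: Dict[str, LinearHead] = field(default_factory=dict)

    def register(self, name: str, head: LinearHead) -> None:
        self.heads[name] = head

    def intercept(self, event: str) -> Dict[str, int]:
        latent = self.body(event)  # one body pass...
        # ...read by every registered head
        return {name: head.argmax(latent) for name, head in self.heads.items()}


li = LatentInterceptor(body=toy_body)
li.register("action", random_head(len(ACTIONS), 16, seed=0))
li.register("tool", random_head(32, 16, seed=1))  # 32-way tool space, as in the post
out = li.intercept("count letters in strawberry")
print(ACTIONS[out["action"]], out["tool"])
```

Adding a new destination is just another `register` call; the body cost stays fixed no matter how many heads read the latent.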
Latent injection (return path): tool result → cyclotomic-ring residue → gemma4_kv_inject into the target KV ring. The model feels the result, never reads it.
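The return path can be pictured with a toy fixed-capacity ring. This is a sketch only: `KVRing`, `inject`, and `encode_result` are hypothetical; the residue step is replaced by a trivial bit encoding because the post does not describe the cyclotomic-ring encoding; and nothing here reflects the real signature of `gemma4_kv_inject`. The point it illustrates is that the result lands as cache entries the model attends to, not as prompt text.

```python
from collections import deque
from typing import Deque, List, Tuple

Vector = List[float]


class KVRing:
    """Toy fixed-capacity KV ring: once full, each new entry evicts the oldest one."""

    def __init__(self, capacity: int) -> None:
        self.entries: Deque[Tuple[Vector, Vector]] = deque(maxlen=capacity)

    def append(self, key: Vector, value: Vector) -> None:
        self.entries.append((key, value))

    def inject(self, kv_pairs: List[Tuple[Vector, Vector]]) -> int:
        """Write precomputed (key, value) pairs straight into the ring, with no tokenizer
        involved. Returns the number of entries written."""
        for key, value in kv_pairs:
            self.append(key, value)
        return len(kv_pairs)


def encode_result(result: str, dim: int = 4) -> List[Tuple[Vector, Vector]]:
    """Hypothetical stand-in for the cyclotomic-ring residue step (not described in the
    post): each character becomes one KV pair built from its low `dim` bits."""
    pairs = []
    for ch in result:
        vec = [float((ord(ch) >> i) & 1) for i in range(dim)]
        pairs.append((vec, list(vec)))
    return pairs


ring = KVRing(capacity=8)
for pos in range(5):  # five ordinary context entries
    ring.append([float(pos)] * 4, [float(pos)] * 4)
written = ring.inject(encode_result("3"))  # the tool result enters as KV, never as prompt text
print(written, len(ring.entries))  # -> 1 6
```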
So the heads are the routers; the body is the shared manifold they all read. One body pass, many latent destinations — that’s the framework, and it’s extensible to anything (the possibilities are, as they say, endless).
Pivot: Latent Interceptor, the draft repurposed as a latent-routing framework. Scaffold done: the contract (shared body + action/memory/tool head registry), a 5-action space grounded in the curator's real ops, SP_LI_CAPTURE, the probe trainer, and a baseline that classifies the latent with 1.000 accuracy (mechanism proven: routing without tokenization is real).
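The post does not say what the probe trainer fits, so, as an assumption, here is the simplest common choice: a linear softmax-regression probe trained with plain SGD on labelled latents over the 5-action space. The latents are synthetic clusters standing in for SP_LI_CAPTURE output (`synthetic_latents`, `train_probe`, and the hyperparameters are all hypothetical), so whatever accuracy it reaches says nothing about the real 1.000 baseline; it only shows the mechanics of classifying a latent with no tokenizer in the loop.

```python
import math
import random
from typing import List, Tuple

ACTIONS = ["NO_OP", "KEEP", "FORGET", "E2B", "ACTION"]
Sample = Tuple[List[float], int]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_probe(data: List[Sample], num_classes: int, dim: int,
                lr: float = 0.5, epochs: int = 50, seed: int = 0):
    """Softmax-regression probe trained with per-sample SGD on cross-entropy loss."""
    rng = random.Random(seed)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = data[idx]
            p = softmax([sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(num_classes)])
            for k in range(num_classes):
                g = p[k] - (1.0 if k == y else 0.0)  # d(loss)/d(logit_k)
                b[k] -= lr * g
                for j in range(dim):
                    W[k][j] -= lr * g * x[j]
    return W, b


def predict(W, b, x: List[float]) -> int:
    scores = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(W, b, data: List[Sample]) -> float:
    return sum(predict(W, b, x) == y for x, y in data) / len(data)


def synthetic_latents(n_per_class: int, dim: int, seed: int = 1) -> List[Sample]:
    """Hypothetical captured latents: one tight cluster per action label."""
    rng = random.Random(seed)
    data = []
    for y in range(len(ACTIONS)):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 0.05) for _ in range(dim)]
            x[y] += 1.0
            data.append((x, y))
    return data


data = synthetic_latents(20, dim=8)
W, b = train_probe(data, len(ACTIONS), dim=8)
print(accuracy(W, b, data))
```

A linear probe is the usual first test of whether a property is readable from a representation: if a single matrix can separate the classes, the latent already carries the routing signal.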
Telepathy (latent→latent between models): an excellent endgame, with one hard constraint. The right abstraction is LatentBridge{src,dst,adapter,dims,scale,basis,flags} + gemma4_kv_inject, and the framework just built is its substrate.

The full closed loop works end to end:
event: "count letters in strawberry"
TOOL HEAD fired (latent->tool id): PYTHON <- the latent routed to the right tool
tool ran -> result = "3" <- real python subprocess executed it
result injected into KV ring (6 tokens) <- return path, no re-prompt
model continues: "[tool name] count_letters [tool input] strawberry [tool output]"
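The loop in the log can be sketched end to end in a few lines. Assumptions, all hypothetical: a 2-entry tool list standing in for the 32-tool space, a hand-set tool head, a one-feature `toy_body` in place of the draft, the tool argument taken from the last word of the event with the letter `r` hard-coded, and a plain Python list standing in for the KV ring (the real return path writes KV entries through `gemma4_kv_inject`, not a string). The tool call itself is a real `subprocess.run` of the current interpreter, matching the "real python subprocess" line in the log.

```python
import subprocess
import sys
from typing import List

TOOLS = ["NONE", "PYTHON"]  # hypothetical 2-entry slice of the 32-tool space
TOOL_HEAD = [[0.0, 0.5],    # NONE: wins only through the bias dimension
             [1.0, 0.0]]    # PYTHON: driven by the "count" feature


def toy_body(event: str) -> List[float]:
    # Stand-in for the draft body: one hand-made feature plus a constant bias dimension.
    return [1.0 if "count" in event else 0.0, 1.0]


def tool_head(latent: List[float]) -> str:
    # Latent -> tool id, no tokenizer involved.
    scores = [sum(w * x for w, x in zip(row, latent)) for row in TOOL_HEAD]
    return TOOLS[max(range(len(scores)), key=scores.__getitem__)]


def run_python(code: str) -> str:
    # A real subprocess using the same interpreter that runs this script.
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, check=True)
    return proc.stdout.strip()


def closed_loop(event: str, kv_ring: List[str]) -> str:
    tool = tool_head(toy_body(event))
    if tool == "NONE":
        return tool
    word = event.split()[-1]  # hypothetical argument extraction
    result = run_python(f"print({word!r}.count('r'))")
    kv_ring.append(result)  # stand-in for gemma4_kv_inject, which writes KV entries instead
    return tool


ring: List[str] = []
print(closed_loop("count letters in strawberry", ring), ring)  # -> PYTHON ['3']
```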
Event → latent → Tool Head (PYTHON) → fire real tool → result “3” → return-path inject → model continues. No tokenizer is involved in the decision or the return; only the final continuation is text. That's the capstone: the complete latent-native agent loop in one call.
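For the Telepathy direction, here is a minimal sketch of what a `LatentBridge` record could look like if its adapter is a plain linear map. This is an assumption: only the field names come from the post; `basis` and `flags` are omitted, `transfer` is a hypothetical method, the model names are placeholders, and the post does not say how its adapter is learned or applied. The shape checks are there because a width mismatch is the first thing to go wrong when wiring together two architectures with different hidden sizes.

```python
from dataclasses import dataclass
from typing import List, Tuple

Matrix = List[List[float]]


@dataclass(frozen=True)
class LatentBridge:
    """Hypothetical sketch of LatentBridge{src,dst,adapter,dims,scale,...};
    the post's basis and flags fields are left out."""
    src: str
    dst: str
    adapter: Matrix        # shape: [dst_dim][src_dim]
    dims: Tuple[int, int]  # (src_dim, dst_dim)
    scale: float = 1.0

    def __post_init__(self) -> None:
        src_dim, dst_dim = self.dims
        if len(self.adapter) != dst_dim or any(len(row) != src_dim for row in self.adapter):
            raise ValueError("adapter shape does not match dims")

    def transfer(self, latent: List[float]) -> List[float]:
        """Map a source-model latent into the destination model's latent space."""
        if len(latent) != self.dims[0]:
            raise ValueError("latent width does not match src_dim")
        return [self.scale * sum(w * x for w, x in zip(row, latent)) for row in self.adapter]


bridge = LatentBridge(src="model-a", dst="model-b",  # placeholder names
                      adapter=[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],
                      dims=(3, 2), scale=2.0)
print(bridge.transfer([1.0, 2.0, 4.0]))  # -> [2.0, 6.0]
```

One common approach is to fit such an adapter on paired latents from both models; a linear map is simply the easiest starting point before reaching for anything nonlinear.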