Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicx744euoi5z2nnxvmymln36j5buhbrzc7mnzay4ftz2w5r3shvfq",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mphm7da3kd62"
  },
  "path": "/t/i-built-a-novel-triple-hybrid-llm-mamba-attention-32-expert-moe-from-scratch-for-50-titan-v1-complete-titan-v2-first-cycle-done-expanding-dataset-now/177063#post_19",
  "publishedAt": "2026-06-29T22:29:53.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Aiden project",
    "NatalieY"
  ],
  "textContent": "We do very much align in our thinking. I too used FunctionGemma for mobile. I also finetuned it for grammar correction and as a small model router; it is extremely useful, and I love playing around with small, focused models. I used Termux and Shizuku to get around most things. I agree with you that the Aiden project is a great idea, but...\n\nI wish I could say that the Aiden project was mine, but it is actually **NatalieY**’s project. I just discussed it with her on her post. I have considered similar projects myself, and I have plenty of MCUs to play with.\n\nOn the topic of the Gemma MTP heads: I have posted in my thread, but it is flagged and will probably remain so for days if things progress as they have previously. I will post some of it here for you. It has progressed to fully working, and I am now working on the Telepathy side, which is latent-space injection from one model into a model of a different architecture; I have that fully working as well. All receipts, tests, and code are on my repo under the `mtp-draft-transcode` branch.\n\n### **The Latent Interceptor framework:**\n\n**Draft body = the shared latent processor.** This is the finetuned 4-layer draft with its vocab head ripped off. It runs once per intercept, producing a 1024-d latent. Because there’s no 262k projection, it runs on a millisecond scale and can be pinned to the CPU or Hexagon (your <2 ms point holds, since the body is tiny and the vocab matrix was the whole cost).\n**A registry of specialized heads tapping that latent, each finetuned for a task, each staying in latent space:**\n\n  * Action head (HID->A): the KAIROS NO_OP/KEEP/FORGET/E2B/ACTION gate.\n  * Memory head (HID->63-byte C2 Spinor): writes MEM-OKF directly from the latent; this is the curator’s ADMIT path, with no tokenization.\n  * Tool head (HID->32-tool MCP logits): fires the harness decorator (E2B Python for the strawberry-class problems) from a latent trigger.\n\n**Latent injection (return path):** tool result → cyclotomic-ring residue → `gemma4_kv_inject` into the target KV ring. The model feels the result; it never reads it.\n\nSo the heads are the routers, and the body is the shared manifold they all read. One body pass, many latent destinations: that’s the framework, and it’s extensible to anything (the possibilities are, as they say, endless).\n\n**Pivot → Latent Interceptor** → the draft repurposed as a latent-routing framework. Scaffold done: the contract (shared body + action/memory/tool head registry), a 5-action space grounded in the curator’s real ops, `SP_LI_CAPTURE`, the probe trainer, and a **baseline that classifies the latent at 1.000** (mechanism proven: routing without tokenization is real).\n\n**Telepathy (latent→latent between models): excellent endgame, one hard constraint.** The right abstraction is `LatentBridge{src,dst,adapter,dims,scale,basis,flags}` + `gemma4_kv_inject`, and the framework just built _is_ its substrate.\n\n**The full closed loop works end to end:**\n\n    event: \"count letters in strawberry\"\n    TOOL HEAD fired (latent->tool id): PYTHON          <- the latent routed to the right tool\n    tool ran -> result = \"3\"                            <- real python subprocess executed it\n    result injected into KV ring (6 tokens)             <- return path, no re-prompt\n    model continues: \"[tool name] count_letters [tool input] strawberry [tool output]\"\n\nEvent → latent → Tool Head (PYTHON) → fire real tool → result “3” → return-path inject → model continues. There is **no tokenizer in the decision or the return**; only the final continuation is text. That’s the capstone: the complete latent-native agent loop in one call.",
  "title": "🧠 I built a novel triple-hybrid LLM (Mamba + Attention + 32-expert MoE) from scratch for ~$50 — Titan v1 complete, Titan v2 first cycle done, expanding dataset now"
}
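
The Latent Interceptor pattern in the post (one shared body pass producing a latent, then a registry of small heads that each read that same latent) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption for illustration: the body is a random tanh projection standing in for the finetuned 4-layer draft, the heads are untrained linear maps, the names `LatentInterceptor`, `register`, and `intercept` are hypothetical, and the demo uses a 16-d latent instead of the post's 1024-d one. Only the action labels and the 32-tool count come from the post.

```python
import math
import random
from typing import Any, Callable, Dict, List

Vector = List[float]

# Hypothetical demo size; the post's draft body emits a 1024-d latent.
LATENT_DIM = 16
ACTIONS = ["NO_OP", "KEEP", "FORGET", "E2B", "ACTION"]  # action labels from the post
NUM_TOOLS = 32  # the post's tool head has 32 MCP tool logits


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m: List[Vector], v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def argmax(xs: Vector) -> int:
    return max(range(len(xs)), key=xs.__getitem__)


class LatentInterceptor:
    """Runs the shared body once, then lets every registered head read the latent."""

    def __init__(self, body: Callable[[Vector], Vector]) -> None:
        self.body = body
        self.heads: Dict[str, Callable[[Vector], Any]] = {}
        self.body_calls = 0  # counts body passes, to show one pass per intercept

    def register(self, name: str, head: Callable[[Vector], Any]) -> None:
        self.heads[name] = head

    def intercept(self, hidden: Vector) -> Dict[str, Any]:
        self.body_calls += 1
        latent = self.body(hidden)  # single shared pass; no vocab projection anywhere
        return {name: head(latent) for name, head in self.heads.items()}


rng = random.Random(0)
W_body = random_matrix(LATENT_DIM, LATENT_DIM, rng)  # stand-in for the draft body
W_action = random_matrix(len(ACTIONS), LATENT_DIM, rng)
W_tool = random_matrix(NUM_TOOLS, LATENT_DIM, rng)

li = LatentInterceptor(body=lambda h: [math.tanh(x) for x in matvec(W_body, h)])
li.register("action", lambda z: ACTIONS[argmax(matvec(W_action, z))])
li.register("tool", lambda z: argmax(matvec(W_tool, z)))

hidden_state = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
print(li.intercept(hidden_state))
```

The point it shows is the one the post makes: each extra head costs only a small matrix-vector product, while the body still runs once per intercept (`body_calls` tracks this).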
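
For the Telepathy side, the post names the abstraction `LatentBridge{src,dst,adapter,dims,scale,basis,flags}` but not its semantics. The sketch below is one hypothetical reading, not the author's implementation: `adapter` is assumed to be a linear map from the source model's latent width to the destination's, `scale` a gain applied to the translated latent, and `basis` and `flags` are carried but not interpreted. The injection step itself (`gemma4_kv_inject`) is not modeled, and the model ids are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


@dataclass
class LatentBridge:
    """Hypothetical reading of LatentBridge{src,dst,adapter,dims,scale,basis,flags}."""

    src: str                 # source model id
    dst: str                 # destination model id (can be a different architecture)
    adapter: Matrix          # assumed: dst_dim rows x src_dim columns
    dims: Tuple[int, int]    # (src_dim, dst_dim)
    scale: float = 1.0       # assumed: gain applied to the translated latent
    basis: str = "identity"  # carried but not interpreted in this sketch
    flags: int = 0           # carried but not interpreted in this sketch

    def __post_init__(self) -> None:
        src_dim, dst_dim = self.dims
        if len(self.adapter) != dst_dim or any(len(row) != src_dim for row in self.adapter):
            raise ValueError("adapter shape does not match dims")

    def translate(self, latent: Vector) -> Vector:
        """Map a source-model latent into the destination model's latent space."""
        if len(latent) != self.dims[0]:
            raise ValueError("latent width does not match src_dim")
        return [self.scale * sum(w * x for w, x in zip(row, latent)) for row in self.adapter]


bridge = LatentBridge(
    src="model-a",  # hypothetical model ids
    dst="model-b",
    adapter=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    dims=(3, 2),
    scale=0.5,
)
print(bridge.translate([2.0, 4.0, 6.0]))  # [1.0, 5.0]
```

Validating shapes at construction time keeps a mismatched source/destination pair from silently producing a latent of the wrong width.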
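
The return half of the closed loop (the tool fires, a real Python subprocess runs, and the result is pushed into a ring of KV slots) can be sketched as follows. The subprocess call is standard library; everything else is hypothetical: the tool table, the `KVRing` class (a fixed-capacity deque standing in for the target model's KV ring), and `encode_result`, a placeholder that emits one vector per character instead of the post's cyclotomic-ring residue. The tool head is assumed to have already selected `PYTHON`, as in the trace.

```python
import subprocess
import sys
from collections import deque
from typing import Callable, Deque, Dict, List

Vector = List[float]


def run_python(code: str) -> str:
    """Run a snippet in a real Python subprocess and return its stdout as the tool result."""
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=10
    )
    proc.check_returncode()  # raise if the tool crashed
    return proc.stdout.strip()


# Hypothetical tool table, keyed by the name the tool head resolves to.
TOOLS: Dict[str, Callable[[str], str]] = {"PYTHON": run_python}


def encode_result(result: str, dim: int = 8) -> List[Vector]:
    """Placeholder encoder: one constant vector per character (NOT a cyclotomic residue)."""
    return [[float(ord(ch))] * dim for ch in result]


class KVRing:
    """Fixed-capacity ring of KV slots standing in for the target model's cache."""

    def __init__(self, capacity: int) -> None:
        self.slots: Deque[Vector] = deque(maxlen=capacity)  # oldest slots evicted first

    def inject(self, vectors: List[Vector]) -> int:
        """Append encoded result vectors; return how many slots were written."""
        self.slots.extend(vectors)
        return len(vectors)


ring = KVRing(capacity=64)
tool_name = "PYTHON"  # assumed output of the tool head, as in the post's trace
result = TOOLS[tool_name]("print('strawberry'.count('r'))")
written = ring.inject(encode_result(result))
print(result, written, len(ring.slots))  # 3 1 1
```

A bounded deque mirrors ring behavior: once capacity is reached, each injection evicts the oldest slots rather than growing the cache, and the model is never re-prompted with the result as text.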