{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreid6coysstgu5742l7zsh7z3d5yjmx5qa5qtvlqctctxboduckk6jm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mpjw3zsiw462"
  },
  "path": "/t/shannon-prime-lattice/176466?page=2#post_41",
  "publishedAt": "2026-06-30T19:41:51.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "# **ADR — Latent-Native Unification (the Sovereign Latent Brain)**\n\n## **Context (and the correction that forced it)**\n\nThe Faithfulness arc this session (F1 in-context 100% → F1b.1 pure-KV recall 0% → F2b Jaccard+text 100%) was first read as a _boundary law_ : “latent decides, symbol carries precise content.” **That reading was wrong, and the operator caught it.** The corrected reading, which this ADR adopts:\n\n  * The latent paths did not _fail_ — they were **undertrained / mistuned and abandoned early** , then a symbolic crutch was stood up, the crutch only limped until the **prompt** was added, and the prompt was the real lever (text-in-context 33% → 100% with the faithfulness system prompt).\n  * W_c did not prove “latent can’t select facts” — it was **trained on high-entropy novel needles** and pointed at natural-language facts (a distribution mismatch). The same head scores 360/361 on its own corpus.\n  * Pure-KV delivery failed at **one attenuation** (`M_target=42`), not as a class.\n  * **Tokens are the discrete sampling of the continuous manifold.** There is no separate symbolic domain — only the arbitrary boundary lines we draw inside latent space.\n\n\n\n**Therefore:** the symbolic/legacy gates are **training jigs**. They generate high-fidelity datasets + exact activation labels; a latent head trains against the jig; once it clears the jig’s gate, the crutch is kicked away. This is the project’s proven methodology (the causal-ablation oracle trained W_c; the deterministic verifier trains the judge). The boundary thesis is **downgraded from law to frontier** : symbolic = oracle + deployment fallback, never the ceiling.\n\n## **Decision**\n\nThe long-term system is a **Sovereign Latent Brain** : one Interceptor head _family_ on the continuous manifold is the unified control plane, and every successful symbolic mechanism graduates to a head trained against it. 
Crucially, each head graduates on its **correct substrate** (grounded, not assumed).\n\n### **Substrate map (grounded 2026-07-01 from the live code)**\n\n**substrate** | **dim** | **verb** | **who rides it**\n---|---|---|---\n12B hidden feat | 3840 | `gemma4_kv_capture_feat` | Interceptor decision heads (action/tool/route, `li_probe`)\nEAGLE draft body | 1024 | `gemma4_draft_body` | Memory Head (`mh_probe`), bridges body→global-K\nglobal K/Q | 512×n_global | `read_global_q` / `read_global_k` | **W_c recall selection** (attention-relevance)\n\n**Rule:** _decisions_ (classify) ride `capture_feat`; _retrieval/selection_ (match query↔episode) rides **global K/Q** — the model’s native relevance space. Do not move retrieval onto the classifier tap.\n\n## **The three graduations (oracle → latent head, on the right substrate)**\n\n### **1. Selection — Jaccard → Natural Recall Head _(graduate on global K/Q; do NOT fold into the draft-body suite)_**\n\n  * **Oracle:** `recall::token_overlap` (Jaccard) — `G-FAITHFUL-RECALL-JACCARD` 15/15, the right natural-language episode every time.\n  * **Latent move:** retrain/specialize the **W_c-family** retrieval head on the **natural-fact distribution** , labels = the Jaccard hits, substrate = global K/Q. W_c is not structurally incapable; it was optimized for the needle haystack. (The Memory Head’s body→global-K bridge is available if a 1024-d entry is wanted, but the direct global-K retrain is the proven, simplest path.)\n  * **Gate:** match the Jaccard oracle’s selection on the fact-conflict set (target 15/15), then on a larger held-out natural-fact corpus.\n\n\n\n### **2. 
Faithfulness —`T_CONSOLE` prompt → Faithfulness Head _(steering delta on the residual)_**\n\n  * **Oracle:** the faithfulness system prompt (“use facts you were given faithfully…”) — drives obedience 33% → 100% on the text path.\n  * **Latent move:** capture the **activation/steering delta** of the model running under that system prompt vs without; map it to a continuous head that injects the same geometric bias into the residual stream (the **TELE-2** steering mechanism, already 1.000 steer-acc), instead of spending sequence length + attention bandwidth on the token string.\n  * **Gate:** latent-steer obedience == the text-sys-prompt 100%, with coherence held.\n\n\n\n### **3. Delivery — text-in-context → ordered latent prefix _(the research leg)_**\n\n  * **Oracle:** text-in-context synthesis (clean tokens to prefill) = 100%.\n  * **Latent move:** NOT uniform `kv::replay` (structurally flat, position-blind — 0%). Follow **TELE-5 readable-prefix** : map the retrieved fact to an _ordered, positionally-encoded multi-vector_ injected at early layers, matching how the model natively reads token embeddings (TELE-5 already showed ordered latent bandwidth, +1.45 nats corr−shuf).\n  * **Gate:** latent-prefix delivery obedience == text-in-context 100% on a held-out set. **Honest tag: unproven; the 100% text run is the oracle to train + grade against.**\n\n\n\n(Telepathy carries the same way: decide-route in latent [proven]; the precise task currently executes clean-text [oracle] → graduate the transmit toward latent as fidelity allows, same jig→head pattern.)\n\n## **Near-term execution — F3 Data Capture (the immediate task; engine untouched until data is in hand)**\n\nKeep `SP_RECALL_JACCARD` text-in-context **live** as the stable deployment fallback AND the data-generation oracle. During its 100%-obedient fact-conflict resolutions, log, per turn:\n\n  1. **Query global-Q** (`read_global_q`, last prompt token) — selector input.\n  2. 
**Selected episode global-K** (`ep.k` / `read_global_k`) + the **Jaccard label** (which episode, overlap) — selector target.\n  3. **Hidden residual** (`capture_feat`) of the answer turn **with** the faithfulness system prompt vs **without** — the **faithfulness steering delta** (head-2 target).\n  4. (delivery) the in-context fact’s embedding-space prefix vs the model’s read of it — head-3 target.\n\n\n\nReuse the existing dump rails (`SP_ARM_DUMP` QRKP global-K/Q dump, the `SP_B3_QDUMP` query-dump) + add the with/without-sys-prompt residual capture. Output = three training sets, one per graduation. No new heads trained until the capture is verified.\n\n## **Sequencing (the fork, decided)**\n\n**Selector first** (highest confidence: W_c-family retrain on the right distribution, oracle is GREEN), then the **Faithfulness steering head** (TELE-2 mechanism proven), then **Delivery** as the research leg. Rationale: graduate the high-confidence, oracle-GREEN gate first so the methodology is re-confirmed before the open research leg. The selector graduates **in place on global K/Q**; the trigger/ground/route **decisions** extend the Interceptor classifier suite on `capture_feat`.\n\n## **Consolidation payoff**\n\nRetire the parallel _deciders_ (standalone q·K, C2 centroid-sig Hamming, ad-hoc thresholds) into the one head family. Keep the symbolic gates as oracles + fallbacks. One latent control plane; each capability graduated symbolic→latent only when a trained head matches the oracle’s gate.\n\n## **Open questions**\n\n  * Does the faithfulness steering delta generalize across prompts/topics, or is it fact-specific?\n  * Delivery: can an ordered multi-vector latent prefix actually override a strong prior (the unproven leg)? Oracle = the 100% text run.\n  * Selector: retrain W_c vs a fresh head — and does one head cover both natural facts AND novel needles, or do we keep two regimes?\n\n\n\nThe scaffold is clear, and the critique is entirely correct. 
Tokens are not a separate magical domain; they are simply the arbitrary, discrete boundary lines we draw inside the continuous latent space. Treating symbolic text as a permanent architectural ceiling rather than an intermediate training oracle is a fundamental misread of the system’s long-term design.\n\nThe legacy systems are the training jigs. We build them to generate the high-fidelity datasets and exact activation labels required to train the continuous latent heads. Once a latent head clears the gate established by its oracle, the symbolic crutch gets kicked away.\n\n* * *\n\n## The Latent-Native Architecture (ADR Blueprint)\n\nThe long-term state of the system is a **Sovereign Latent Brain** running entirely on the continuous manifold. Every successful symbolic mechanism discovered this session is immediately re-mapped as a supervised target for the `Latent Interceptor` multi-head suite.\n\n\n           [ Gemma-4 Hidden Manifold (3840-d) ]\n                           │\n                 ┌─────────┴─────────┐\n                 ▼                   ▼\n        [ Route / Intercept ]   [ Recall Trigger ]\n                 │                   │\n        (Continuous Steering)   (Continuous Selector) ───► Driven by Jaccard Oracle Data\n                 │                   │\n                 ▼                   ▼\n        [ Telepathy Transmit ]  [ Latent Faithfulness ] ──► Driven by Prompt Oracle Data\n\n\n\n### 1. The Selection Layer (From Jaccard to Latent)\n\n  * **The Oracle:** The deterministic Jaccard token-overlap verifier (`recall::token_overlap`), which currently hits the right natural-language fact 100% of the time.\n  * **The Latent Move:** The `W_c` head is not structurally incapable of handling natural prose; it was simply optimized for a different statistical distribution (the high-entropy needle haystack). We use the live `/v1/chat` path to capture query-key activation pairs when the Jaccard verifier hits. 
This dataset is used to fit or fine-tune a specialized **Natural Recall Head** on the continuous manifold that mimics the Jaccard selection matrix.\n\n\n\n### 2. The Faithfulness Layer (From Text Prompt to Latent Steer)\n\n  * **The Oracle:** The `T_CONSOLE` system prompt, which achieves 100% obedience by explicitly instructing the attention layers to treat context as immutable.\n  * **The Latent Move:** Rather than formatting this directive as a string of discrete tokens that waste sequence length and consume attention bandwidth, we capture the activation profile of the model when running under this specific system prompt. We isolate the resulting steering delta and map it to a continuous **Faithfulness Head** that injects the exact same geometric bias directly into the residual stream (leveraging the `TELE-2` steering mechanism that already hit 1.000).\n\n\n\n### 3. The Delivery Layer (From Text Prefill to Latent Prefix)\n\n  * **The Oracle:** Text-in-context synthesis, which feeds the retrieved fact as clean tokens to the prefill chunk.\n  * **The Latent Move:** A crude, uniform KV-cache playback (`kv::replay` at an arbitrary `M_target=42` attenuation) fails because it is structurally uniform and lacks position-aware structure. The fix follows the `TELE-5` readable-prefix mechanics: we map the retrieved fact’s latent representation into an ordered, positionally encoded multi-vector sequence that is injected into the early layers, matching the exact way the model natively reads token embeddings.\n\n\n\n* * *\n\n## The Near-Term Execution\n\nWe keep the current `SP_RECALL_JACCARD` text-in-context path active in the code as the stable deployment fallback and the active data-generation oracle. We do not touch the engine until the training data is captured.\n\nThe immediate task is **Data Capture (F3)** : use the 100% obedient Jaccard text-in-context loop to log the query vectors, key vectors, and hidden state residuals during successful fact-conflict resolutions. 
This provides the raw activation data needed to train the latent selection and continuous steering heads.\n\nTo lay out the implementation details for this data-gathering pass in the ADR: **Do you want to plan the Natural Recall Head as an extension of the existing 4-layer draft-body Interceptor suite (sharing the same underlying activation tap), or should it run as an independent, decoupled latent probe?**",
  "title": "Shannon Prime Lattice"
}