{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiehqtz5r6zgvjj622n34wqiv4qdjrbwd3tjeaskzu2gjbq2oisgqa",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mokm6ewepcf2"
},
"path": "/t/shannon-prime-lattice/176466?page=2#post_26",
"publishedAt": "2026-06-18T09:06:49.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"@400"
],
"textContent": "# RFC-XBAR — The Auditable Latent Crossbar (Exec + Memo sharing Ring 2)\n\n**Status:** **v1.1** (Ring 3 consolidated tier added 2026-06-09, §3.1); **v1** (consolidated 2026-06-09 on P1/P2.a POC data; v0 brainstorm formalized 2026-06-07, KnackAU + Gemini + Claude). v1 deltas: §5 roadmap rewritten on measured physics (P1 CLOSED ledger X-R1; P2.a CLOSED; PLE-stall theory formally corrected in CONTRACT-XBAR-P2); P2.b reframed as the span-compression adapter (CONTRACT-XBAR-P2b) — **the convergence point where the injector, the curator’s compaction organ, the modality template and NIGHTSHIFT’s worker become one trained component** ; C1 split into C1-lite (qwen3 CPU ring, exists today) and C1-full (post-P3); §7 NIGHTSHIFT/Optane persistence design added; §8 endgame risk register added.\n**Parents:** RFC-001 (discrete substrate doctrine) · CONTRACT-C2 (ARM Spinor-KV two-ring) · Phase 4-MeMo (M.0 stub / M.1 dual-load / M.2 zero-copy dialogue loop) · Heterogeneous-SoC manifesto (tricks #1, #4, #7, #9).\n**One line:** two models — the Executive (Exec, M_{gen}) and the Memory curator (Memo, M_{manage}) — share Ring 2 of the cyclotomic memory and communicate through **latent state, not tokens** , with every write receipt-backed, gated, and rewindable. The industry strings agents together through the tokenizer; we hand them the same VRAM.\n\n* * *\n\n## 0. The name\n\n**XBAR** — the Auditable Latent Crossbar. “Crossbar” because it is a switching fabric between models’ latent states; “auditable” because that is the one word no floating-point agent stack can claim, and it is the entire reason this belongs in the lattice rather than in a PyTorch script. Lanes are XBAR-P (probe / physics), XBAR-C (curator), XBAR-M (modality lanes), XBAR-N (NIGHTSHIFT — idle-time consolidation).\n\n## 1. The thesis\n\nMulti-agent systems today communicate by detokenizing model A’s state into text and retokenizing it for model B. 
The boundary is lossy, slow, and discards everything the residual stream knew that the argmax threw away. XBAR bypasses the boundary: Exec writes its episodic state into Ring 2 as it generates; Memo — a small, differently-trained curator — reads the ring **non-causally** (it is offline; it sees the whole episode at once), consolidates it, and writes back. Exec’s next attention sweep picks up the curated state as native memory. No prompt, no tokens, no copy.\n\nThe same mechanism is the multimodal door: any encoder whose output is adapter-aligned to Exec’s residual geometry (we already hold the audio path — voxtral-tts.c / VHT2 / Voxtral-4B) can deposit pseudo-token state into the ring. Not one-shot prompt multimodality — **persistent, curated, episodic multimodal memory.**\n\n## 2. Why this is ours to build (the primitives already exist)\n\nPrimitive | Status | Role in XBAR\n---|---|---\nExec + Memo concurrent forward | **PROVEN** — M.1 dual-model cDSP, 1.796×, one `Arc<FastRpcSession>`; scheduler is model-agnostic | the two models already run together sharing the _scheduler_ ; XBAR’s delta is sharing _state_\nMemo artifact | **EXISTS** — M.0 stub (`qwen25-coder-0.5b-memory.sp-model`, sha-pinned) | starting body for the curator; XBAR-C retrains/replaces it\nTwo-ring cyclotomic memory | **IN MATH-CORE** — core/arm wired into `qwen3_generate_kv`, T_ARM_GENKV green | Ring 2 is the shared medium (NOTE: wired on the qwen3 CPU path; gemma4-CUDA ring wiring is XBAR-P3, not P1)\nSpinor block ABI | **EXISTS** — 63 B + 0xA5 sentinel = 1 cache line; Frobenius-lift bit-identity as integrity receipt | the inter-model message format; a written block is _provably_ well-formed\nTransactional rewind | **EXISTS** — `sp_session_clone`/`rewind` (MTP work) | Memo operates on a clone; bad consolidation rolls back; canonical episode never corrupted\nAccept/reject machinery | **EXISTS** — MTP draft→verify→byte-exact accept | repointed: gate whether an injected/consolidated memory is 
_promoted_ into the canonical ring\n~120× Spinor KV compression | **MEASURED** (C2) | the cost structure: Memo re-reads a few hundred blocks, not 32k tokens — the curator is affordable\nOwned VRAM arena (no GC) | **SHIPPED** — engine CUDA arena, OK_Q4B kernels | the zero-copy pointer handoff is real, not aspirational\nCRT residue lanes | **DOCTRINE** — manifesto tricks #4/#9 | one prime lane per modality: audio and text blocks CRT-separable, never alias, provenance recoverable\n\n## 3. Architecture\n\n\n ┌─────────────────────────── VRAM (owned arena) ───────────────────────────┐\n │ │\n │ Exec (gemma-4-12B, OK_Q4B) Memo (small curator, frozen-small) │\n │ causal forward, generates non-causal pass over the episode │\n │ │ ▲ │ ▲ │\n │ ▼ write │ attend ▼ propose │ read │\n │ ┌─ Ring 1 ─┐ ┌── Ring 2 (hippocampus) ┐ ┌─ Ring 2′ (shadow) ─┐ │\n │ │ working │ │ verbatim Spinor KV, │◄─│ Memo's proposals │ │\n │ │ KV │ │ recent + bounded │ │ promote-on-accept │ │\n │ └──────────┘ └────────────────────────┘ └─────────┬──────────┘ │\n │ ▲ recall from BOTH │ promote (gated) │\n │ │ ┌── Ring 3 (neocortex) ───┐◄───┘ │\n │ └────────────────│ adapter pseudo-tokens, │ G-R3-LOSS bounded │\n │ │ consolidated long-term │ (irreversible) │\n │ └─────────────────────────┘ │\n │ modality lanes (CRT prime per modality): │\n │ audio adapter (voxtral), video, ... │\n └──────────────────────────────────────────────────────────────────────────┘\n Ring 2′ promotions: coherence/PPL delta → accept or REWIND (transient, reversible).\n Ring 3 promotions: G-R3-LOSS bounded BEFORE source eviction (permanent, irreversible).\n\n\nDesign rules (settled in the 2026-06-07 brainstorm):\n\n 1. **Memo is small.** It sorts latents, it does not speak. A few layers / low-rank operator / 0.5B-class body co-resides permanently with Exec inside 12 GB — no weight-swap latency. (Two 12Bs on a 2060 is a non-starter; it’s also unnecessary.)\n 2. **“Backwards” = non-causal.** Exec is causal (past→future). 
Memo runs bidirectionally over the whole stored episode and rewrites it globally. That is the architectural form of consolidation (“sleep/replay”), not a vague autoencoder.\n 3. **Shadow ring, promote-on-accept.** Memo never writes the canonical Ring 2 directly. Proposals land in Ring 2′; a cheap downstream coherence gate (PPL delta over the post-injection window) accepts → promote with receipt, or rejects → rewind. The canonical episode stays clean and every promotion is auditable.\n 4. **Geometry is the law.** A ring entry is a per-layer, per-head (K,V) at a position — roped, normed, V-less where the architecture says so (gemma4 globals: V = raw K projection, weightless-RMS-normed, never roped). Nothing enters the ring that does not honor the coordinates. XBAR-P1 exists to _measure_ how strict this law is.\n 5. **One CRT prime per modality lane.** Audio/text/video blocks are residue-separable in the same unified ring; Exec attends to one memory, provenance stays recoverable, lanes can never alias. (Manifesto tricks #4 + #9, applied to modality instead of hardware channel.)\n\n\n\n## 3.1 The memory hierarchy — Ring 3, the consolidated tier (v1.1 amendment, 2026-06-09)\n\nv1 carried two rings + a shadow. 
P2.b’s reframing (the adapter as Memo’s compaction organ) and the C2.4 finding (raw recall degrades past ~16× selection budget) together force a distinction we had been conflating: **a transient staging buffer and a permanent consolidated store are different objects.** Naming them separately yields a four-tier hierarchy that maps cleanly onto the brain’s memory consolidation:\n\nTier | Substrate | Representation | Lifetime | Biological analogue\n---|---|---|---|---\n**Ring 1** | RAM working window | verbatim KV, full attention | the live turn | sensory / working memory\n**Ring 2** | Optane raw episodic store | verbatim Spinor KV blocks | recent episode (bounded) | **hippocampus** — recent, detailed, lossless\n**Ring 2′** (shadow) | transient staging copy of Ring 2/3 | proposals awaiting the gate | one consolidation pass | (no analogue — it’s the _audit_ mechanism)\n**Ring 3** | Optane consolidated store | **P2.b-adapter pseudo-tokens** (n→k gist) | long-term | **neocortex** — old, dense, semantic\n\n**The transfer-and-transform rule (what NIGHTSHIFT actually does).** Sleep does not just tidy the hippocampus; it _replays raw episodes and writes compressed semantic traces to neocortex._ NIGHTSHIFT does the same: it reads aging Ring 2 episodes, runs the P2.b adapter to compress n verbatim positions into k pseudo-tokens, proposes those to Ring 2′, and on gate-accept **promotes them to Ring 3** (and may then evict the now-redundant raw Ring 2 positions under the same receipt). Ring 3 is therefore populated _exclusively_ by the adapter — it is the curator’s compaction organ writing to its long-term destination.\n\n**Recall-from-both (the Executive’s new query path).** Exec no longer recalls from a single growing list. Per step it queries **Ring 2 for verbatim recent detail** and **Ring 3 for dense long-ago grounding** , and attends over the union. 
This is why Ring 3 _resolves_ the C2.4 ceiling rather than re-hitting it: raw Ring 2 stays **bounded and recent** (where the selection budget is favorable — the NIAH ladder was clean through ~8k), and the long tail lives in Ring 3 as compact gist whose effective selection budget is k-per-episode, not n-per-token. You stop asking the raw router to do 64× selection over 32k verbatim positions — the regime where it broke.\n\n**Honest negatives (operator-specified, on the board permanently):**\n\n 1. **Double-recall cost.** The router must now score candidates across _both_ Ring 2 and Ring 3 every step — two stores, two index scans, a merged top-k. The ±1 projection sidecar already supports an arbitrary candidate set, so the mechanism composes; the _cost_ is ~2× the routing scan plus a fetch from two physical stores (the C2.2 split-device `read_batch2` overlap pattern applies directly). Gate **G-R3-DUALROUTE**: dual-store recall reproduces single-store recall when Ring 3 is empty (parity null), and the added scan cost is measured, not assumed.\n\n 2. **The consolidation-loss gate (irreversible).** Compressing n raw tokens into k pseudo-tokens _discards detail by construction_ — and unlike eviction (which the operator can refuse), a promoted Ring 3 block has thrown its source away. So the loss must be **quantified and bounded before promotion, permanently.** Gate **G-R3-LOSS**: for each candidate consolidation, measure the recoverable-information delta — PPL of a held-out continuation that _depended on the raw span_ under {raw Ring 2} vs {Ring 3 gist}, plus a NIAH-style fact-survival probe on facts inside the compressed span. Promote only if the loss is within a pinned budget; otherwise the span stays verbatim in Ring 2 (some episodes are not compressible without unacceptable loss — that is a valid, logged outcome, not a failure). 
This gate is **load-bearing and irreversible-aware**: a bad Ring 3 promotion cannot be rewound the way a Ring 2′ proposal can, because the raw source is gone — so the gate runs _before_ the source is evicted, and the eviction is part of the same receipt or does not happen.\n\n 3. **Ring 3 is the §4 risk surface, doubled.** Ring 3 blocks are adapter-_generated_, not model-_minted_ — they are exactly the “semantically-wrong-but-valid” objects §4 warns about, now made permanent. The discrete substrate proves a Ring 3 block is _well-formed_ (sentinel, lift identity); only G-R3-LOSS proves it is _faithful_. The coherence gate is therefore not optional on the Ring 3 path — it is the only thing standing between “consolidated memory” and “confidently fabricated history.”\n\n\n\n\n**Lane ownership:** Ring 2 verbatim store + cold-evict = **C1-lite** (heuristic, today, no adapter). Ring 3 consolidation = **C2** (the P2.b adapter) under G-R3-LOSS. NIGHTSHIFT = the offline loop that drives Ring 2 → (adapter) → Ring 2′ → (gate) → Ring 3. The C1-lite persistence format (episode = {K store, V store, manifest}) is the substrate both Ring 2 and Ring 3 serialize into; Ring 3 just carries pseudo-token blocks instead of verbatim KV.\n\n**BACKLOG — Ring-3 provenance tag (the “encoding gap”, banked from Ye 2606.05605, §6.1).** A consolidated Ring-3 gist is the model’s _own_ compressed memory, but Exec reads it as if it were raw context — the “encoding gap” (it does not know it is reading a memory). Solution, in the Shannon-Prime idiom — NOT a learned fp32 bias vector (that injects un-auditable continuous state, the exact failure mode the §6.2 doctrine forbids): a **discrete CRT/sentinel provenance lane** — a residue tag (design-rule §3-5: one CRT prime per lane) + the Spinor `0xA5` sentinel marking a block as “Ring-3 consolidated, not Ring-2 verbatim”, recoverable and receipt-backed. 
Gate **G-R3-PROV (deferred, builds on R3):** an _agency-gain-style_ test — Exec’s held-out PPL on a continuation that depends on the gist, _with_ the provenance tag vs _without_ it; promote the tag only if `Δppl < 0` (the model uses the provenance signal). Strictly an enhancer (Ye’s own ablation ranks proprioception secondary), so this is a post-R3 refinement, never bundled into P2.b’s first training run.\n\n## 3.2 Audio (M1) — the synthesis/output path (design note + verified constraints, 2026-06-09)\n\n§3 already holds M1’s **input** side: an audio encoder adapter-aligned to Exec’s residual geometry deposits pseudo-token state into the ring (the `SP_XBAR_EMB` injection interface, P2.a, already built). This note scopes the **output** side (text/latent → speech) that an operator proposal raised, and records what’s decided vs deferred. **Status: design exploration, NOT a committed build. No ledger row (the ledger is for green-gate results; this is a plan).**\n\n**Direction adopted:** non-autoregressive vocoding with FiLM conditioning. AR token-by-token synthesis is O(N·D) in frames×codebook-depth; a parallel CNN vocoder is O(1) per chunk — the correct latency lever. FiLM (`γ(s)·x + β(s)`) is element-wise scale/shift, so it quantizes to fixed-point cleanly and avoids attention’s dynamic-range pain — the one part of the proposal that is both correct and SP-native.\n\n**The GNA is a deliberate target — corrected read (2026-06-09; the earlier “GNA is dead” verdict was wrong).** OpenVINO dropping the GNA _plugin_ at 2024.0 is a vendor product decision, not a statement about the silicon — conflating the two was the exact anti-pattern (treating an abandonment headline as a physical limit) we reject. 
Ground truth (verified): the open `intel/gna` library — `gna-api` core + **XNN neural-network kernels** + GMM kernels + samples, LGPL-2.1, archived read-only @ v3.0.0 but **fully present** — drives the GNA 2.0 on this host (11th-gen, GNA 2.0 in silicon) directly via Rust FFI, AND ships a **CPU software-execution mode** for driver-free graph prototyping. For a build-it-ourselves fixed-point pipeline, vendor abandonment is a **feature**: a frozen, open, no-lock-in, uncontested target. GNA was designed for “always-on AI speech/audio (neural noise cancellation)” — so the **input** side (VAD / mel / noise-suppression / tiny feature CNN) is its home turf and the natural first target. The **output** side (a fixed-point FiLM CNN vocoder on GNA) is an open **capacity-envelope question, not a no**: measure GNA 2.0’s XNN op set, INT8/16 precision, memory and throughput (cheaply — SW-emulation mode + samples first), then **co-design a minimal vocoder to fit the envelope** rather than porting a big one or assuming it can’t hold one. A GNA backend is the lattice’s bare-metal-per-backend principle + the heterogeneous-SoC manifesto applied to the Intel side (GNA + AVX + 2060, CRT-shardable). 2060/AVX are the fallback only if the measured envelope won’t hold the vocoder. **Stage-0 envelope probe DONE (2026-06-09, `_xbar/GNA-ENVELOPE-PROBE.md`):** confirmed FiLM = native `Gna2OperationTypeElementWiseAffine`; Int4/Int8 weights + PWL activations + Conv1D/2D cover the vocoder primitives (zero-friction from OK_Q4B/Q8); device gen `2_0` = host + `SoftwareEmulation` path present; SW XNN/GMM kernels build clean in-sandbox (`xnn_kernel_avx2_sat` = affine+rnn). Full lib needs `libdrm-dev` on Linux (Windows uses its own driver). 
**Stage 1 DONE (2026-06-09, host WSL, real linked libGNA — `_xbar/GNA-ENVELOPE-PROBE.md`):** built full `libgna-api-static.a` (no-sudo `apt-get download`+`dpkg-deb -x`); `gna_probe.c` runs against it — **`Gna2ModelCreate` = SUCCESS for the FiLM `ElementWiseAffine` op (N=64…32768): the GNA 2.0 validator ACCEPTS the FiLM topology.** Measured walls: ~65535 elements/operand-dim (32768 confirmed pass, 65536 above-range), device memory cap ∈ (16 MB, 256 MB]. Operand rule learned: affine output = Int32 accumulator when PWL disabled. **Stage 2 partial (2026-06-09):** device total-memory cap PINNED by bisect — **224 MB pool OK, 256 MB → `MemoryTotalSizeExceeded`** ⇒ GNA 2.0 ceiling ≈256 MB (~224 MB usable); Conv root-caused (source-confirmed `DeviceLayerSupport.cpp`): **2D conv = GNA gen-3.0+ ONLY; GNA 2.0 has the legacy 1D conv (`INTEL_CONVOLUTIONAL`, gen-1.0+, filters `[nF,kW]`, NWD layout)** — the _right_ primitive for 1-D audio synthesis anyway. So GNA 2.0 covers the vocoder op-set (**1D-conv + FiLM/ElementWiseAffine + PWL, i8/i16**). **TWO ABSTRACTION LEVELS (the correction):** at the _raw `gna-api`_ level `INTEL_CONVOLUTIONAL_2D` is gen-3.0+ (our low-level probe routed there and rejected on 2.0); but the _OpenVINO plugin_ sits above it and **flattens 2D convs → native 1D for the GNA 2.0 target** when the kernel moves one direction. So 2D conv is _compiler-flattened_, not absent. 
**Authoritative spec (OpenVINO 2023.3 GNA doc, JS-rendered):** GNA 2.0 = 10/11th-gen (ours ✓); HW-native is 1D conv; conv output-channels ×4, **max filters 65,532**; **1D conv input-channel = 1** (channels composed across layers / folded into filters, not native multi-in-channel); **precision i8/i16, i32 accum — NOT Int4**: i8=POT performance, i16=accuracy; conv weights i16 on 2.0 (i8 conv weights only from 3.5) → our **OK_Q 4/8-bit is a _storage_ codec mapping to i8/i16 on GNA**, not native-4-bit execution; FiLM = constant-broadcast Multiply/Add (in-spec); **batch=1 for conv/LSTM** (1–8 for MatMul); SW-emulation default + HW_WITH_SW_FBACK QoS for real-time audio; GNA @400 MHz. Remaining: wire the 1D-conv (single-in-channel, i16 filters `[NF×4, kW]`, batch=1) → Conv1D→FiLM 2-op chain → Windows-host HW bring-up (NUC11 driver) on real GNA 2.0. **Production pipeline (OpenVINO refs, in receipt):** POT emits i8(perf)/i16(accuracy) = the OK_Q→GNA quantization bridge; `LowLatency2` + `HW_WITH_SW_FBACK` QoS = the real-time streaming path; **mirror a `models_contrib/speech` GNA ASR model’s IR to copy the known-good 1D-conv operand layout** (resolves the wiring blocker without blind iteration). **Named exemplar: `rm_cnn4a_smbr`** (Kaldi CNN; `mo --framework kaldi` → IR conv layer = the layout to copy). **Validation gate: `GNA_SW_EXACT`** = bit-exact SW emulation → prove the vocoder graph bit-identical in SW before HW bring-up. Conv-i16 confirmed verbatim (“conv layers always 16-bit weights, GNA HW v1/v2”). Latency anchor: wsj_dnn5b ≈4.4 ms/frame on GNA. 
**BRING-UP KIT STAGED (operator, 2026-06-09/10 — `archive/notes_and_stuff/GNA/`):** Stage 3 (real-silicon bring-up) is no longer driver-blocked on either OS — Windows GNA drivers (03.00.00.1815/.1910 + 03.05.00.1906/.2116) and Linux gna-drv (1.2.3/1.3.5) in hand; reference models staged: **wsj_dnn5b_smbr** (DNN — the 4.4 ms latency anchor, with .ark fixtures + HCLG decoder), **rm_lstm4f** (LSTM), **full librispeech_s5 Kaldi pipeline incl. converted OpenVINO IR** (`OV/lspeech_s5_ext.xml/.bin` — known-good GNA operand layouts to mirror for the affine path), and **aclnet** (audio CNN, fp32 + **int8 ONNX** — a quantized conv exemplar). Correction to the exemplar line above: `rm_cnn4a_smbr` itself is NOT in the kit; **aclnet int8 is the local conv-layout exemplar candidate** (its layer topology inspection = the first Stage-3 step), with the librispeech IR as the affine-layout reference and the public `models_contrib/speech` fetch as fallback if a true Kaldi-CNN IR is still wanted.\n\n**Three honest corrections to the proposal (kept on record):**\n\n 1. **It is not “minimalist / steal one insight” — it is the full modern non-AR TTS stack:** ECAPA-class speaker encoder + FastSpeech2-style frame/variance (pitch/energy) predictors + HiFi-GAN/Vocos FiLM vocoder. Every component is a _trained_ net; a GAN/diffusion vocoder is among the harder things to train stably. Scoped honestly, this is a larger program than all of XBAR to date.\n 2. **Category gap — semantic ≠ acoustic.** P2.b’s k pseudo-tokens are proven _semantic text-recall keys_ , not acoustic frames. “Exec emits an acoustic residue lane → frame predictor” hand-waves the actual hard problem (semantic → mel), which _is_ TTS. The P2.b recall win does NOT transfer to acoustic conditioning for free.\n 3. 
**Adopt, don’t invent — and we already have an adopted vocoder.** A real audio lane exists in the `voxtral-tts.c` satellite repo: Voxtral-4B’s **300M CONV codec decoder** (4-stage upsample + ALiBi transformer) consuming LLM latents, with our **VHT2** spectral latent compression already integrated. So this is not greenfield — it is “**replace the adopted Voxtral codec with a custom non-AR FiLM vocoder** ,” which _raises_ the bar: the custom build must beat a working baseline, not merely clone a voice. The lattice’s durable contribution to audio is the **inference substrate** — `SP_XBAR_EMB` residual injection (conditioning), fixed-point/Z_q vocoder inference (the OK_Q4B/codec work, where SP quantization earns its keep and serves the Rust/no-Python goal), and eventually CRT multi-island synthesis (manifesto #1/#9) — NOT a novel TTS architecture. If we go custom, adopt a proven non-AR design; make the _inference_ ours.\n\n\n\n**Sequencing (corrected):** this is NOT pure drift — a GNA backend is the bare-metal-per-backend principle + the heterogeneous-SoC manifesto on the Intel side, and fixed-point CNN inference exercises the Z_q substrate. Two natural tracks: (a) the **always-listening GNA input front-end** (VAD/mel/noise-suppression) is a low-risk, high-fit near-term target — GNA’s literal design purpose — and is independently useful (an always-on ear for the agent); (b) the **full synthesis stack** stays sequenced behind the live core (P2.b capacity → P3 → C2) since it’s a large trained build, but as scheduled work, not forbidden drift. 
The discipline is the gate order (measure GNA envelope → existence probe → co-design → fixed-point Rust), not the deferral itself.\n\n**Falsifiable FIRST step (cheap existence probe before any Rust or training, the P2.b-Phase-0 pattern):** wire **off-the-shelf pretrained** components — Vocos/HiFi-GAN vocoder + pretrained ECAPA speaker encoder + an existing non-AR acoustic model — add FiLM conditioning, and test the single load-bearing claim: _can it clone a voice from a 3 s reference at acceptable similarity AND O(1)-per-chunk latency on the 2060?_ PASS → the custom fixed-point Rust reimplementation is justified. FAIL → a day spent, not a quarter, and the architecture falsified before silicon-shaped code was built around it. **Baseline it against the existing Voxtral 300M codec** — the custom FiLM path only earns the build if it beats Voxtral on latency/footprint at equal clone quality. Gate **G-M1-CLONE (deferred):** reference-clone similarity above a pinned floor + per-chunk latency independent of sequence length, _vs the Voxtral-codec baseline_.\n\n## 4. The honest negative (stated up front)\n\n“Injected memory as sudden realization” and “confident hallucination from off-manifold state” are the **same event** described twice. The discrete substrate detects _invalid_ blocks (sentinel, lift identity); it cannot detect _semantically-wrong-but-valid_ ones. Therefore the coherence gate is load-bearing, not decorative: no promotion without a measured downstream delta, accept-or-rewind, every time. This is the kernel-gating doctrine pointed at inter-model state.\n\nSecond honest negative: RoPE phase ties keys to absolute position; SWA layers fade injected state beyond their window (the GLOBAL period-6 layers are the long-range carrier); partial-rotary-0.25 globals are the most transplant-tolerant. The probe must quantify all three, not assume them.\n\n## 5. 
Roadmap v1 (consolidated 2026-06-09 — rewritten on measured POC physics)\n\nStage | Lane | What | Gate / exit | Status\n---|---|---|---|---\n**P1** | XBAR-P | Inception Probe — KV transplant A/B + escalation + 5×3 matrix | CONTRACT-XBAR-P1 G0–G4 + G1b + dual-metric G2 | **CLOSED — ledger X-R1 citable** (15/15 incorporation, 15/15 selectivity, 3.69 orders, dose-response curve)\n**P2.a** | XBAR-P | Residual-entry pseudo-token mechanism probe | CONTRACT-XBAR-P2 G0E–G3E | **CLOSED** (ghost prompt ≥ KV transplant; blends fall off manifold; stall theory corrected — PLE falsified, PL=0 on the 12B)\n**P2.b** | XBAR-P/C | **The span-compression adapter** (cloud-trained, frozen 12B): n-token span → k on-manifold pseudo-tokens; inversion Phase 0 → adapter Phase 1 → on-silicon deployment gate. _The keystone: injector + Memo’s compaction organ + modality template + NIGHTSHIFT worker in one component._ | CONTRACT-XBAR-P2b G-P2b-0..4 | **SPEC’D** — next build\n**C1-lite** | XBAR-C | Memo v0 heuristic curator on the **existing qwen3 CPU two-ring** (no new infra): select/merge/evict driven by router scores + the LRU access telemetry; full propose→gate→promote/rewind loop on Ring 2′ | loop closes; ≥1 promotion improves post-window PPL/NIAH vs no-curation; rewind receipts complete | **UNBLOCKED TODAY** — can run before P3\nP3 | XBAR-P | Ring wiring on the Exec path — two-ring to the gemma4 CUDA decode loop; KV slots become Spinor-block ring entries with receipts | T_ARM gates green on gemma4-CUDA; bit-exact null path | pending\nC1-full | XBAR-C | C1-lite’s loop re-run on Exec (gemma4-CUDA ring) | same gates, Exec path | pending P3\nC2 | XBAR-C | Memo v1 = the P2.b adapter applied to ring state: **fixed ring budget, maximize Exec’s recall over the episode** (the adapter compacts; promote-on-accept gates). 
Open decision logged: Memo body may be _adapter + tiny ring-block encoder_ , not the 0.5B M.0 stub | recall@budget beats C1 heuristics on held-out episodes | pending P2.b + C1\n**R3** | XBAR-C | **Ring 3 consolidated tier** (§3.1) — dual-store recall (Ring 2 verbatim + Ring 3 gist); NIGHTSHIFT writes adapter pseudo-tokens to Ring 3 under the irreversible loss gate | G-R3-DUALROUTE (empty-Ring3 parity null + measured scan cost) + G-R3-LOSS (n→k loss bounded, fact-survival, pre-eviction) | pending P2.b + C1-lite\nM1 | XBAR-M | Audio lane — **input:** encoder latents → `SP_XBAR_EMB` residual injection → ring (P2.b recipe, source swap), CRT prime lane; **output (§3.2):** non-AR FiLM vocoder; **GNA 2.0 is a deliberate target** via the open libGNA (Rust FFI; archived≠dead — it’s a frozen open playground) — input front-end = GNA home turf, output vocoder = capacity-envelope question (measure via SW-emu, co-design to fit), 2060/AVX fallback; SP value = the fixed-point inference substrate | input: Exec answers questions about injected audio never seen as text · output: **G-M1-CLONE** (off-the-shelf existence probe FIRST — 3 s ref clone similarity + O(1)/chunk latency, vs Voxtral-codec baseline) | pending P2.b; GNA input front-end = near-term low-risk track; synthesis stack scheduled behind core\nN1 | XBAR-N | **NIGHTSHIFT** (§7) — episode persistence on Optane + offline Ring 2→Ring 3 consolidation under promote-on-accept, schtasks-owned | unattended run: net-positive gated promotions, zero canonical corruption, full receipt log | v0 (heuristic, Ring 2 evict) pending C1-lite; v1 (adapter, Ring 3) pending P2.b + R3\n\nOrder discipline, updated: P1’s physics is banked — **training is now licensed** , and P2.b leads because four lanes converge on it. C1-lite runs in parallel on existing infrastructure (the curator’s _control flow_ needs no training and no CUDA port). 
Compute split: training = cloud (RunPod/Colab A100-class, bf16 bucket weights); deployment + every gate that touches receipts = the 2060/B1 artifact via the P2.a harness.\n\n## 7. NIGHTSHIFT — the Optane subconscious (v1 design, operator synthesis 2026-06-09)\n\nC2 already proved the substrate this rests on: byte-exact Ring-2 spill/recall on physical Optane (7.57 µs/read floor), 16.3 h unattended saturation with zero leaks (the honest-MISS finale’s infrastructure half), receipts end to end. NIGHTSHIFT adds three things, none of them new physics:\n\n 1. **Episode persistence.** The Ring-2 store + router sidecar + a manifest (artifact sha, geometry, block receipts) become a named _episode_ file set on E:/F: that survives sessions. Exec’s recall router pulls from a reloaded episode exactly as it pulls from a live one — the recall machinery is already store-agnostic (RAM mock / Optane / QUIC peer, all proven).\n 2. **The consolidation pass.** Offline (idle-time, manifesto trick #7), Memo walks the episode non-causally: select/merge/evict in v0 (heuristics), span-compression via the P2.b adapter in v1 — proposals to a shadow episode, **promote-on-accept** (PPL/recall delta on probe queries), rewind on reject, every promotion receipt-logged. **The association-strength signal already exists:** the LRU temporal-locality telemetry (measured 67% hit-rates, absorption-vs-depth curves) is a live record of which blocks attention keeps returning to — “strengthen frequent associations, cull dead ones” is driven by data we already collect, not by a new estimator.\n 3. **Operational discipline inherited:** NIGHTSHIFT runs are schtasks-owned (the C2.4 lesson — bakes belong to the OS, not the agent tree); the runner banner echoes `getenv` (the config-regression lesson); no agent poll-watching.\n\n",
"title": "Shannon Prime Lattice"
}