External Publication

Shannon Prime Lattice

Hugging Face Forums [Unofficial] June 18, 2026

RFC-XBAR — The Auditable Latent Crossbar (Exec + Memo sharing Ring 2)

Status: v1.1 (Ring 3 consolidated tier added 2026-06-09, §3.1); v1 (consolidated 2026-06-09 on P1/P2.a POC data; v0 brainstorm formalized 2026-06-07, KnackAU + Gemini + Claude).

v1 deltas: §5 roadmap rewritten on measured physics (P1 CLOSED ledger X-R1; P2.a CLOSED; PLE-stall theory formally corrected in CONTRACT-XBAR-P2); P2.b reframed as the span-compression adapter (CONTRACT-XBAR-P2b), the convergence point where the injector, the curator’s compaction organ, the modality template and NIGHTSHIFT’s worker become one trained component; C1 split into C1-lite (qwen3 CPU ring, exists today) and C1-full (post-P3); §7 NIGHTSHIFT/Optane persistence design added; §8 endgame risk register added.

Parents: RFC-001 (discrete substrate doctrine) · CONTRACT-C2 (ARM Spinor-KV two-ring) · Phase 4-MeMo (M.0 stub / M.1 dual-load / M.2 zero-copy dialogue loop) · Heterogeneous-SoC manifesto (tricks #1, #4, #7, #9).

One line: two models, the Executive (Exec, M_{gen}) and the Memory curator (Memo, M_{manage}), share Ring 2 of the cyclotomic memory and communicate through latent state, not tokens, with every write receipt-backed, gated, and rewindable. The industry strings agents together through the tokenizer; we hand them the same VRAM.

0. The name

XBAR — the Auditable Latent Crossbar. “Crossbar” because it is a switching fabric between models’ latent states; “auditable” because that is the one word no floating-point agent stack can claim, and it is the entire reason this belongs in the lattice rather than in a PyTorch script. Lanes are XBAR-P (probe / physics), XBAR-C (curator), XBAR-M (modality lanes), XBAR-N (NIGHTSHIFT — idle-time consolidation).

1. The thesis

Multi-agent systems today communicate by detokenizing model A’s state into text and retokenizing it for model B. The boundary is lossy, slow, and discards everything the residual stream knew that the argmax threw away. XBAR bypasses the boundary: Exec writes its episodic state into Ring 2 as it generates; Memo — a small, differently-trained curator — reads the ring non-causally (it is offline; it sees the whole episode at once), consolidates it, and writes back. Exec’s next attention sweep picks up the curated state as native memory. No prompt, no tokens, no copy.

The same mechanism is the multimodal door: any encoder whose output is adapter-aligned to Exec’s residual geometry (we already hold the audio path — voxtral-tts.c / VHT2 / Voxtral-4B) can deposit pseudo-token state into the ring. Not one-shot prompt multimodality — persistent, curated, episodic multimodal memory.

2. Why this is ours to build (the primitives already exist)

Primitive	Status	Role in XBAR
Exec + Memo concurrent forward	PROVEN (M.1 dual-model cDSP, 1.796×, one `Arc<FastRpcSession>`; scheduler is model-agnostic)	the two models already run together sharing the scheduler; XBAR’s delta is sharing state
Memo artifact	EXISTS — M.0 stub (`qwen25-coder-0.5b-memory.sp-model`, sha-pinned)	starting body for the curator; XBAR-C retrains/replaces it
Two-ring cyclotomic memory	IN MATH-CORE — core/arm wired into `qwen3_generate_kv`, T_ARM_GENKV green	Ring 2 is the shared medium (NOTE: wired on the qwen3 CPU path; gemma4-CUDA ring wiring is XBAR-P3, not P1)
Spinor block ABI	EXISTS — 63 B + 0xA5 sentinel = 1 cache line; Frobenius-lift bit-identity as integrity receipt	the inter-model message format; a written block is provably well-formed
Transactional rewind	EXISTS — `sp_session_clone`/`rewind` (MTP work)	Memo operates on a clone; bad consolidation rolls back; canonical episode never corrupted
Accept/reject machinery	EXISTS — MTP draft→verify→byte-exact accept	repointed: gate whether an injected/consolidated memory is promoted into the canonical ring
~120× Spinor KV compression	MEASURED (C2)	the cost structure: Memo re-reads a few hundred blocks, not 32k tokens — the curator is affordable
Owned VRAM arena (no GC)	SHIPPED — engine CUDA arena, OK_Q4B kernels	the zero-copy pointer handoff is real, not aspirational
CRT residue lanes	DOCTRINE — manifesto tricks #4/#9	one prime lane per modality: audio and text blocks CRT-separable, never alias, provenance recoverable
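
A minimal sketch of the Spinor block framing from the table above (63 B + 0xA5 sentinel = one 64 B cache line). The sentinel position (last byte) and the function names are assumptions for illustration; the source does not specify the layout. The Frobenius-lift integrity receipt is deliberately left out, so this shows only the structural half of "well-formed".

```python
SENTINEL = 0xA5        # sentinel value named in the primitives table
PAYLOAD_BYTES = 63     # 63 B payload + 1 B sentinel = one 64 B cache line
BLOCK_BYTES = PAYLOAD_BYTES + 1


def pack_block(payload: bytes) -> bytes:
    """Frame a 63-byte payload as a 64-byte block (sentinel-last is an assumption)."""
    if len(payload) != PAYLOAD_BYTES:
        raise ValueError(f"payload must be {PAYLOAD_BYTES} bytes, got {len(payload)}")
    return payload + bytes([SENTINEL])


def is_well_formed(block: bytes) -> bool:
    """Structural check only: correct size and sentinel present."""
    return len(block) == BLOCK_BYTES and block[-1] == SENTINEL


def unpack_block(block: bytes) -> bytes:
    """Return the payload, refusing any block that fails the structural check."""
    if not is_well_formed(block):
        raise ValueError("malformed block: bad length or missing sentinel")
    return block[:PAYLOAD_BYTES]
```

A reader that refuses to unpack a block without the sentinel can reject truncated or torn writes, but it cannot say anything about whether the payload is semantically right; that is the coherence gate's job.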

3. Architecture

        ┌─────────────────────────── VRAM (owned arena) ───────────────────────────┐
        │                                                                          │
        │   Exec (gemma-4-12B, OK_Q4B)          Memo (small curator, frozen-small) │
        │   causal forward, generates           non-causal pass over the episode   │
        │        │            ▲                        │             ▲             │
        │        ▼ write      │ attend                 ▼ propose     │ read        │
        │   ┌─ Ring 1 ─┐  ┌── Ring 2 (hippocampus) ┐  ┌─ Ring 2′ (shadow) ─┐       │
        │   │ working  │  │ verbatim Spinor KV,    │◄─│ Memo's proposals   │       │
        │   │ KV       │  │ recent + bounded       │  │ promote-on-accept  │       │
        │   └──────────┘  └────────────────────────┘  └─────────┬──────────┘       │
        │        ▲ recall from BOTH                              │ promote (gated)  │
        │        │                ┌── Ring 3 (neocortex) ───┐◄───┘                  │
        │        └────────────────│ adapter pseudo-tokens,  │   G-R3-LOSS bounded   │
        │                         │ consolidated long-term  │   (irreversible)      │
        │                         └─────────────────────────┘                       │
        │              modality lanes (CRT prime per modality):                      │
        │              audio adapter (voxtral), video, ...                           │
        └──────────────────────────────────────────────────────────────────────────┘
   Ring 2′ promotions: coherence/PPL delta → accept or REWIND (transient, reversible).
   Ring 3 promotions: G-R3-LOSS bounded BEFORE source eviction (permanent, irreversible).

Design rules (settled in the 2026-06-07 brainstorm):

Memo is small. It sorts latents, it does not speak. A few layers / low-rank operator / 0.5B-class body co-resides permanently with Exec inside 12 GB — no weight-swap latency. (Two 12Bs on a 2060 is a non-starter; it’s also unnecessary.)
“Backwards” = non-causal. Exec is causal (past→future). Memo runs bidirectionally over the whole stored episode and rewrites it globally. That is the architectural form of consolidation (“sleep/replay”), not a vague autoencoder.
Shadow ring, promote-on-accept. Memo never writes the canonical Ring 2 directly. Proposals land in Ring 2′; a cheap downstream coherence gate (PPL delta over the post-injection window) accepts → promote with receipt, or rejects → rewind. The canonical episode stays clean and every promotion is auditable.
Geometry is the law. A ring entry is a per-layer, per-head (K,V) at a position — roped, normed, V-less where the architecture says so (gemma4 globals: V = raw K projection, weightless-RMS-normed, never roped). Nothing enters the ring that does not honor the coordinates. XBAR-P1 exists to measure how strict this law is.
One CRT prime per modality lane. Audio/text/video blocks are residue-separable in the same unified ring; Exec attends to one memory, provenance stays recoverable, lanes can never alias. (Manifesto tricks #4 + #9, applied to modality instead of hardware channel.)
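
The shadow-ring rule can be sketched as follows. This is an illustration under assumptions, not the engine's code: `Ring`, `consolidate`, and the `ppl` callable are hypothetical stand-ins (the real primitives are `sp_session_clone`/`rewind` and a PPL delta measured over the post-injection window), and the accept threshold is a placeholder.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ring:
    entries: Dict[int, str] = field(default_factory=dict)  # position -> block (stand-in)
    receipts: List[dict] = field(default_factory=list)     # audit log of gate decisions


def consolidate(ring: Ring,
                propose: Callable[[Ring], None],
                ppl: Callable[[Ring], float],
                max_delta: float = 0.0) -> bool:
    """Run a proposal on a shadow copy; promote only if PPL does not get worse."""
    baseline = ppl(ring)
    shadow = copy.deepcopy(ring)        # the shadow ring: canonical state is untouched
    propose(shadow)
    delta = ppl(shadow) - baseline
    if delta <= max_delta:
        ring.entries = shadow.entries   # promote, with a receipt
        ring.receipts.append({"action": "promote", "ppl_delta": delta})
        return True
    # Reject: the shadow is dropped and the canonical entries stay as they were.
    ring.receipts.append({"action": "rewind", "ppl_delta": delta})
    return False
```

Because the proposal only ever mutates the copy, a rejected consolidation needs no undo logic: the rewind is simply not promoting.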

3.1 The memory hierarchy — Ring 3, the consolidated tier (v1.1 amendment, 2026-06-09)

v1 carried two rings + a shadow. P2.b’s reframing (the adapter as Memo’s compaction organ) and the C2.4 finding (raw recall degrades past ~16× selection budget) together force a distinction we had been conflating: a transient staging buffer and a permanent consolidated store are different objects. Naming them separately yields a four-tier hierarchy that maps cleanly onto the brain’s memory consolidation:

Tier	Substrate	Representation	Lifetime	Biological analogue
Ring 1	RAM working window	verbatim KV, full attention	the live turn	sensory / working memory
Ring 2	Optane raw episodic store	verbatim Spinor KV blocks	recent episode (bounded)	hippocampus — recent, detailed, lossless
Ring 2′ (shadow)	transient staging copy of Ring 2/3	proposals awaiting the gate	one consolidation pass	(no analogue — it’s the audit mechanism)
Ring 3	Optane consolidated store	P2.b-adapter pseudo-tokens (n→k gist)	long-term	neocortex — old, dense, semantic

The transfer-and-transform rule (what NIGHTSHIFT actually does). Sleep does not just tidy the hippocampus; it replays raw episodes and writes compressed semantic traces to neocortex. NIGHTSHIFT does the same: it reads aging Ring 2 episodes, runs the P2.b adapter to compress n verbatim positions into k pseudo-tokens, proposes those to Ring 2′, and on gate-accept promotes them to Ring 3 (and may then evict the now-redundant raw Ring 2 positions under the same receipt). Ring 3 is therefore populated exclusively by the adapter — it is the curator’s compaction organ writing to its long-term destination.

Recall-from-both (the Executive’s new query path). Exec no longer recalls from a single growing list. Per step it queries Ring 2 for verbatim recent detail and Ring 3 for dense long-ago grounding, and attends over the union. This is why Ring 3 resolves the C2.4 ceiling rather than re-hitting it: raw Ring 2 stays bounded and recent (where the selection budget is favorable; the NIAH ladder was clean through ~8k), and the long tail lives in Ring 3 as compact gist whose effective selection budget is k-per-episode, not n-per-token. You stop asking the raw router to do 64× selection over 32k verbatim positions, which is the regime where it broke.
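
A toy sketch of the dual-store query path: score candidates from both stores, then take one merged top-k. The dot-product `score` is a stand-in for the router's real scoring (the ±1 projection sidecar), and the dict-based stores and names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def score(query: Vector, key: Vector) -> float:
    """Dot-product relevance (stand-in for the router's real scoring)."""
    return sum(q * k for q, k in zip(query, key))


def recall(query: Vector,
           ring2: Dict[str, Vector],
           ring3: Dict[str, Vector],
           k: int) -> List[Tuple[str, str]]:
    """Scan both stores and return the merged top-k as (tier, block_id) pairs."""
    candidates = [(score(query, key), "ring2", bid) for bid, key in ring2.items()]
    candidates += [(score(query, key), "ring3", bid) for bid, key in ring3.items()]
    top = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return [(tier, bid) for _, tier, bid in top]
```

With an empty Ring 3 the candidate list is exactly the Ring 2 list, which is the parity condition a dual-route gate would check; the second scan is where the extra routing cost comes from.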

Honest negatives (operator-specified, on the board permanently):

Double-recall cost. The router must now score candidates across both Ring 2 and Ring 3 every step: two stores, two index scans, a merged top-k. The ±1 projection sidecar already supports an arbitrary candidate set, so the mechanism composes; the cost is ~2× the routing scan plus a fetch from two physical stores (the C2.2 split-device read_batch2 overlap pattern applies directly). Gate G-R3-DUALROUTE: dual-store recall reproduces single-store recall when Ring 3 is empty (parity null), and the added scan cost is measured, not assumed.
The consolidation-loss gate (irreversible). Compressing n raw tokens into k pseudo-tokens discards detail by construction, and unlike eviction (which the operator can refuse), a promoted Ring 3 block has thrown its source away. So the loss must be quantified and bounded before promotion, permanently. Gate G-R3-LOSS: for each candidate consolidation, measure the recoverable-information delta: PPL of a held-out continuation that depended on the raw span under {raw Ring 2} vs {Ring 3 gist}, plus a NIAH-style fact-survival probe on facts inside the compressed span. Promote only if the loss is within a pinned budget; otherwise the span stays verbatim in Ring 2 (some episodes are not compressible without unacceptable loss; that is a valid, logged outcome, not a failure). This gate is load-bearing and irreversible-aware: a bad Ring 3 promotion cannot be rewound the way a Ring 2′ proposal can, because the raw source is gone. The gate therefore runs before the source is evicted, and the eviction is part of the same receipt or does not happen.
Ring 3 is the §4 risk surface, doubled. Ring 3 blocks are adapter-generated, not model-minted: they are exactly the “semantically-wrong-but-valid” objects §4 warns about, now made permanent. The discrete substrate proves a Ring 3 block is well-formed (sentinel, lift identity); only G-R3-LOSS proves it is faithful. The coherence gate is therefore not optional on the Ring 3 path: it is the only thing standing between “consolidated memory” and “confidently fabricated history.”
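
A sketch of the G-R3-LOSS decision and the promote-and-evict-under-one-receipt rule. The budget fields, function names, and receipt shape are hypothetical; the PPL and fact-survival inputs are assumed to be measured elsewhere, before any eviction.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class LossBudget:
    max_ppl_increase: float    # allowed PPL rise of gist vs raw on the held-out continuation
    min_fact_survival: float   # required fraction of probe facts still recoverable


def g_r3_loss(ppl_raw: float, ppl_gist: float,
              facts_total: int, facts_recovered: int,
              budget: LossBudget) -> bool:
    """True if the measured consolidation loss is inside the pinned budget."""
    survival = facts_recovered / facts_total if facts_total else 1.0
    within_ppl = (ppl_gist - ppl_raw) <= budget.max_ppl_increase
    return within_ppl and survival >= budget.min_fact_survival


def consolidate_span(span_id: str, gist: str, passed: bool,
                     ring2: Set[str], ring3: Dict[str, str],
                     receipts: List[dict]) -> None:
    """Promote and evict under one receipt, or leave the span verbatim in Ring 2."""
    if passed:
        ring3[span_id] = gist
        ring2.discard(span_id)   # eviction only ever happens together with promotion
        receipts.append({"span": span_id, "action": "promote+evict"})
    else:
        receipts.append({"span": span_id, "action": "kept-verbatim"})
```

The failing branch writes a receipt too, so "this span was not compressible within budget" is a logged outcome and not a silent skip.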

Lane ownership: Ring 2 verbatim store + cold-evict = C1-lite (heuristic, today, no adapter). Ring 3 consolidation = C2 (the P2.b adapter) under G-R3-LOSS. NIGHTSHIFT = the offline loop that drives Ring 2 → (adapter) → Ring 2′ → (gate) → Ring 3. The C1-lite persistence format (episode = {K store, V store, manifest}) is the substrate both Ring 2 and Ring 3 serialize into; Ring 3 just carries pseudo-token blocks instead of verbatim KV.

BACKLOG — Ring-3 provenance tag (the “encoding gap”, banked from Ye 2606.05605, §6.1). A consolidated Ring-3 gist is the model’s own compressed memory, but Exec reads it as if it were raw context — the “encoding gap” (it does not know it is reading a memory). Solution, in the Shannon-Prime idiom — NOT a learned fp32 bias vector (that injects un-auditable continuous state, the exact failure mode the §6.2 doctrine forbids): a discrete CRT/sentinel provenance lane — a residue tag (design-rule §3-5: one CRT prime per lane) + the Spinor 0xA5 sentinel marking a block as “Ring-3 consolidated, not Ring-2 verbatim”, recoverable and receipt-backed. Gate G-R3-PROV (deferred, builds on R3): an agency-gain-style test — Exec’s held-out PPL on a continuation that depends on the gist, with the provenance tag vs without it; promote the tag only if Δppl < 0 (the model uses the provenance signal). Strictly an enhancer (Ye’s own ablation ranks proprioception secondary), so this is a post-R3 refinement, never bundled into P2.b’s first training run.
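
To make "one CRT prime per lane" concrete, here is a toy residue-tag scheme. It is an illustration of CRT separability only: the primes, lane names, and encoding are assumptions, not the lattice's actual tag format.

```python
from math import prod

# Hypothetical lane primes; the source does not say which primes are used.
LANES = {"text": 251, "audio": 257, "video": 263}
M = prod(LANES.values())


def encode(lane: str, value: int) -> int:
    """Embed `value` so it shows up only in `lane`'s residue and is 0 mod every other prime."""
    p = LANES[lane]
    if not 0 <= value < p:
        raise ValueError("value must fit in the lane's residue range")
    cofactor = M // p                        # divisible by every other lane prime
    return (value * cofactor * pow(cofactor, -1, p)) % M


def decode(word: int) -> dict:
    """Recover each lane's value from a (possibly summed) word."""
    return {lane: word % p for lane, p in LANES.items()}
```

For example, `decode((encode("text", 42) + encode("audio", 7)) % M)` gives 42 in the text lane, 7 in the audio lane, and 0 in the video lane: the lanes share one word but cannot alias, and a nonzero residue identifies where a value came from.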

3.2 Audio (M1) — the synthesis/output path (design note + verified constraints, 2026-06-09)

§3 already holds M1’s input side: an audio encoder adapter-aligned to Exec’s residual geometry deposits pseudo-token state into the ring (the SP_XBAR_EMB injection interface, P2.a, already built). This note scopes the output side (text/latent → speech) that an operator proposal raised, and records what’s decided vs deferred. Status: design exploration, NOT a committed build. No ledger row (the ledger is for green-gate results; this is a plan).

Direction adopted: non-autoregressive vocoding with FiLM conditioning. AR token-by-token synthesis is O(N·D) in frames×codebook-depth; a parallel CNN vocoder is O(1) per chunk — the correct latency lever. FiLM (γ(s)·x + β(s)) is element-wise scale/shift, so it quantizes to fixed-point cleanly and avoids attention’s dynamic-range pain — the one part of the proposal that is both correct and SP-native.
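
Why FiLM quantizes cleanly can be seen in a few lines of integer arithmetic. This is a sketch under assumptions: the Q-format (8 fractional bits for γ), the int16 output range, and the function names are illustrative choices, not the OK_Q4B codec or the GNA operand rules.

```python
from typing import List

FRAC_BITS = 8                        # assumed Q-format: gamma carries 8 fractional bits
I16_MIN, I16_MAX = -32768, 32767


def saturate_i16(v: int) -> int:
    """Clamp to the int16 range instead of wrapping."""
    return max(I16_MIN, min(I16_MAX, v))


def quantize_gamma(gamma: float) -> int:
    """Float scale -> fixed-point integer with FRAC_BITS fractional bits."""
    return round(gamma * (1 << FRAC_BITS))


def film_fixed(x: List[int], gamma_q: List[int], beta: List[int]) -> List[int]:
    """Element-wise y = gamma * x + beta on integers, saturated to int16."""
    out = []
    for xi, gi, bi in zip(x, gamma_q, beta):
        acc = xi * gi                   # wide accumulator (int32 on real hardware)
        acc = (acc >> FRAC_BITS) + bi   # rescale (arithmetic shift), then add the shift term
        out.append(saturate_i16(acc))
    return out
```

With `x = [100, -200, 30000]`, γ = `[1.5, 0.5, 2.0]` and β = `[10, -5, 0]`, the result is `[160, -105, 32767]`: one multiply, one shift, one add per element, with saturation as the only nonlinearity.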

The GNA is a deliberate target: corrected read (2026-06-09; the earlier “GNA is dead” verdict was wrong). OpenVINO dropping the GNA plugin at 2024.0 is a vendor product decision, not a statement about the silicon. Conflating the two was the exact anti-pattern (treating an abandonment headline as a physical limit) we reject. Ground truth (verified): the open intel/gna library (gna-api core + XNN neural-network kernels + GMM kernels + samples, LGPL-2.1, archived read-only @ v3.0.0 but fully present) drives the GNA 2.0 on this host (11th-gen, GNA 2.0 in silicon) directly via Rust FFI, AND ships a CPU software-execution mode for driver-free graph prototyping. For a build-it-ourselves fixed-point pipeline, vendor abandonment is a feature: a frozen, open, no-lock-in, uncontested target. GNA was designed for “always-on AI speech/audio (neural noise cancellation)”, so the input side (VAD / mel / noise-suppression / tiny feature CNN) is its home turf and the natural first target. The output side (a fixed-point FiLM CNN vocoder on GNA) is an open capacity-envelope question, not a no: measure GNA 2.0’s XNN op set, INT8/16 precision, memory and throughput (cheaply: SW-emulation mode + samples first), then co-design a minimal vocoder to fit the envelope rather than porting a big one or assuming it can’t hold one. A GNA backend is the lattice’s bare-metal-per-backend principle + the heterogeneous-SoC manifesto applied to the Intel side (GNA + AVX + 2060, CRT-shardable). 2060/AVX are the fallback only if the measured envelope won’t hold the vocoder.

Stage-0 envelope probe DONE (2026-06-09, _xbar/GNA-ENVELOPE-PROBE.md): confirmed FiLM = native Gna2OperationTypeElementWiseAffine; Int4/Int8 weights + PWL activations + Conv1D/2D cover the vocoder primitives (zero-friction from OK_Q4B/Q8); device gen 2_0 = host + SoftwareEmulation path present; SW XNN/GMM kernels build clean in-sandbox (xnn_kernel_avx2_sat = affine+rnn). Full lib needs libdrm-dev on Linux (Windows uses its own driver).

Stage 1 DONE (2026-06-09, host WSL, real linked libGNA; _xbar/GNA-ENVELOPE-PROBE.md): built full libgna-api-static.a (no-sudo apt-get download + dpkg-deb -x); gna_probe.c runs against it. Gna2ModelCreate = SUCCESS for the FiLM ElementWiseAffine op (N=64…32768): the GNA 2.0 validator ACCEPTS the FiLM topology. Measured walls: 65535 elements/operand-dim (32768 confirmed pass, 65536 above-range), device memory cap ∈ (16 MB, 256 MB]. Operand rule learned: affine output = Int32 accumulator when PWL disabled.

Stage 2 partial (2026-06-09): device total-memory cap PINNED by bisect: 224 MB pool OK, 256 MB → MemoryTotalSizeExceeded ⇒ GNA 2.0 ceiling ≈256 MB (224 MB usable). Conv root-caused (source-confirmed DeviceLayerSupport.cpp): 2D conv = GNA gen-3.0+ ONLY; GNA 2.0 has the legacy 1D conv (INTEL_CONVOLUTIONAL, gen-1.0+, filters [nF,kW], NWD layout), the right primitive for 1-D audio synthesis anyway. So GNA 2.0 covers the vocoder op-set (1D-conv + FiLM/ElementWiseAffine + PWL, i8/i16).

TWO ABSTRACTION LEVELS (the correction): at the raw gna-api level INTEL_CONVOLUTIONAL_2D is gen-3.0+ (our low-level probe routed there and was rejected on 2.0); but the OpenVINO plugin sits above it and flattens 2D convs → native 1D for the GNA 2.0 target when the kernel moves in one direction. So 2D conv is compiler-flattened, not absent.

Authoritative spec (OpenVINO 2023.3 GNA doc, JS-rendered): GNA 2.0 = 10/11th-gen (ours ✓); HW-native is 1D conv; conv output-channels ×4, max filters 65,532; 1D conv input-channel = 1 (channels composed across layers / folded into filters, not native multi-in-channel); precision i8/i16, i32 accum, NOT Int4: i8 = POT performance, i16 = accuracy; conv weights i16 on 2.0 (i8 conv weights only from 3.5) → our OK_Q 4/8-bit is a storage codec mapping to i8/i16 on GNA, not native-4-bit execution; FiLM = constant-broadcast Multiply/Add (in-spec); batch=1 for conv/LSTM (1–8 for MatMul); SW-emulation default + HW_WITH_SW_FBACK QoS for real-time audio; GNA @400 MHz.

Remaining: wire the 1D-conv (single-in-channel, i16 filters [NF×4, kW], batch=1) → Conv1D→FiLM 2-op chain → Windows-host HW bring-up (NUC11 driver) on real GNA 2.0.

Production pipeline (OpenVINO refs, in receipt): POT emits i8 (perf) / i16 (accuracy) = the OK_Q→GNA quantization bridge; LowLatency2 + HW_WITH_SW_FBACK QoS = the real-time streaming path; mirror a models_contrib/speech GNA ASR model’s IR to copy the known-good 1D-conv operand layout (resolves the wiring blocker without blind iteration). Named exemplar: rm_cnn4a_smbr (Kaldi CNN; mo --framework kaldi → IR conv layer = the layout to copy). Validation gate: GNA_SW_EXACT = bit-exact SW emulation → prove the vocoder graph bit-identical in SW before HW bring-up. Conv-i16 confirmed verbatim (“conv layers always 16-bit weights, GNA HW v1/v2”). Latency anchor: wsj_dnn5b ≈4.4 ms/frame on GNA.

BRING-UP KIT STAGED (operator, 2026-06-09/10; archive/notes_and_stuff/GNA/): Stage 3 (real-silicon bring-up) is no longer driver-blocked on either OS. Windows GNA drivers (03.00.00.1815/.1910 + 03.05.00.1906/.2116) and Linux gna-drv (1.2.3/1.3.5) are in hand. Reference models staged: wsj_dnn5b_smbr (DNN; the 4.4 ms latency anchor, with .ark fixtures + HCLG decoder), rm_lstm4f (LSTM), full librispeech_s5 Kaldi pipeline incl. converted OpenVINO IR (OV/lspeech_s5_ext.xml/.bin; known-good GNA operand layouts to mirror for the affine path), and aclnet (audio CNN, fp32 + int8 ONNX; a quantized conv exemplar). Correction to the exemplar line above: rm_cnn4a_smbr itself is NOT in the kit; aclnet int8 is the local conv-layout exemplar candidate (its layer topology inspection = the first Stage-3 step), with the librispeech IR as the affine-layout reference and the public models_contrib/speech fetch as fallback if a true Kaldi-CNN IR is still wanted.
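
The recorded GNA 2.0 walls can be collected into a pre-flight check that a candidate vocoder layer must pass before any wiring. This is a sketch: the `Conv1D` record and `violations` helper are hypothetical, reading "×4" as "multiple of 4" is an interpretation, and the weight-memory estimate ignores activations and scratch buffers.

```python
from dataclasses import dataclass
from typing import List

MAX_DIM = 65535               # measured wall: elements per operand dimension
MAX_FILTERS = 65532           # spec: maximum filter count
USABLE_BYTES = 224 * 2**20    # measured: a 224 MB pool allocated, 256 MB was rejected


@dataclass
class Conv1D:
    in_channels: int
    filters: int
    kernel_width: int
    input_len: int
    weight_bits: int
    batch: int = 1


def violations(layer: Conv1D) -> List[str]:
    """Return every recorded GNA 2.0 constraint the layer breaks (empty list = fits)."""
    found = []
    if layer.in_channels != 1:
        found.append("1D conv takes a single input channel")
    if layer.filters % 4 != 0 or layer.filters > MAX_FILTERS:
        found.append("filter count must be a multiple of 4 and at most 65532")
    if layer.weight_bits != 16:
        found.append("conv weights are i16 on GNA 2.0")
    if layer.batch != 1:
        found.append("conv runs at batch=1")
    if max(layer.input_len, layer.kernel_width) > MAX_DIM:
        found.append("operand dimension exceeds 65535 elements")
    weight_bytes = layer.filters * layer.kernel_width * layer.weight_bits // 8
    if weight_bytes > USABLE_BYTES:
        found.append("weights exceed the usable device memory pool")
    return found
```

A check like this is cheap enough to run over every layer of a candidate design during co-design, long before the SW-emulation or hardware path is involved.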

Three honest corrections to the proposal (kept on record):

It is not “minimalist / steal one insight” — it is the full modern non-AR TTS stack: ECAPA-class speaker encoder + FastSpeech2-style frame/variance (pitch/energy) predictors + HiFi-GAN/Vocos FiLM vocoder. Every component is a trained net; a GAN/diffusion vocoder is among the harder things to train stably. Scoped honestly, this is a larger program than all of XBAR to date.
Category gap: semantic ≠ acoustic. P2.b’s k pseudo-tokens are proven semantic text-recall keys, not acoustic frames. “Exec emits an acoustic residue lane → frame predictor” hand-waves the actual hard problem (semantic → mel), which is TTS. The P2.b recall win does NOT transfer to acoustic conditioning for free.
Adopt, don’t invent (and we already have an adopted vocoder). A real audio lane exists in the voxtral-tts.c satellite repo: Voxtral-4B’s 300M CONV codec decoder (4-stage upsample + ALiBi transformer) consuming LLM latents, with our VHT2 spectral latent compression already integrated. So this is not greenfield; it is “replace the adopted Voxtral codec with a custom non-AR FiLM vocoder,” which raises the bar: the custom build must beat a working baseline, not merely clone a voice. The lattice’s durable contribution to audio is the inference substrate: SP_XBAR_EMB residual injection (conditioning), fixed-point/Z_q vocoder inference (the OK_Q4B/codec work, where SP quantization earns its keep and serves the Rust/no-Python goal), and eventually CRT multi-island synthesis (manifesto #1/#9), NOT a novel TTS architecture. If we go custom, adopt a proven non-AR design; make the inference ours.

Sequencing (corrected): this is NOT pure drift — a GNA backend is the bare-metal-per-backend principle + the heterogeneous-SoC manifesto on the Intel side, and fixed-point CNN inference exercises the Z_q substrate. Two natural tracks: (a) the always-listening GNA input front-end (VAD/mel/noise-suppression) is a low-risk, high-fit near-term target — GNA’s literal design purpose — and is independently useful (an always-on ear for the agent); (b) the full synthesis stack stays sequenced behind the live core (P2.b capacity → P3 → C2) since it’s a large trained build, but as scheduled work, not forbidden drift. The discipline is the gate order (measure GNA envelope → existence probe → co-design → fixed-point Rust), not the deferral itself.

Falsifiable FIRST step (cheap existence probe before any Rust or training, the P2.b-Phase-0 pattern): wire off-the-shelf pretrained components — Vocos/HiFi-GAN vocoder + pretrained ECAPA speaker encoder + an existing non-AR acoustic model — add FiLM conditioning, and test the single load-bearing claim: can it clone a voice from a 3 s reference at acceptable similarity AND O(1)-per-chunk latency on the 2060? PASS → the custom fixed-point Rust reimplementation is justified. FAIL → a day spent, not a quarter, and the architecture falsified before silicon-shaped code was built around it. Baseline it against the existing Voxtral 300M codec — the custom FiLM path only earns the build if it beats Voxtral on latency/footprint at equal clone quality. Gate G-M1-CLONE (deferred): reference-clone similarity above a pinned floor + per-chunk latency independent of sequence length, vs the Voxtral-codec baseline.
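
The two halves of G-M1-CLONE can be written as a small pass/fail check. This is a sketch only: cosine similarity over speaker embeddings is one common similarity measure and is assumed here, the floor and tolerance are placeholders for the pinned values, and the comparison against the Voxtral-codec baseline is left out.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def latency_is_flat(per_chunk_ms: Sequence[float], tolerance: float) -> bool:
    """Per-chunk latency must not grow with sequence length (max within tolerance of min)."""
    return max(per_chunk_ms) <= min(per_chunk_ms) * (1.0 + tolerance)


def g_m1_clone(ref_emb: Sequence[float], clone_emb: Sequence[float],
               per_chunk_ms: Sequence[float],
               sim_floor: float, tolerance: float) -> bool:
    """Pass only if similarity clears the floor AND latency stays flat."""
    return cosine(ref_emb, clone_emb) >= sim_floor and latency_is_flat(per_chunk_ms, tolerance)
```

Here `per_chunk_ms` is meant to hold per-chunk timings taken at increasing sequence lengths, so a list like `[4.0, 8.0, 16.0]` fails the flatness half even when the voice similarity is fine.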

4. The honest negative (stated up front)

“Injected memory as sudden realization” and “confident hallucination from off-manifold state” are the same event described twice. The discrete substrate detects invalid blocks (sentinel, lift identity); it cannot detect semantically-wrong-but-valid ones. Therefore the coherence gate is load-bearing, not decorative: no promotion without a measured downstream delta, accept-or-rewind, every time. This is the kernel-gating doctrine pointed at inter-model state.

Second honest negative: RoPE phase ties keys to absolute position; SWA layers fade injected state beyond their window (the GLOBAL period-6 layers are the long-range carrier); partial-rotary-0.25 globals are the most transplant-tolerant. The probe must quantify all three, not assume them.
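
A toy illustration of the RoPE point, not a measurement: the attention score depends on relative position, so a key roped at one position and read at another shifts the score, and rotating only a fraction of the dimensions shrinks that shift. The interleaved pair layout, the base, and the helper names are assumptions; gemma4's actual rotary layout may differ.

```python
import math
from typing import List


def rope(vec: List[float], pos: int, rotary_frac: float = 1.0,
         base: float = 10000.0) -> List[float]:
    """Rotate the first `rotary_frac` of the dims pairwise by position-dependent angles."""
    d_rot = int(len(vec) * rotary_frac) // 2 * 2   # even count of rotated dims
    out = list(vec)
    for i in range(0, d_rot, 2):
        theta = pos * base ** (-i / d_rot)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def transplant_drift(q: List[float], k: List[float], rotary_frac: float,
                     shift: int, pos: int = 50) -> float:
    """Score change when a key is roped `shift` positions away from where it is read."""
    q_r = rope(q, pos, rotary_frac)
    native = dot(q_r, rope(k, pos, rotary_frac))
    moved = dot(q_r, rope(k, pos + shift, rotary_frac))
    return abs(native - moved)
```

On an 8-dim all-ones vector, `transplant_drift(q, k, 0.25, 100)` is smaller than `transplant_drift(q, k, 1.0, 100)` because three quarters of the dimensions carry no phase at all. Whether that tolerance holds on the real model is exactly what the probe has to quantify.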

5. Roadmap v1 (consolidated 2026-06-09 — rewritten on measured POC physics)

Stage	Lane	What	Gate / exit	Status
P1	XBAR-P	Inception Probe — KV transplant A/B + escalation + 5×3 matrix	CONTRACT-XBAR-P1 G0–G4 + G1b + dual-metric G2	CLOSED — ledger X-R1 citable (15/15 incorporation, 15/15 selectivity, 3.69 orders, dose-response curve)
P2.a	XBAR-P	Residual-entry pseudo-token mechanism probe	CONTRACT-XBAR-P2 G0E–G3E	CLOSED (ghost prompt ≥ KV transplant; blends fall off manifold; stall theory corrected — PLE falsified, PL=0 on the 12B)
P2.b	XBAR-P/C	The span-compression adapter (cloud-trained, frozen 12B): n-token span → k on-manifold pseudo-tokens; inversion Phase 0 → adapter Phase 1 → on-silicon deployment gate. The keystone: injector + Memo’s compaction organ + modality template + NIGHTSHIFT worker in one component.	CONTRACT-XBAR-P2b G-P2b-0..4	SPEC’D — next build
C1-lite	XBAR-C	Memo v0 heuristic curator on the existing qwen3 CPU two-ring (no new infra): select/merge/evict driven by router scores + the LRU access telemetry; full propose→gate→promote/rewind loop on Ring 2′	loop closes; ≥1 promotion improves post-window PPL/NIAH vs no-curation; rewind receipts complete	UNBLOCKED TODAY — can run before P3
P3	XBAR-P	Ring wiring on the Exec path — two-ring to the gemma4 CUDA decode loop; KV slots become Spinor-block ring entries with receipts	T_ARM gates green on gemma4-CUDA; bit-exact null path	pending
C1-full	XBAR-C	C1-lite’s loop re-run on Exec (gemma4-CUDA ring)	same gates, Exec path	pending P3
C2	XBAR-C	Memo v1 = the P2.b adapter applied to ring state: fixed ring budget, maximize Exec’s recall over the episode (the adapter compacts; promote-on-accept gates). Open decision logged: Memo body may be adapter + tiny ring-block encoder, not the 0.5B M.0 stub	recall@budget beats C1 heuristics on held-out episodes	pending P2.b + C1
R3	XBAR-C	Ring 3 consolidated tier (§3.1) — dual-store recall (Ring 2 verbatim + Ring 3 gist); NIGHTSHIFT writes adapter pseudo-tokens to Ring 3 under the irreversible loss gate	G-R3-DUALROUTE (empty-Ring3 parity null + measured scan cost) + G-R3-LOSS (n→k loss bounded, fact-survival, pre-eviction)	pending P2.b + C1-lite
M1	XBAR-M	Audio lane — input: encoder latents → `SP_XBAR_EMB` residual injection → ring (P2.b recipe, source swap), CRT prime lane; output (§3.2): non-AR FiLM vocoder; GNA 2.0 is a deliberate target via the open libGNA (Rust FFI; archived≠dead — it’s a frozen open playground) — input front-end = GNA home turf, output vocoder = capacity-envelope question (measure via SW-emu, co-design to fit), 2060/AVX fallback; SP value = the fixed-point inference substrate	input: Exec answers questions about injected audio never seen as text · output: G-M1-CLONE (off-the-shelf existence probe FIRST — 3 s ref clone similarity + O(1)/chunk latency, vs Voxtral-codec baseline)	pending P2.b; GNA input front-end = near-term low-risk track; synthesis stack scheduled behind core
N1	XBAR-N	NIGHTSHIFT (§7) — episode persistence on Optane + offline Ring 2→Ring 3 consolidation under promote-on-accept, schtasks-owned	unattended run: net-positive gated promotions, zero canonical corruption, full receipt log	v0 (heuristic, Ring 2 evict) pending C1-lite; v1 (adapter, Ring 3) pending P2.b + R3

Order discipline, updated: P1’s physics is banked, so training is now licensed, and P2.b leads because four lanes converge on it. C1-lite runs in parallel on existing infrastructure (the curator’s control flow needs no training and no CUDA port). Compute split: training = cloud (RunPod/Colab A100-class, bf16 bucket weights); deployment + every gate that touches receipts = the 2060/B1 artifact via the P2.a harness.

7. NIGHTSHIFT — the Optane subconscious (v1 design, operator synthesis 2026-06-09)

C2 already proved the substrate this rests on: byte-exact Ring-2 spill/recall on physical Optane (7.57 µs/read floor), 16.3 h unattended saturation with zero leaks (the honest-MISS finale’s infrastructure half), receipts end to end. NIGHTSHIFT adds three things, none of them new physics:

Episode persistence. The Ring-2 store + router sidecar + a manifest (artifact sha, geometry, block receipts) become a named episode file set on E:/F: that survives sessions. Exec’s recall router pulls from a reloaded episode exactly as it pulls from a live one — the recall machinery is already store-agnostic (RAM mock / Optane / QUIC peer, all proven).
The consolidation pass. Offline (idle-time, manifesto trick #7), Memo walks the episode non-causally: select/merge/evict in v0 (heuristics), span-compression via the P2.b adapter in v1 — proposals to a shadow episode, promote-on-accept (PPL/recall delta on probe queries), rewind on reject, every promotion receipt-logged. The association-strength signal already exists: the LRU temporal-locality telemetry (measured 67% hit-rates, absorption-vs-depth curves) is a live record of which blocks attention keeps returning to — “strengthen frequent associations, cull dead ones” is driven by data we already collect, not by a new estimator.
Operational discipline inherited: NIGHTSHIFT runs are schtasks-owned (the C2.4 lesson — bakes belong to the OS, not the agent tree); the runner banner echoes getenv (the config-regression lesson); no agent poll-watching.
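
Episode persistence can be sketched as a file set plus a hash-pinned manifest. The file names (`k.store`, `v.store`, `manifest.json`), the manifest fields, and the helper names are hypothetical; the real manifest also carries the artifact sha and block receipts, which are omitted here.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_episode(root: Path, k_store: bytes, v_store: bytes, geometry: dict) -> Path:
    """Persist {K store, V store, manifest}; the manifest pins a hash per store."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "k.store").write_bytes(k_store)
    (root / "v.store").write_bytes(v_store)
    manifest = {
        "geometry": geometry,
        "sha256": {name: sha256_file(root / name) for name in ("k.store", "v.store")},
    }
    path = root / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return path


def verify_episode(root: Path) -> bool:
    """Reload check: every store must still match the hash recorded at write time."""
    manifest = json.loads((root / "manifest.json").read_text())
    return all(sha256_file(root / name) == digest
               for name, digest in manifest["sha256"].items())
```

Running `verify_episode` before the recall router touches a reloaded episode means a store that changed on disk between sessions is refused up front, not discovered mid-recall.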