{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreia662qzzivgv7qqmeqiyn4aofirhugfxuhj7w3646jet54xxtshgy",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mp52ynffcbk2"
},
"path": "/t/shannon-prime-lattice/176466?page=2#post_37",
"publishedAt": "2026-06-25T11:21:51.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"@0.6",
"@skill"
],
"textContent": "**type** | foundation\n---|---\n**title** | Shannon-Prime — KEYSTONE: the complete system, as built\n**description** | The canonical, current, complete description of Shannon-Prime at the KEYSTONE milestone (keystone-1, 2026-06-25): the byte-exact O_K engine + the two-ring/XBAR memory + the autonomous memory agency + the tool-calling harness + the conversation-memory tiers, integrated into one self-supporting organism. The Rosetta stone: read this first, follow the links only as you need them.\n**tags** |\n\nkeystone | foundation | architecture | memory | harness | agency | navigation | okf\n---|---|---|---|---|---|---|---\n\n**timestamp** | 2026-06-25 00:00:00 UTC\n---|---\n**resource** | shannon-prime-lattice\n**sp_status** | GREEN-LIVE\n**sp_gate** | KEYSTONE-1\n**sp_commit** | keystone-1\n**sp_repro** | see §10 (Run it) + §9 (Gate index)\n\n# **Shannon-Prime — KEYSTONE**\n\n> **Read order for an agent or human:** this file is the map. Each section is self-contained. Pull a subsystem’s detail only when you need it (the §11 navigation table says where it lives). Nothing here needs the whole tree in context — that is the point.\n\n## **0. What Shannon-Prime is (90 seconds)**\n\nShannon-Prime is a **fully local, byte-exact, auditable language-model organism**. It serves Google’s **Gemma-4-12B** (OK_Q4B quant) on a single RTX 2060, through **our own** inference engine, on an **exact-integer arithmetic substrate** (`O_K = Z[(1+√-163)/2]`, dual-prime negacyclic CRT-NTT), with a **working memory it owns** : it learns facts from conversation, recalls them, forgets and supersedes and merges them on its own judgement, stores whole conversations both complete and summarized, calls tools and runs code, and — between turns, on a heartbeat — consolidates the live conversation and tidies its memory. Every mechanism is a flag that is a **strict no-op when unset** (the “null floor”); every number has a reproducing command and a gate. 
No cloud, no third-party inference, no telemetry.\n\n**The thesis** (public name: _Position Is Arithmetic_): an LLM’s container can be made **exact arithmetic** (cross-machine-deterministic, auditable) without losing quality, and memory can be **content/position-addressed** rather than token-shaped. Structure-on-content compression is a **measured negative** (kept as honest negatives); the win is the _container_.\n\n## **1. The KEYSTONE milestone (keystone-1)**\n\nKEYSTONE is the night (2026-06-25) the arches locked together. Before it, the pieces were proven in isolation (byte-exact forward, the two-ring memory, the learned librarian, the diffusion judge). KEYSTONE is the **integration** : the served chat now\n\n * holds the **conversation thread faithfully** (system-prompt priming fixed parametric drift),\n * **learns** facts as you state them, **recalls** them, and **forgets / supersedes / merges** them on the model’s own verdict (LAYER-2 forget, LAYER-3 decide+merge),\n * **calls tools** and **runs Python** through the re-hosted harness (ephemeral text-protocol),\n * **manages its own memory** in an autonomous agency round on a **heartbeat** (KAIROS tick),\n * stores conversations in **tiers** — live (short) → extracted facts (mid) → full+summary MEM-OKF (long) — gist by default, dig deeper on demand,\n * knows **what it is and how to use itself** (system prompt + a recallable capabilities corpus),\n\n\n\n…and the loop closes with **zero manual steps** : the daemon writes each turn’s conversation to disk, the agency scheduler consolidates it on its tick.\n\nThis document is the foundation we build forward on. Older roadmaps/RFCs are archived (§11); this supersedes their “current state” sections.\n\n## **2. 
The five repositories**\n\n**Repo** | **Role** | **Lang** | **Canonical entry**\n---|---|---|---\n**shannon-prime-lattice** | umbrella: papers, contracts, RFC, roadmap, OKFS/MEM-OKF, this doc | md/py | `prompt.md`, `papers/`\n**shannon-prime-system** | the math core (no engine deps): O_K, NTT-CRT, exact islands, ARM two-ring, L1 ABI | C | `include/sp/sp_l1.h`, `core/`\n**shannon-prime-system-engine** | the inference engine + backends + the resident daemon + memory agency | C/CUDA/Rust | `tools/sp_daemon/`, `src/backends/cuda/`\n**shannon-prime-harness** | the agent harness: tool calling, conversation memory, the agency loop (CosySim runtime re-hosted on sp-daemon) | Python | `harness/`, `run_agency.py`\n**Position_Is_Arithmetic** | the public face: receipts-first papers + LEDGER | md | `README.md`, `SERIES.md`, `papers/`\n\n`shannon-prime-system` is also vendored into the engine as the `lib/shannon-prime-system` submodule — `git fetch` + check behind before building (the two can diverge).\n\n## **3. 
Architecture (the whole stack)**\n\n\n ┌──────────────────────────────────────────────┐\n USER (browser console) │ Position Is Arithmetic — papers / LEDGER │ public face\n │ index.html └──────────────────────────────────────────────┘\n │ POST /v1/chat (messages, knobs) ▲ receipts\n ▼ │\n ┌─────────────────────────────────────────────────────────────────────────────┐\n │ sp-daemon (Rust, shannon-prime-system-engine/tools/sp_daemon) │\n │ ─────────────────────────────────────────────────────────────────────── │\n │ /v1/chat → template → prefill → DECODE → SSE {delta} │\n │ ├─ EOT bias (clean stop) ├─ auto_recall: W_c head → judge → │\n │ │ │ text-in-context recite / reject │\n │ ├─ LAYER-2 FORGET (SP_FORGET) ├─ NIGHTSHIFT capture (statements→reg) │\n │ ├─ LAYER-3 DECIDE+MERGE (SP_DECIDE) │\n │ └─ writes the turn → SP_CURRENT_CONVO (the consolidation hook) │\n │ registers L1 backends: forward (prefill) + kvdecode (token-by-token) │\n └───────────────┬───────────────────────────────────────────────┬─────────────┘\n │ L1 ABI (sp_l1.h) │ POST /v1/chat\n ▼ ▼\n ┌───────────────────────────────────┐ ┌──────────────────────────────────┐\n │ ENGINE backends (CUDA/CPU/…) │ │ HARNESS (Python) │\n │ gemma4 forward + decode │ │ SPDaemonClient ─ to_sp_chat │\n │ OK_Q4B GEMV (dp4a) │ │ run_with_tools <tool …> ReAct │\n │ SP_BYTEEXACT exact-int islands │ │ memory tools: list/remember/forget│\n │ diffusiongemma-26B judge (dg_*) │ │ conversation_memory: tiers + caps │\n └───────────────┬───────────────────┘ │ agency: round + scheduler (KAIROS)│\n │ consumes └──────────────┬───────────────────┘\n ▼ │ reads SP_CURRENT_CONVO\n ┌───────────────────────────────────┐ │ writes registry + MEM-OKF\n │ MATH CORE (shannon-prime-system) │ ▼\n │ O_K = Z[(1+√-163)/2] │ ┌──────────────────────────────────┐\n │ dual-prime NTT-CRT (q1,q2≈2^60) │ │ MEMORY │\n │ exact_islands (RMS/softmax/GELU/ │ │ registry.jsonl (facts, mid/long) │\n │ RoPE, CORDIC, no libm) │ │ _nightshift_live/ (episode ep.k) │\n │ 
ARM two-ring KV · Frobenius lift│ │ memory-okf*/ (LUT→sum→full, sha) │\n │ L1 ABI (forward + kvdecode verbs) │ │ _current_conversation.json (short)│\n └───────────────────────────────────┘ └──────────────────────────────────┘\n\n\n\n## **4. The subsystems (what / where / how it integrates)**\n\n * **O_K substrate + byte-exact** (`system/core/ntt_crt`, `core/poly_ring`, `core/exact_islands`; engine `SP_BYTEEXACT`). Exact-integer arithmetic on `O_K`, dual-prime negacyclic CRT-NTT (primes q1=1073738753, q2=1073732609, M≈2^60 fits u64 → no __int128). The 4 nonlinear islands (RMSNorm/softmax/GELU/RoPE) have exact-integer references (RoPE via fixed-point CORDIC, no libm). _Byte-exact = exact arithmetic / cross-machine determinism / AUDITABILITY — NOT compression._ Gate G-BYTEEXACT-FORWARD-12B (off=4.6665 byte-identical null floor / on=parity, run-to-run bit-identical). Detail: `papers/CONTRACT-BYTEEXACT-forward.md`.\n\n * **The engine + daemon** (`engine/src/backends/`, `engine/tools/sp_daemon/`). gemma4 CUDA forward + token-by-token decode (per-layer SWA/global, shared-KV, AltUp/PL=0, softcap, OK_Q4B dp4a GEMV, CUDA-graph decode). The **universal resident daemon** drives the 12B end-to-end via the L1 ABI: prefill (`sp_session_register_forward_backend`) + DECODE (`sp_session_register_kvdecode_backend`, the §6b persistent-KV verb). VRAM flat O(1). Detail: `CONTRACT-CHAT-FULLSTACK`.\n\n * **ARM — two-ring KV memory** (`system/core/arm/`). ±1 Rademacher recall router, Ring-1 slot map, Ring-2 episode store, recall-hit telemetry, cold-evict. The substrate the episodic memory rides.\n\n * **XBAR — the auditable latent crossbar** (lattice `papers/CONTRACT-XBAR-*`, engine `SP_XBAR_*`, `tools/ring3/`). C2 256-bit content signatures, native integer Ring-3 VSA bind on `sp_pr_mul`, Frobenius π^k integer episode store. 
Boundary thesis lives here: O_K wins on the _container_ ; structure-on-content levers are measured-inert (honest negatives kept).\n\n * **The memory agency** (engine `tools/sp_daemon/src/routes.rs`). The model owns its memory:\n\n * **STORE** — NIGHTSHIFT captures statements (loose admission: skip questions/requests/forget-turns).\n * **FORGET** (`SP_FORGET`) — “forget X” → token-overlap match → drop from live set + rewrite registry.\n * **DECIDE** (`SP_DECIDE`) — on a capturing turn that overlaps an existing memory, a side model-call asks the model itself: supersede (`CHANGED=n`, the “cannot both be true at once” test) or consolidate (`MERGE:: combined`, drop both + capture the synthesis). Default-off = null floor. Gates: G-FORGET, G-DECIDE, G-MERGE. Detail: memory `project_memory_agency_forget`.\n * **NIGHTSHIFT — the offline curator** (lattice `CONTRACT-NIGHTSHIFT-CURATOR`, engine `run_kairos_curator`). Live capture → (optional) teacher-forced causal-ablation admission (TAU=-8: load-bearing facts collapse, parametric ones don’t) → conformant MEM-OKF emit.\n\n * **The learned librarian (W_c)** (engine `recall.rs`, `SP_B3_WC`). A learned head does autonomous instance-level episodic recall (logsumexp-over-positions, mean-over-heads; (E+1)-way NULL argmax; bounded-mass replay). The boundary-thesis win: recall is a _learned head on a diverse corpus_ , not a hand-designed signal. Paper 24 (the learned librarian).\n\n * **The diffusion judge** (engine `cuda_forward.cu` `dg_*`, diffusiongemma-26B-A4B MoE). A native iterative-denoise recall/reject judge; perf levers SP_DG_SCRATCHREUSE (default-on ~1.46x), SP_DG_ASYNC (byte-exact ~2x), prefix-KV (~1.6x, answer-lossless). NOTE: the _production_ recall gate is the **deterministic token-overlap (Jaccard) verifier** @0.6, not the 26B (83%/95% on a CPU string op; the 26B cascade was retired). 
Detail: memory `project_judge_deterministic_gate`.\n\n * **KAIROS — the heartbeat / agency tick** (engine `kairos.rs` stub control plane; the model-driven realization is harness `agency.py`). The “auto rounds” where the organism _does things_ between turns instead of only stopping.\n\n * **The harness** (`shannon-prime-harness/`). CosySim’s agent runtime re-hosted on sp-daemon (lmstudio stripped). The inference seam is `InferenceConfig.to_sp_chat()` → `SPDaemonClient` (`POST /v1/chat`, SSE). **Ephemeral tool calling** : the model emits `<tool name=\"…\">{json}</tool>` in plain text, `run_with_tools` parses + executes + feeds back (ReAct loop, no native tool channel needed). `ToolSpec.from_callable` derives the schema from a Python signature; `@skill` decorators bridge to tools. Memory tools (`skills/memory.py`) + conversation memory + the agency loop.\n\n * **MEM-OKF — content-addressed tiered memory** (`tools/okf_mem.py`; the SP-OKF knowledge format). Every object sha256-addressed; three disclosure tiers: **LUT** (index) → **sum/** (gist) → **full/** (complete). The conversation tier and the capabilities corpus both ride it. Anti-rebuild pre-flight is binding: `okf_mem lookup` before building anything. Spec: `papers/MEMORY-OKF-PROFILE.md`.\n\n\n\n\n## **5. 
The memory model (the heart of KEYSTONE)**\n\nThree tiers, one signature scheme (sha256 / C2-sig) linking them so the model can get the gist and dig deeper only when needed:\n\n**Tier** | **What** | **Where** | **How it fills**\n---|---|---|---\n**SHORT** | the live conversation | prefilled `messages` each turn; `_current_conversation.json` | the daemon carries full history (re-prefill); a system prompt makes the model _faithful_ to it\n**MID** | durable facts | `registry.jsonl` (+ `_nightshift_live/ep.k`) | NIGHTSHIFT live capture of statements; harness `consolidate_conversation` extraction; `remember()` (idempotent)\n**LONG** | whole conversations + capabilities | `memory-okf-conv/` (full+summary), `memory-okf-caps/` | `store_conversation` (sha-linked full/sum); `seed_capabilities`\n\n**Agency over the tiers:** the model forgets / supersedes / merges facts (LAYER-2/3); the agency scheduler consolidates the live conversation and tidies memory on its heartbeat. **Recall:** `recall_conversations(query)` → the gist; `read_conversation(addr)` → the full transcript.\n\n**Seeding & priming.** On init the model is primed about _itself_ : (a) a default **system prompt** (served console `index.html`) states identity + capabilities + the faithfulness rule (“use what the user said; never substitute a stated fact”); (b) a **capabilities corpus** of recallable self-knowledge facts seeded into the served registry (`_seed_capabilities.py`); (c) optional **diverse non-parametric seed facts** (`_seed_mint.py`) that bootstrap recall without priming performance. Principle: seed facts the model _can’t_ parametrically know (self / hardware / operator), so recall is clean proof and any self-model is genuine.\n\n## **6. A turn, end to end (the data flow)**\n\n 1. Console accumulates `history` (system + user + assistant), POSTs `messages` + knobs to `/v1/chat`.\n 2. 
Daemon templates the **full** conversation (gemma4 control tokens 105/106/107), prefills, and — if `SP_CURRENT_CONVO` is set — **writes the conversation to disk** (the consolidation hook).\n 3. If `auto_recall`: the W_c head / judge scores stored episodes; on a confident match it recites via **text-in-context** ; otherwise it abstains (token-overlap verifier @0.6 guards false fires).\n 4. Decode streams tokens (SSE `{delta}`), EOT-biased so it stops cleanly.\n 5. Post-response: NIGHTSHIFT **captures** the user statement (if admitted); LAYER-3 **DECIDE** may supersede/merge a related memory.\n 6. Out of band, on the **KAIROS tick** (harness `run_agency_scheduler`, idle-gated): **consolidate** the written conversation (facts → mid, transcript → long) then a **maintenance round** where the model curates its own memory. Zero manual steps.\n\n\n\n## **7. The knobs (env flags + GUI)**\n\nAll `SP_*` flags are **default-off = byte-identical null floor**. The GUI knobs live in the served console (`index.html`, left pane “sampler · knobs”) and flow into the `/v1/chat` body.\n\n**Knob** | **Where** | **Effect**\n---|---|---\n`SP_BYTEEXACT` | engine env | exact-integer islands + attention (auditable decode)\n`SP_EOT_BIAS` / `eot` (GUI) | daemon | logit bias on stop tokens so the model ends cleanly (≈4)\n`SP_AUTO_RECALL_DEFAULT` / `auto-recall` (GUI) | daemon | autonomous episodic recall on\n`SP_FORGET` | daemon | LAYER-2 forget primitive\n`SP_DECIDE` | daemon | LAYER-3 supersede + merge\n`SP_B4_NIGHTSHIFT` / `SP_NIGHTSHIFT_PERSIST` | daemon | live capture / persist facts across restart\n`SP_CURRENT_CONVO` | daemon | write the turn’s conversation for the consolidator\n`SP_RECALL_REGISTRY` | daemon + harness | the shared mid/long fact store path\n`SP_CONV_OKF_ROOT` / `SP_CAPS_OKF_ROOT` | harness | the conversation / capabilities MEM-OKF roots\n`SP_AGENCY_INTERVAL` / `SP_CURRENT_CONVO` | harness scheduler | tick cadence / conversation to 
consolidate\n`temperature/top_p/top_k/rep/max` (GUI) | sampler | standard decode controls (temp 0 = byte-exact-friendly argmax)\n\n## **8. The API surface**\n\n**Daemon (`POST`/`GET` on :3000):** `/v1/chat` (messages|prompt|prompt_tokens + knobs → SSE `{delta}` ending `[DONE]`), `/v1/abort/{id}`, `/v1/capture` (mint an episode), `/v1/metrics`, `/v1/mesh/peers`, `/v1/debug/backend_counts`. L1 ABI (`sp_l1.h`): `sp_session_register_forward_backend`, `sp_session_register_kvdecode_backend` (§6b persistent-KV decode).\n\n**Harness (Python):** `SPDaemonClient.chat / chat_stream`; `InferenceConfig.to_sp_chat`; `run_with_tools(messages, tools)` + `ToolSpec.from_callable`; `skills.memory.{list_memories, remember, forget}`; `skills.conversation_memory.{summarize_conversation, store_conversation, recall_conversations, read_conversation, extract_facts, consolidate_conversation, seed_capabilities, init_primer}`; `control.agency.{agency_round, run_agency_scheduler, consolidate_current}`.\n\nFull reference: `papers/PPT-LAT-KEYSTONE-API.md`.\n\n## **9. Gate / receipt index (the proof map)**\n\nMemory agency: G-FORGET, G-DECIDE, G-MERGE (engine `tests/fixtures/chat_fullstack/`). Harness: G-HARNESS-DAEMON-E2E (H1), G-HARNESS-TOOLCALL-E2E (H2), G-HARNESS-MEMTOOLS-E2E (H3), G-HARNESS-AGENCY-E2E (H4), G-HARNESS-KAIROS-TICK (H5), G-HARNESS-CONVMEM (H6), G-HARNESS-LIVE + G-HARNESS-HOOK-E2E (H7) — all in `shannon-prime-harness/tests/`. Byte-exact: G-BYTEEXACT-FORWARD-12B. Recall: G-CHAT-B3-WC-DEPLOY. Judge: G-JUDGE-BATTERY. Each receipt has a `python tests/<gate>.py` (or the contract’s repro). Rule: **no number without a command + a row.**\n\n## **10. Run it (live, from clean)**\n\n 1. Daemon: `_e2e_seed_serve.bat` (port 3000; sets EOT bias, auto-recall, forget, decide, nightshift, persist, current-convo, the seed registry).\n 2. Seed capabilities (once): `python tools/xbar_lsh/_seed_capabilities.py` then restart the daemon.\n 3. 
Agency + consolidation: `run_agency.bat` (the harness scheduler, alongside the daemon).\n 4. Chat: `http://127.0.0.1:3000/` (hard-refresh; the knobs are on the left). Build: CUDA = VS2019 BuildTools + CUDA, `build-cuda/`, ninja (sm_75 on the 2060); daemon = `cargo build --release --features wire_cuda_backend`. Git on these repos: **native PowerShell, not the Linux mount** (the mount CRLF-churns + locks).\n\n\n\n## **11. Navigation — where to look for what**\n\n**Need** | **Go to**\n---|---\nBootstrap / methodology / operator | lattice `prompt.md`, `CLAUDE.md`\nProven state record | lattice `papers/PPT-LAT-STATE.md`\nThis map | lattice `papers/PPT-LAT-KEYSTONE.md` (here)\nAPI reference | lattice `papers/PPT-LAT-KEYSTONE-API.md`\nMemory agency detail | memory `project_memory_agency_forget`; engine `routes.rs`\nHarness / tool calling | harness `CLAUDE.md`, `docs/SPEC-TOOL-CALLING.md`, `harness/`\nTiered conversation memory | harness `skills/conversation_memory.py`; this §5\nByte-exact / O_K | lattice `CONTRACT-BYTEEXACT-forward.md`; system `core/exact_islands/`\nXBAR / boundary thesis | lattice `CONTRACT-XBAR-*`; Position_Is_Arithmetic papers 18-24\nMEM-OKF format | lattice `papers/MEMORY-OKF-PROFILE.md`; `tools/okf_mem.py`\nRFC / Roadmap (current) | lattice `papers/PPT-LAT-RFC-001-*`, `PPT-LAT-Roadmap.md`\nPublic papers | Position_Is_Arithmetic `SERIES.md`, `papers/`, `LEDGER.md`\nHistorical (archived) | lattice `papers/Archived/`, Position_Is_Arithmetic `Archived/`\n\n## **12. State & open edges (honest)**\n\n**GREEN-LIVE:** byte-exact 12B; coherent served chat; autonomous recall + reject; the full memory agency (store/forget/decide/merge); the harness end-to-end (daemon, tool calling, python exec, memory-as-tools, the agency loop + heartbeat tick); tiered conversation memory + capabilities; the live consolidation hook. 
~90% of the envisioned organism.\n\n**Open edges (next):** (1) **persistent O(1) conversation KV** — the daemon re-prefills the whole conversation each turn (correct but O(n)); the L1 stateful kvdecode verb can make “continue the cache” true O(1). (2) The external **two-physical-GPU** bit-identical check for byte-exact. (3) Deeper **faithfulness** — the model still leans on parametric priors over grounding; the tiered memory (reliable recall) is the structural answer, prompts are the patch. (4) Native-C port of the host-Python XBAR tooling; T4 Frobenius of the model weights (validated lever, untouched).\n\n**Recurring lesson, banked:** served-model misbehavior is almost always _ours_ (template / decode / sampler / forward / prompt), not the weights — verify vs llama.cpp + our PPL first. And for meta-cognitive model-calls: frame as **detection, not decision** , and force the answer prefix.\n\n* * *\n\n## **The paper series**\n\nA staggered set of short, independently citable, receipts-first papers — each carries its own one-command reproduction.\n\n * **25–30 — KEYSTONE: the organism, integrated** _(the milestone arc — the night the arches locked together, 2026-06-25; foundation `papers/PPT-LAT-KEYSTONE.md` in the lattice repo)_ — above the closed read/write/recall substrate, the served chat becomes a coherent **agent that owns its memory**. The arc, receipts attached:\n * **25 — The end-of-turn fix** — the served chat’s rambling / fake-turn confabulation was **ours, not a weak model** : the end-of-turn token reaches rank 1 at the boundary in our forward but loses by one, so it never stops; a logit bias on the stop tokens (`SP_EOT_BIAS≈4`) ends turns cleanly. 
The recurring lesson made a paper: served-model misbehavior is almost always the template / decode / sampler / forward, not the weights (engine `9e4b40f`).\n * **26 — Conversational faithfulness** — the chat was _not_ “restarting each turn” (the daemon carries the full conversation); the issue was the model leaning on **parametric priors over in-context grounding**. The fix is a default **system prompt** (identity + capabilities + “use the stated facts faithfully”) that makes it faithful, plus the structural answer — reliable tiered recall (engine `88d924e`).\n * **27 — Memory agency: forget, decide, merge** — the model **decides what it keeps** : STORE (NIGHTSHIFT capture) + **FORGET** (`SP_FORGET`: “forget X” → token-overlap match → drop + rewrite the registry) + **DECIDE/MERGE** (`SP_DECIDE`: a side model-call, framed as _detection_ with a forced answer prefix, supersedes a changed fact or consolidates two complementary facts into one synthesized truth). Gates `G-FORGET` / `G-DECIDE` / `G-MERGE` (engine `0fd52e4`). Default-off = null floor.\n * **28 — The deterministic judge** — the recall/reject **judge** is a **deterministic token-overlap (Jaccard) evidence verifier** , not a 26B model: a skeptical 12B proposal (TAG + EVIDENCE) → a Jaccard gate @≈0.6 → a confidence tiebreak. **N=40: recall 83% / reject 95%** — beating a 26B cascade (~53% / 98%) on an auditable CPU string op that frees the GPU. The 26B diffusion-judge cascade is retired (`G-JUDGE-BATTERY`).\n * **29 — The tool-calling harness** — the served model becomes an **agent** : ephemeral tool calling over the text-only daemon (the model emits `<tool name=\"…\">{json}</tool>`, the harness parses, executes, and feeds the result back in a ReAct loop), plus memory-as-tools and the tiered conversation memory. 
Live: `calculate` → 4183, `run_python` → 5050 (harness `G-HARNESS-TOOLCALL-E2E`).\n * **30 — KAIROS: the agency heartbeat** — the system _does things between turns_ : an idle-gated scheduler consolidates the live conversation (facts → mid, transcript → long) and runs a model-driven maintenance round (the model curates its own memory). The loop closes with **zero manual steps** (harness `run_agency.py`, `G-HARNESS-KAIROS-TICK` / `G-HARNESS-HOOK-E2E`).\n\n\n\n_KEYSTONE-1, 2026-06-25. Built by the operator (Knack) + Claude + Gemini. Receipts-first; honest negatives attached; default-off is the null floor. This is the foundation — build forward from here._\n\n> This is just the first integration of 90% of the system. It has just been completed and is meant as the consolidation phase: the foundation on which to play and test ideas. Tweak, experiment, etc. Despite the loaded language, this is nothing more than a living project, an experiment in how to build unique systems using current LLMs/agents. The system is real; the tests, the results, and the code are real. Everything else is just deliberately loaded language. The real project is the journey: the receipts, the process. I am not claiming anything at all! Understand that before you accuse me of anything. I do know what I am doing, I am grounded. I just like to play things up. This is provided as a ledger of how to work with current systems, how to build real working systems, how to test, refute, revise, rewrite.\n\n_This is not a paper. This is not a claim of a NEW system; this is a living project. There are real lessons in here: take what you like, use what you like. If you learn something, or it helps you in any way, or you just enjoy the journey, then my goal is achieved._",
"title": "Shannon Prime Lattice"
}