Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreih35sqxumjpetmwcptbdtfofthpero5uld6jhnnwgjxkatc5g5d2m",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mofktl26xyx2"
  },
  "path": "/t/neon-city-cosysim-and-the-nexus-project/176853#post_3",
  "publishedAt": "2026-06-16T08:41:30.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "docs/TRAINING.md",
    "(click for more details)",
    "docs/MCP_FRAMEWORK.md",
    "docs/OPERATIONS.md",
    "docs/NEXUS.md",
    "docs/INTEGRATIONS_SDK.md",
    "docs/APPS.md",
    "ARGUS",
    "docs/CONFIGURATION.md",
    "docs/ARGUS.md"
  ],
  "textContent": "## **CONTROL — How CosySim Trains and Governs Itself**\n\n_The Oracle dashboard surfaces scheduler health, auto-loop cycles, and per-task timeout/error counts in real time._\n\nMost AI demos are read-only: a model answers, you move on. CosySim’s **CONTROL plane** is the opposite. Every conversation, tool call, routing decision, and code edit becomes a _training signal_. A scheduler daemon wakes up on a cron-like cadence, checks whether enough new signal has accumulated, fine-tunes small local models on it, benchmarks the result against the incumbent, and promotes the winner — all on your own GPU, with no human in the loop and no data leaving the machine.\n\nThis is the part of the project most worth borrowing. It’s a working, end-to-end example of a **local self-improvement loop** : a data flywheel, a fine-tune orchestrator, an evaluation gate, an autonomous cycle controller, and an agent governor — wired together through a single scheduler.\n\n> The flywheel in one sentence: **more interactions → richer datasets → better local models → better runtime behaviour → more interactions.** See docs/TRAINING.md for the full pipeline walkthrough.\n\n* * *\n\n### **The five moving parts**\n\n**Layer** | **Module** | **Role**\n---|---|---\n**Flywheel** | `training/data_collector.py`, `engine/nexus/training_flywheel.py` | Capture every runtime event as a typed training example\n**Zoo** | `training/model_zoo.py` | Single source of truth: 16 `ModelSpec` entries, each with its own dataset key, train threshold, and base model\n**Trainer** | `training/finetune_orchestrator.py`, `training/auto_train.py` | QLoRA / Unsloth fine-tune jobs with queue, progress, checkpoint, auto-merge\n**Gate** | `training/evaluation_gate.py`, `training/model_registry.py` | Benchmark before/after; promote only if quality holds or improves\n**Controller** | `engine/nexus/auto_loop.py`, `engine/nexus/scheduler_daemon.py` | Closed-loop orchestration on a schedule; the AgentGovernor caps 
live agents\n\n* * *\n\n### **1. The DataCollector flywheel — learning from your own interactions**\n\n`DataCollector` (`training/data_collector.py`) is a thread-safe, non-blocking JSONL appender that runtime components call as they work. It writes per-type live files to `training/datasets/collected/{model_type}_live.jsonl`. Every typed signal has a dedicated capture method:\n\n\n    collector.collect_tool_call(user_input, tool_name, params, success=True)  # → tool_dispatch\n    collector.collect_grammar_error(bad_text, fixed_text, error_type=\"json\")  # → grammar_scanner\n    collector.collect_output_rating(output, rating=4, source=\"feed\")          # → output_evaluator\n    collector.collect_conversation(system_prompt, history, response, rating)  # → conversational\n    collector.collect_code(prompt, code, language=\"python\")                   # → coder\n    collector.collect_agent_decision(...) / collect_agent_outcome(...)        # self-improvement loop\n\n\nFailures here never crash the caller — each method is wrapped and logged through the Oracle, so the act of _collecting training data_ can’t break the act of _serving the user_.\n\nIn parallel, `TrainingFlywheel` (`engine/nexus/training_flywheel.py`) harvests higher-level signal from the knowledge system — `collect_from_qa`, `collect_from_nlm`, `collect_from_routing`, `collect_preference` — into a SQLite-backed store with content-hash dedup, then exports in **Alpaca, ShareGPT, or DPO** formats (`export_jsonl`, `export_sharegpt`, `export_dpo`). The `training-sync` scheduler task drains Nexus Q&A into this store daily and auto-exports once 50+ unexported, quality-filtered examples accumulate.\n\n### **2. 
The Model Zoo — one registry, many tiny specialists**\n\n`MODEL_ZOO` (`training/model_zoo.py`) is the declarative heart of the system: 16 `ModelSpec` entries, each declaring everything needed to train and evaluate one small specialist model.\n\n\n    \"router_v3\": ModelSpec(\n        id=\"router_v3\",\n        base_model_alias=\"qwen-270m\",        # Qwen2.5-0.5B-Instruct\n        task_type=\"classification\",\n        dataset_key=\"router_v3\",\n        train_threshold=500,                  # auto-train fires at 500 collected examples\n        collect_from=[\"agent_routing_events\", \"intent_labels\"],\n        auto_promote=True,\n        priority=2,\n    )\n\n\nThe fleet spans evaluators (`qa_evaluator`, `output_evaluator`), classifiers (`router_v2/v3`, `conversation_analyzer`), structured-output models (`tool_dispatch`), detectors (`grammar_scanner`), and generators (`syntax_fixer`, `knowledge_synthesizer`, `coder`, `conversational`) — plus voice backends. The philosophy: **don’t fine-tune one big model; train a swarm of cheap 270M–3B specialists** that each do one job well and run locally in LMStudio. Base models are resolved through aliases (`qwen-270m → Qwen/Qwen2.5-0.5B-Instruct`, `llama-3b → meta-llama/Llama-3.2-3B-Instruct`).\n\n### **3. The FinetuneOrchestrator — QLoRA jobs as first-class objects**\n\n`FinetuneOrchestrator` (`training/finetune_orchestrator.py`) manages the full job lifecycle as persisted `FinetuneJob` records (`training/jobs.jsonl`): `PENDING → RUNNING → DONE/FAILED/CANCELLED`, with live progress, step/loss parsing, best-loss tracking, and auto-merge of the LoRA adapter on success.\n\nRather than depend on a heavyweight training harness in-process, it **generates a standalone, cross-platform Unsloth training script per job** and runs it as a subprocess (configurable via `COSYSIM_TRAIN_PYTHON` or `training.python_executable`, honouring the project’s venv rule). 
Hyperparameters scale with model size via `FinetuneConfig` — a 270M model gets `lora_r=8, batch_size=8`; a 3B model gets `lora_r=32, batch_size=2, seq_len=2048`. On completion it notifies the `ModelRegistry`.\n\n### **4. The evaluation gate — no degraded model ever gets promoted**\n\nA self-improving system that can’t tell better from worse will happily train itself into the ground. `evaluation_gate.py` is the safety valve. It benchmarks the candidate against the incumbent and applies an explicit `GatePolicy`:\n\n**Policy** | **Rule**\n---|---\n`NO_REGRESSION` | candidate must score ≥ `threshold × baseline`\n`MUST_IMPROVE` | a named metric must increase\n`PARETO_DOMINANT` | candidate may not be dominated on _any_ metric\n`CUSTOM` | caller-supplied evaluation function\n\nPer-type benchmark prompt suites (router, tag-extraction, response-validate, general) score `accuracy`, `latency`, and `consistency` over multiple runs. Only models that clear the gate reach `ModelRegistry`, which supports single-score `auto_promote` and multi-criteria Pareto promotion — and that registry is what LMStudio loads as the active model.\n\n### **5. The AutoLoop — closing the loop without a human**\n\n`AutoLoop` (`engine/nexus/auto_loop.py`) is the controller that turns the parts above into an autonomous cycle. 
It registers five scheduler callbacks and records every run in a SQLite cycle ledger (`data/auto_loop.db`):\n\n**Cycle** | **Cadence** | **What it does**\n---|---|---\nExperiment execution | `every_2h` | Runs the oldest PENDING experiment; one per cycle to keep load predictable\nEval sweep | `every_30m` | `OnlineEvaluator.auto_check()` — promote/rollback models past their thresholds\nTraining check | `every_4h` | `check_and_train_all_zoo()` — fine-tune any zoo model past its `train_threshold`\nImpact assessment | `every_6h` | Finalize before/after impact snapshots, compute deltas\n**Full daily cycle** | `daily` | All four in sequence → a Markdown **Daily Improvement Report** stored in Nexus\n\nEach promotion, rollback, and training run is logged to the `ImpactTracker`, so the system keeps an auditable trail of _what it changed about itself and what happened next_. `get_loop_status()` exposes a health label (`healthy / degraded / stalled`) for the Oracle dashboard.\n\n### **6. The scheduler — 90+ tasks, now with per-task timeouts**\n\n`scheduler_daemon.py` is a lightweight, cron-like daemon (not the agent task scheduler) that drives all of the above plus dozens of maintenance, knowledge, and content tasks — Nexus health, dedup, QA generation, news distillation, world-sim ticks, governance audits, model benchmarks, and the training tasks already described.\n\nThe **v1.60.0 hardening pass** is itself a good example of the project’s “fix the real problem” ethos. The original symptom: a hung external news fetch could block the entire scheduler loop for tens of seconds. The fix was structural, not a patch:\n\n  * **Per-task hard timeouts** — every callback runs in a worker thread joined with a timeout; a hung task is _abandoned_ (its daemon thread is detached, never blocking the loop) and recorded with a `timeout_count`. 
Default is configurable via `scheduler.default_timeout_seconds`; network-bound tasks like `news-fetch` get tighter caps.\n  * **Honest “not implemented” stubs** — `register_stub()` / `make_not_implemented()` log one clear warning and return a sentinel that status records as `not_implemented`, instead of silently faking success and hiding missing functionality.\n  * **Non-blocking Nexus logging** — task results are posted to Nexus on a fire-and-forget daemon thread that gives up immediately if Nexus is unreachable, so a down knowledge service can’t stall the loop it’s supposed to observe.\n\n\n\n\n    python -m engine.nexus.scheduler_daemon status      # full task grid: next-due, run/error/timeout counts\n    python -m engine.nexus.scheduler_daemon run <id>    # run one task now\n    python -m training.auto_train --status              # candidate counts vs thresholds\n    python -m training.auto_train --dry-run             # see what would train, train nothing\n\n\n* * *\n\n### **Governing the live agents — budgets, cooldowns, prerequisites**\n\nSelf-improvement also means keeping the _runtime_ agents in line. Every character reply flows through the **`AgentGovernor`** (`engine/mcp/comms_framework.py`), which wraps a `CharacterAgent` and enforces the full governance pipeline: build a `ResponseContext`, run auto-skills, run the 36-interceptor pre-call chain, call the LLM, parse tags, run the post-call chain.\n\nTwo governance mechanisms matter most for control:\n\n  * **`InteractionPolicy`** caps each agent per scene — `max_reply_tokens`, `tool_call_limit` (rounds of tool calls per reply), tone/topic constraints, and in-character enforcement. Unset fields impose no constraint, so policies are additive.\n  * **Cooldowns + prerequisites** (v1.59.0): the auto-skill path previously bypassed the registry’s throttling, so an auto skill could fire _every single turn_ regardless of its declared `cooldown`. 
The governor now consults `COOLDOWN_TRACKER.can_use()` and checks that each skill’s `prerequisites` were actually used before invoking it — and marks usage only after a successful call.\n\n\n\nThe result is a system where the _agents_ are budgeted and rate-limited turn by turn, the _scheduler_ is timeout-bounded task by task, and the _models themselves_ are gated promotion by promotion — three layers of control over a system designed to keep changing itself.\n\n> Deeper dives: docs/TRAINING.md (flywheel + fine-tuning), docs/MCP_FRAMEWORK.md (governor + interceptor pipeline), docs/OPERATIONS.md (running the daemons), docs/NEXUS.md (knowledge flywheel inputs).\n\n* * *\n\n## **Integrations, Apps & CLI**\n\n_NEONOS — the CosySim system surface where engine integrations, apps, and CLI converge_\n\nCosySim runs on **local inference** , but it does not run in a vacuum. The same engine that powers 35 scenes also exposes a deep integration layer (`engine/integrations/`), a fleet of standalone apps (`apps/*.py`), and a single unified CLI (`cli.py`). Everything reuses the same engine singletons, the same account pool, and the same secure config — so a HAR you capture in the browser, a Colab GPU you rent for free, and a NotebookLM notebook you distill all become first-class inputs to your local agents.\n\nThis is the part of the project most worth borrowing from: it is a worked example of how to wire **cloud frontier models and local models into one coherent system** without leaking a single secret into the repo.\n\n> Deep dives live in docs/INTEGRATIONS_SDK.md and docs/APPS.md. 
Per-service protocol specs are in the `*_API_REFERENCE.md` files.\n\n* * *\n\n### **The Integration Suite (`engine/integrations/`)**\n\nEach integration is a typed Python client that authenticates with **session cookies from a shared account pool** (or an env-supplied API key) and speaks the service’s real wire protocol — `batchexecute`, gRPC-web, or REST — reverse-engineered from HAR captures and V8 heap snapshots with ARGUS. No vendor SDK lock-in, no browser automation in the hot path.\n\n**Domain** | **Module(s)** | **What it enables**\n---|---|---\n**GitHub Copilot** | `github_copilot_client.py` | Chat + model listing against the Copilot Individual API (38 frontier models — Claude, GPT, Gemini) via a GitHub browser session → short-lived Bearer token. Powers `cli.py ask` and the proxy.\n**NotebookLM** | `nlm_direct_client.py`, `notebooklm_sdk.py`, `nlm_rpc_registry.py` | Multi-turn grounded notebook chat, source ingest (text/URL/YouTube/image/audio/video/PDF), audio overviews, flashcards, mind maps, export-to-Sheets. The SDK wraps 37 rpcids + 24 gRPC methods with full docstrings — built for agents.\n**Gemini (consumer + Labs)** | `gemini_direct_client.py`, `gemini_extended_client.py`, `aistudio_client.py`, `appcatalyst_client.py`, `opal_client.py` | Direct Gemini chat (`batchexecute`), AI Studio MakerSuite (136 methods, structured JSON output), AppCatalyst REST access to **Gemini 3 Flash Preview** , and Opal creative workspace.\n**Managed RAG & caching** | `file_search_client.py`, `context_cache_client.py` | Google AI **File Search** — persistent doc/code stores with grounded citations, distilled back to local Nexus (“Google is the teacher, NEXUS is the student”). 
Context Cache reuses 50K±token prefixes (`CLAUDE.md` + context) across calls.\n**Workspace** | `google_drive_client.py`, `gsheets_client.py`, `google_docs_client.py`, `appscript_client.py`, `gas_client.py`, `workspace_gemini_client.py` | Drive upload/download/permissions, Sheets v4 CRUD, Docs create/export + Gemini content gen, Apps Script project/code/execution control, and the Gemini features embedded inside Workspace apps.\n**Colab (free GPU)** | `colab_client.py`, `colab_gpu_manager.py`, `colab_venv_manager.py`, `colab_notebook_builder.py`, `colab_tunnel_server.py` | Drive a Colab runtime as a remote compute backend: AI Agent tasks, kernel exec over WebSocket, venv/notebook provisioning, and an ngrok tunnel server exposing the GPU as an inference endpoint.\n**Compute routing** | `compute_router.py` | Unifies Colab tunnels, the Colab AI agent, and LMStudio behind one inference interface — tracks per-account quotas and tiers, falls back gracefully.\n**Account & auth plumbing** | `google_account_pool.py`, `github_account_importer.py`, `har_parser.py`, `har_extractor.py`, `rpcid_updater.py`, `rpc_proxy.py` | Round-robin multi-account cookie pool, HAR → pool import, and a live `rpcid` updater so rotated Google RPC IDs self-heal from the YAML registry.\n**Other** | `google_aim_client.py`, `homeassistant.py`, `anythingllm.py`, `artifact_bus.py` | Google AI Mode (`udm=50`) search threads, Home Assistant control, AnythingLLM bridge, and a cross-service artifact bus.\n\n#### **Secure by construction**\n\nSecrets never touch the repo. Clients read keys from `os.environ` (e.g. `appcatalyst_client.py` resolves `APPCATALYST_API_KEY` / `GOOGLE_API_KEY`, `aistudio_client.py` loads a rotating key list from `GOOGLE_AISTUDIO_KEYS`) and cookies from a gitignored pool. 
The repo ships **only structure** :\n\n\n    .env.example          # committed — shows the shape, no real values\n    .env / .env.local     # gitignored\n    config/secrets.yaml   # gitignored; *.example.* committed\n    data/accounts/pool.json, data/credentials/, **/client_secret*.json  # gitignored\n\n\n\n\n    # .gitignore — v1.61.0: \"never commit real values\"\n    .env*\n    config/secrets.yaml\n    data/credentials/\n    **/*credentials*.json\n    **/client_secret*.json\n\n\nSee docs/CONFIGURATION.md for the full secret layout.\n\n* * *\n\n### **Standalone Apps (`apps/*.py`)**\n\nEvery major subsystem has a thin, self-contained CLI entry point. They share `apps/_bootstrap.py`, which **auto-re-execs into`.venv/Scripts/python.exe`** (no manual activation), puts the project root on `sys.path`, and sets the CWD — then forwards to the engine. The apps are facades: the real logic lives in `engine/`, so an app and its in-process callers always behave identically.\n\n**App** | **Purpose**\n---|---\n`apps/nexus.py` | Nexus KMS — search, ask, add knowledge, sessions, NLM (docs/NEXUS.md)\n`apps/argus.py` | Web-app recon — HAR/heap mining, bundle decompile, CDP scripting (docs/ARGUS.md)\n`apps/lmstudio.py` | Local LLM status, model list, quick inference, benchmark\n`apps/oracle.py` | System diagnostics — health, error aggregation, traces, perf\n`apps/ask.py` | Unified query router → Copilot (38 models) / NotebookLM / LMStudio\n`apps/filestore.py` | Gemini File Search managed RAG — store CRUD, upload, query\n`apps/training.py` | Dataset + fine-tuning pipeline, benchmarks, live-traffic curation\n`apps/cdp.py`, `apps/har.py`, `apps/heap.py` | Chrome DevTools, HAR, and V8 heap toolkits\n`apps/account.py`, `apps/launch.py`, `apps/cleanup.py`, `apps/test.py` | Account pool, scene launcher, disk cleanup, smart test runner\n\n#### **Multi-protocol AI gateway**\n\nTwo proxy servers turn the whole stack into an **OpenAI/Anthropic/Gemini-compatible endpoint** — point any existing 
tool at it and get frontier models:\n\n  * `apps/multi_proxy.py` → `scripts/model_proxy_direct.py` on **:5801** — _zero-conversion_ : each protocol serializes straight to/from the Copilot backend with no intermediate format (≈7× faster). OpenAI, Anthropic, and Gemini request shapes are all served natively, including tool-call parsing.\n  * `apps/proxy.py` → on **:5800** — the original _normalized_ gateway.\n\n\n\n\n    python apps/multi_proxy.py --default opus --list-models   # serve all 3 protocols on :5801\n\n\n* * *\n\n### **The Unified CLI (`cli.py`)**\n\n`cli.py` is the front door — **16 commands** in four groups, each routing to a script, module, or app via the venv. Run it from anywhere; it handles the environment for you.\n\n\n      AI & Models:   ask  nlm  nexus  filestore  proxy\n      Analysis:      argus  har  heap  cdp\n      Operations:    oracle  test  scene  launch  cleanup\n      Accounts:      account\n\n\n\n\n    python cli.py ask \"Explain the interceptor pipeline\"     # → Copilot / NLM / local\n    python cli.py nexus search \"economy ticks\"               # local knowledge base\n    python cli.py filestore bootstrap-all                     # Gemini managed RAG over the codebase\n    python cli.py account import github.har                   # HAR cookies → account pool\n    python cli.py argus har capture.har --report             # deep API recon\n    python cli.py oracle --errors                             # what's broken, ranked\n\n\nThe throughline across all three layers: **one engine, many faces.** A cookie captured by `cli.py account`, a notebook seeded by `apps/nexus.py`, and a GPU tunnel opened by `compute_router` are equally available to a Flask scene, a skill, or your own script — which is exactly what makes this a useful reference implementation for agentic, local-first systems.\n\n* * *",
  "title": "NEON-CITY/CosySim and the NEXUS project"
}