External Publication

NEON-CITY/CosySim and the NEXUS project

Hugging Face Forums [Unofficial] June 16, 2026

CONTROL — How CosySim Trains and Governs Itself

The Oracle dashboard surfaces scheduler health, auto-loop cycles, and per-task timeout/error counts in real time.

Most AI demos are read-only: a model answers, you move on. CosySim’s CONTROL plane is the opposite. Every conversation, tool call, routing decision, and code edit becomes a training signal. A scheduler daemon wakes up on a cron-like cadence, checks whether enough new signal has accumulated, fine-tunes small local models on it, benchmarks the result against the incumbent, and promotes the winner — all on your own GPU, with no human in the loop and no data leaving the machine.

This is the part of the project most worth borrowing. It’s a working, end-to-end example of a local self-improvement loop: a data flywheel, a fine-tune orchestrator, an evaluation gate, an autonomous cycle controller, and an agent governor, all wired together through a single scheduler.

The flywheel in one sentence: more interactions → richer datasets → better local models → better runtime behaviour → more interactions. See docs/TRAINING.md for the full pipeline walkthrough.

The five moving parts

Layer	Module	Role
Flywheel	`training/data_collector.py`, `engine/nexus/training_flywheel.py`	Capture every runtime event as a typed training example
Zoo	`training/model_zoo.py`	Single source of truth: 16 `ModelSpec` entries, each with its own dataset key, train threshold, and base model
Trainer	`training/finetune_orchestrator.py`, `training/auto_train.py`	QLoRA / Unsloth fine-tune jobs with queue, progress, checkpoint, auto-merge
Gate	`training/evaluation_gate.py`, `training/model_registry.py`	Benchmark before/after; promote only if quality holds or improves
Controller	`engine/nexus/auto_loop.py`, `engine/nexus/scheduler_daemon.py`	Closed-loop orchestration on a schedule; the AgentGovernor caps live agents

1. The DataCollector flywheel — learning from your own interactions

DataCollector (training/data_collector.py) is a thread-safe, non-blocking JSONL appender that runtime components call as they work. It writes per-type live files to training/datasets/collected/{model_type}_live.jsonl. Every typed signal has a dedicated capture method:

collector.collect_tool_call(user_input, tool_name, params, success=True)  # → tool_dispatch
collector.collect_grammar_error(bad_text, fixed_text, error_type="json")  # → grammar_scanner
collector.collect_output_rating(output, rating=4, source="feed")          # → output_evaluator
collector.collect_conversation(system_prompt, history, response, rating)  # → conversational
collector.collect_code(prompt, code, language="python")                   # → coder
collector.collect_agent_decision(...) / collect_agent_outcome(...)        # self-improvement loop

Failures here never crash the caller — each method is wrapped and logged through the Oracle, so the act of collecting training data can’t break the act of serving the user.
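The "never crash the caller" property can be sketched as a queue-backed appender. This is a minimal illustration, not the project's DataCollector: the class name `JsonlCollector`, its `collect`/`flush` methods, and the single background writer thread are all assumptions made for the example. Only the `{model_type}_live.jsonl` file naming comes from the description above.

```python
import json
import logging
import queue
import threading
from pathlib import Path

log = logging.getLogger("collector_sketch")


class JsonlCollector:
    """Hypothetical stand-in for DataCollector: enqueue now, write later."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self._queue: queue.Queue = queue.Queue()
        # One writer thread serialises all appends, so callers never contend on files.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def collect(self, model_type: str, record: dict) -> bool:
        """Queue one example. Returns False instead of raising on bad input."""
        try:
            line = json.dumps(record, ensure_ascii=False)
            self._queue.put_nowait((model_type, line))
            return True
        except Exception:  # e.g. a payload that is not JSON-serialisable
            log.exception("dropped training example for %s", model_type)
            return False

    def _drain(self) -> None:
        while True:
            model_type, line = self._queue.get()
            try:
                self.root.mkdir(parents=True, exist_ok=True)
                path = self.root / f"{model_type}_live.jsonl"
                with path.open("a", encoding="utf-8") as fh:
                    fh.write(line + "\n")
            except Exception:
                log.exception("write failed for %s", model_type)
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until everything queued so far has been handled."""
        self._queue.join()
```

The caller only pays for a `json.dumps` and a queue put; disk errors surface in the log rather than in the request path.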

In parallel, TrainingFlywheel (engine/nexus/training_flywheel.py) harvests higher-level signal from the knowledge system — collect_from_qa, collect_from_nlm, collect_from_routing, collect_preference — into a SQLite-backed store with content-hash dedup, then exports in Alpaca, ShareGPT, or DPO formats (export_jsonl, export_sharegpt, export_dpo). The training-sync scheduler task drains Nexus Q&A into this store daily and auto-exports once 50+ unexported, quality-filtered examples accumulate.
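Content-hash dedup plus a threshold-triggered export can be sketched with the standard library alone. The table layout, the `DedupStore` class, and the choice of SHA-256 over a whitespace-stripped pair are assumptions for illustration; the source only states that the store is SQLite-backed, deduplicates by content hash, and exports once 50+ unexported examples exist.

```python
import hashlib
import json
import sqlite3


class DedupStore:
    """Hypothetical store: one row per unique (instruction, output) pair."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS examples ("
            " content_hash TEXT PRIMARY KEY,"
            " instruction TEXT NOT NULL,"
            " output TEXT NOT NULL,"
            " exported INTEGER NOT NULL DEFAULT 0)"
        )

    @staticmethod
    def _hash(instruction: str, output: str) -> str:
        # Hash a canonical form so trailing whitespace does not defeat dedup.
        payload = json.dumps([instruction.strip(), output.strip()])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def add(self, instruction: str, output: str) -> bool:
        """Insert an example. Returns False if it was a duplicate."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO examples (content_hash, instruction, output)"
            " VALUES (?, ?, ?)",
            (self._hash(instruction, output), instruction, output),
        )
        self.db.commit()
        return cur.rowcount == 1

    def export_if_ready(self, threshold: int = 50) -> list[dict]:
        """Return Alpaca-style rows once `threshold` unexported rows exist."""
        rows = self.db.execute(
            "SELECT content_hash, instruction, output FROM examples"
            " WHERE exported = 0"
        ).fetchall()
        if len(rows) < threshold:
            return []
        self.db.executemany(
            "UPDATE examples SET exported = 1 WHERE content_hash = ?",
            [(h,) for h, _, _ in rows],
        )
        self.db.commit()
        return [{"instruction": i, "input": "", "output": o} for _, i, o in rows]
```

Marking rows as exported in the same call keeps a later run from emitting the same examples twice.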

2. The Model Zoo — one registry, many tiny specialists

MODEL_ZOO (training/model_zoo.py) is the declarative heart of the system: 16 ModelSpec entries, each declaring everything needed to train and evaluate one small specialist model.

"router_v3": ModelSpec(
    id="router_v3",
    base_model_alias="qwen-270m",        # Qwen2.5-0.5B-Instruct
    task_type="classification",
    dataset_key="router_v3",
    train_threshold=500,                  # auto-train fires at 500 collected examples
    collect_from=["agent_routing_events", "intent_labels"],
    auto_promote=True,
    priority=2,
)

The fleet spans evaluators (qa_evaluator, output_evaluator), classifiers (router_v2/v3, conversation_analyzer), structured-output models (tool_dispatch), detectors (grammar_scanner), and generators (syntax_fixer, knowledge_synthesizer, coder, conversational) — plus voice backends. The philosophy: don’t fine-tune one big model; train a swarm of cheap 270M–3B specialists that each do one job well and run locally in LMStudio. Base models are resolved through aliases (qwen-270m → Qwen/Qwen2.5-0.5B-Instruct, llama-3b → meta-llama/Llama-3.2-3B-Instruct).
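To make the registry idea concrete, here is a rough sketch of what a spec type and a threshold check could look like. The field names mirror the `router_v3` entry shown above, but the dataclass definition, the default values, the `due_for_training` helper, and the "lower priority number trains first" ordering are assumptions, not the project's code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelSpec:
    """Assumed shape, reconstructed from the router_v3 entry."""
    id: str
    base_model_alias: str
    task_type: str
    dataset_key: str
    train_threshold: int
    collect_from: list[str] = field(default_factory=list)
    auto_promote: bool = False
    priority: int = 5  # hypothetical default


# The two alias mappings quoted in the text.
BASE_MODEL_ALIASES = {
    "qwen-270m": "Qwen/Qwen2.5-0.5B-Instruct",
    "llama-3b": "meta-llama/Llama-3.2-3B-Instruct",
}


def resolve_base_model(spec: ModelSpec) -> str:
    """Map a spec's alias to a full model id, failing loudly on unknown aliases."""
    return BASE_MODEL_ALIASES[spec.base_model_alias]


def due_for_training(zoo: dict[str, ModelSpec], counts: dict[str, int]) -> list[ModelSpec]:
    """Specs whose collected example count has reached train_threshold."""
    due = [s for s in zoo.values() if counts.get(s.dataset_key, 0) >= s.train_threshold]
    # Assumption: a lower priority number means "train sooner".
    return sorted(due, key=lambda s: s.priority)
```

Because every specialist is described by the same record, the trainer needs no per-model branching: it iterates the zoo and compares counts to thresholds.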

3. The FinetuneOrchestrator — QLoRA jobs as first-class objects

FinetuneOrchestrator (training/finetune_orchestrator.py) manages the full job lifecycle as persisted FinetuneJob records (training/jobs.jsonl): PENDING → RUNNING → DONE/FAILED/CANCELLED, with live progress, step/loss parsing, best-loss tracking, and auto-merge of the LoRA adapter on success.

Rather than depend on a heavyweight training harness in-process, it generates a standalone, cross-platform Unsloth training script per job and runs it as a subprocess (configurable via COSYSIM_TRAIN_PYTHON or training.python_executable, honouring the project’s venv rule). Hyperparameters scale with model size via FinetuneConfig — a 270M model gets lora_r=8, batch_size=8; a 3B model gets lora_r=32, batch_size=2, seq_len=2048. On completion it notifies the ModelRegistry.
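The job lifecycle reads naturally as a small state machine. The sketch below assumes a `FinetuneJob` with only a handful of fields and an explicit transition table; the state names come from the text, while the `advance` and `record_step` helpers and the decision to allow cancelling a PENDING job are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Terminal states have no outgoing transitions.
ALLOWED = {
    JobState.PENDING: {JobState.RUNNING, JobState.CANCELLED},  # cancel-before-start is assumed
    JobState.RUNNING: {JobState.DONE, JobState.FAILED, JobState.CANCELLED},
}


@dataclass
class FinetuneJob:
    job_id: str
    model_id: str
    state: JobState = JobState.PENDING
    step: int = 0
    best_loss: Optional[float] = None

    def advance(self, new_state: JobState) -> None:
        """Move to new_state, rejecting transitions the lifecycle does not allow."""
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state

    def record_step(self, step: int, loss: float) -> None:
        """Update progress from a parsed training log line; keep the lowest loss seen."""
        self.step = step
        if self.best_loss is None or loss < self.best_loss:
            self.best_loss = loss
```

Persisting each job as one JSON line after every transition would give the `training/jobs.jsonl` behaviour described above, although the actual record format is not shown in the source.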


4. The evaluation gate — no degraded model ever gets promoted

A self-improving system that can’t tell better from worse will happily train itself into the ground. evaluation_gate.py is the safety valve. It benchmarks the candidate against the incumbent and applies an explicit GatePolicy:

Policy	Rule
`NO_REGRESSION`	candidate must score ≥ `threshold × baseline`
`MUST_IMPROVE`	a named metric must increase
`PARETO_DOMINANT`	candidate may not be dominated on any metric
`CUSTOM`	caller-supplied evaluation function

Per-type benchmark prompt suites (router, tag-extraction, response-validate, general) score accuracy, latency, and consistency over multiple runs. Only models that clear the gate reach ModelRegistry, which supports single-score auto_promote and multi-criteria Pareto promotion — and that registry is what LMStudio loads as the active model.
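The four policies can be expressed as a single decision function. This is a sketch under stated assumptions: every metric is treated as higher-is-better (latency would have to be negated or inverted first), `passes_gate` and its parameters are hypothetical names, and PARETO_DOMINANT is read here as "the candidate is at least as good as the baseline on every metric", which is one plausible interpretation of the rule in the table.

```python
from enum import Enum
from typing import Callable, Mapping, Optional

Metrics = Mapping[str, float]  # assumption: higher is better for every metric


class GatePolicy(Enum):
    NO_REGRESSION = "no_regression"
    MUST_IMPROVE = "must_improve"
    PARETO_DOMINANT = "pareto_dominant"
    CUSTOM = "custom"


def passes_gate(
    policy: GatePolicy,
    candidate: Metrics,
    baseline: Metrics,
    *,
    metric: str = "accuracy",
    threshold: float = 1.0,
    custom: Optional[Callable[[Metrics, Metrics], bool]] = None,
) -> bool:
    """Return True if the candidate may be promoted under the given policy."""
    if policy is GatePolicy.NO_REGRESSION:
        # Allow a bounded drop when threshold < 1.0, e.g. 0.95 tolerates 5%.
        return candidate[metric] >= threshold * baseline[metric]
    if policy is GatePolicy.MUST_IMPROVE:
        return candidate[metric] > baseline[metric]
    if policy is GatePolicy.PARETO_DOMINANT:
        # No metric may be worse than the incumbent's.
        return all(candidate[m] >= baseline[m] for m in baseline)
    if policy is GatePolicy.CUSTOM:
        if custom is None:
            raise ValueError("CUSTOM policy needs a callable")
        return bool(custom(candidate, baseline))
    raise ValueError(f"unknown policy: {policy}")
```

For example, a candidate that trades a little accuracy for more consistency passes `NO_REGRESSION` with `threshold=0.95` but fails the Pareto check, which is the kind of distinction the policy choice exists to make.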

5. The AutoLoop — closing the loop without a human

AutoLoop (engine/nexus/auto_loop.py) is the controller that turns the parts above into an autonomous cycle. It registers five scheduler callbacks and records every run in a SQLite cycle ledger (data/auto_loop.db):

Cycle	Cadence	What it does
Experiment execution	`every_2h`	Runs the oldest PENDING experiment; one per cycle to keep load predictable
Eval sweep	`every_30m`	`OnlineEvaluator.auto_check()` — promote/rollback models past their thresholds
Training check	`every_4h`	`check_and_train_all_zoo()` — fine-tune any zoo model past its `train_threshold`
Impact assessment	`every_6h`	Finalize before/after impact snapshots, compute deltas
Full daily cycle	`daily`	All four in sequence → a Markdown Daily Improvement Report stored in Nexus

Each promotion, rollback, and training run is logged to the ImpactTracker, so the system keeps an auditable trail of what it changed about itself and what happened next. get_loop_status() exposes a health label (healthy / degraded / stalled) for the Oracle dashboard.
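The cadence strings in the table above map cleanly onto time intervals. The following sketch parses them and decides which cycles are due; the cycle keys, the `due_cycles` helper, and the "never run means due" rule are assumptions for illustration, and the real scheduler may track due times differently.

```python
import re
from datetime import datetime, timedelta

_CADENCE = re.compile(r"every_(\d+)([mh])")


def cadence_to_interval(cadence: str) -> timedelta:
    """Convert 'every_30m', 'every_2h' or 'daily' into a timedelta."""
    if cadence == "daily":
        return timedelta(days=1)
    match = _CADENCE.fullmatch(cadence)
    if match is None:
        raise ValueError(f"unrecognised cadence: {cadence!r}")
    amount = int(match.group(1))
    return timedelta(minutes=amount) if match.group(2) == "m" else timedelta(hours=amount)


# Cadences from the table; the dictionary keys are hypothetical names.
AUTO_LOOP_CYCLES = {
    "experiment_execution": "every_2h",
    "eval_sweep": "every_30m",
    "training_check": "every_4h",
    "impact_assessment": "every_6h",
    "full_daily_cycle": "daily",
}


def due_cycles(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Names of cycles whose interval has elapsed since their last run."""
    due = []
    for name, cadence in AUTO_LOOP_CYCLES.items():
        last = last_run.get(name)
        # A cycle that has never run is treated as due immediately.
        if last is None or now - last >= cadence_to_interval(cadence):
            due.append(name)
    return due
```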

6. The scheduler — 90+ tasks, now with per-task timeouts

scheduler_daemon.py is a lightweight, cron-like daemon (not the agent task scheduler) that drives all of the above plus dozens of maintenance, knowledge, and content tasks — Nexus health, dedup, QA generation, news distillation, world-sim ticks, governance audits, model benchmarks, and the training tasks already described.

The v1.60.0 hardening pass is itself a good example of the project’s “fix the real problem” ethos. The original symptom: a hung external news fetch could block the entire scheduler loop for tens of seconds. The fix was structural, not a patch:

Per-task hard timeouts — every callback runs in a worker thread joined with a timeout; a hung task is abandoned (its daemon thread is detached, never blocking the loop) and recorded with a timeout_count. Default is configurable via scheduler.default_timeout_seconds; network-bound tasks like news-fetch get tighter caps.
Honest “not implemented” stubs — register_stub() / make_not_implemented() log one clear warning and return a sentinel that status records as not_implemented, instead of silently faking success and hiding missing functionality.
Non-blocking Nexus logging — task results are posted to Nexus on a fire-and-forget daemon thread that gives up immediately if Nexus is unreachable, so a down knowledge service can’t stall the loop it’s supposed to observe.
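The per-task timeout pattern described in the first item can be sketched with a daemon worker thread and a bounded `join`. `TaskStats` and `run_with_timeout` are hypothetical names; the technique itself (run the callback in a thread, join with a timeout, abandon it if still alive) follows the description above. Python cannot forcibly kill a thread, so "abandoned" here means the loop stops waiting and the daemon thread is left to finish or die with the process.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskStats:
    run_count: int = 0
    error_count: int = 0
    timeout_count: int = 0


def run_with_timeout(task: Callable[[], object], timeout_s: float, stats: TaskStats) -> str:
    """Run task in a worker thread; return 'ok', 'error' or 'timeout'."""
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["value"] = task()
        except Exception as exc:  # record, never propagate into the scheduler loop
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)  # wait at most timeout_s, then move on
    stats.run_count += 1
    if thread.is_alive():
        stats.timeout_count += 1  # hung task: stop waiting, keep the loop alive
        return "timeout"
    if "error" in outcome:
        stats.error_count += 1
        return "error"
    return "ok"
```

A scheduler loop built on this helper spends at most `timeout_s` on any one task, which is the property that stops a hung news fetch from stalling everything behind it.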

python -m engine.nexus.scheduler_daemon status   # full task grid: next-due, run/error/timeout counts
python -m engine.nexus.scheduler_daemon run      # run one task now
python -m training.auto_train --status           # candidate counts vs thresholds
python -m training.auto_train --dry-run          # see what would train, train nothing

Governing the live agents — budgets, cooldowns, prerequisites

Self-improvement also means keeping the runtime agents in line. Every character reply flows through the AgentGovernor (engine/mcp/comms_framework.py), which wraps a CharacterAgent and enforces the full governance pipeline: build a ResponseContext, run auto-skills, run the 36-interceptor pre-call chain, call the LLM, parse tags, run the post-call chain.

Two governance mechanisms matter most for control:

InteractionPolicy caps each agent per scene — max_reply_tokens, tool_call_limit (rounds of tool calls per reply), tone/topic constraints, and in-character enforcement. Unset fields impose no constraint, so policies are additive.
Cooldowns + prerequisites (v1.59.0): the auto-skill path previously bypassed the registry’s throttling, so an auto skill could fire every single turn regardless of its declared cooldown. The governor now consults COOLDOWN_TRACKER.can_use() and checks that each skill’s prerequisites were actually used before invoking it — and marks usage only after a successful call.
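The cooldown and prerequisite checks can be sketched as a guard around the skill call. Everything below is illustrative: the real `COOLDOWN_TRACKER.can_use()` signature is not shown in the source, so this `CooldownTracker` counts cooldowns in turns and takes explicit arguments, and `try_auto_skill` is a hypothetical helper. The one behaviour taken directly from the text is that usage is marked only after a successful call.

```python
from typing import Callable, Iterable


class CooldownTracker:
    """Hypothetical tracker: remembers the turn each skill last succeeded on."""

    def __init__(self) -> None:
        self._last_used: dict[str, int] = {}

    def can_use(self, skill: str, cooldown_turns: int, turn: int) -> bool:
        last = self._last_used.get(skill)
        return last is None or turn - last >= cooldown_turns

    def mark_used(self, skill: str, turn: int) -> None:
        self._last_used[skill] = turn


def try_auto_skill(
    tracker: CooldownTracker,
    skill: str,
    *,
    turn: int,
    cooldown_turns: int,
    prerequisites: Iterable[str],
    used: set[str],
    invoke: Callable[[], bool],
) -> bool:
    """Invoke an auto skill only if its cooldown and prerequisites allow it."""
    if not tracker.can_use(skill, cooldown_turns, turn):
        return False  # still cooling down
    if not set(prerequisites) <= used:
        return False  # a prerequisite skill has not actually been used yet
    if not invoke():
        return False  # failed call: do not burn the cooldown
    tracker.mark_used(skill, turn)
    used.add(skill)
    return True
```

Marking usage after the call rather than before means a failed invocation leaves the skill available on the next turn.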

The result is a system where the agents are budgeted and rate-limited turn by turn, the scheduler is timeout-bounded task by task, and the models themselves are gated promotion by promotion — three layers of control over a system designed to keep changing itself.

Deeper dives: docs/TRAINING.md (flywheel + fine-tuning), docs/MCP_FRAMEWORK.md (governor + interceptor pipeline), docs/OPERATIONS.md (running the daemons), docs/NEXUS.md (knowledge flywheel inputs).

Integrations, Apps & CLI

NEONOS — the CosySim system surface where engine integrations, apps, and CLI converge

CosySim runs on local inference, but it does not run in a vacuum. The same engine that powers 35 scenes also exposes a deep integration layer (engine/integrations/), a fleet of standalone apps (apps/*.py), and a single unified CLI (cli.py). Everything reuses the same engine singletons, the same account pool, and the same secure config, so a HAR you capture in the browser, a Colab GPU you rent for free, and a NotebookLM notebook you distill all become first-class inputs to your local agents.

This is the part of the project most worth borrowing from: it is a worked example of how to wire cloud frontier models and local models into one coherent system without leaking a single secret into the repo.

Deep dives live in docs/INTEGRATIONS_SDK.md and docs/APPS.md. Per-service protocol specs are in the *_API_REFERENCE.md files.

The Integration Suite (`engine/integrations/`)

Each integration is a typed Python client that authenticates with session cookies from a shared account pool (or an env-supplied API key) and speaks the service’s real wire protocol — batchexecute, gRPC-web, or REST — reverse-engineered from HAR captures and V8 heap snapshots with ARGUS. No vendor SDK lock-in, no browser automation in the hot path.

Domain	Module(s)	What it enables
GitHub Copilot	`github_copilot_client.py`	Chat + model listing against the Copilot Individual API (38 frontier models — Claude, GPT, Gemini) via a GitHub browser session → short-lived Bearer token. Powers `cli.py ask` and the proxy.
NotebookLM	`nlm_direct_client.py`, `notebooklm_sdk.py`, `nlm_rpc_registry.py`	Multi-turn grounded notebook chat, source ingest (text/URL/YouTube/image/audio/video/PDF), audio overviews, flashcards, mind maps, export-to-Sheets. The SDK wraps 37 rpcids + 24 gRPC methods with full docstrings — built for agents.
Gemini (consumer + Labs)	`gemini_direct_client.py`, `gemini_extended_client.py`, `aistudio_client.py`, `appcatalyst_client.py`, `opal_client.py`	Direct Gemini chat (`batchexecute`), AI Studio MakerSuite (136 methods, structured JSON output), AppCatalyst REST access to Gemini 3 Flash Preview, and Opal creative workspace.
Managed RAG & caching	`file_search_client.py`, `context_cache_client.py`	Google AI File Search: persistent doc/code stores with grounded citations, distilled back to local Nexus (“Google is the teacher, NEXUS is the student”). Context Cache reuses 50K+ token prefixes (`CLAUDE.md` + context) across calls.
Workspace	`google_drive_client.py`, `gsheets_client.py`, `google_docs_client.py`, `appscript_client.py`, `gas_client.py`, `workspace_gemini_client.py`	Drive upload/download/permissions, Sheets v4 CRUD, Docs create/export + Gemini content gen, Apps Script project/code/execution control, and the Gemini features embedded inside Workspace apps.
Colab (free GPU)	`colab_client.py`, `colab_gpu_manager.py`, `colab_venv_manager.py`, `colab_notebook_builder.py`, `colab_tunnel_server.py`	Drive a Colab runtime as a remote compute backend: AI Agent tasks, kernel exec over WebSocket, venv/notebook provisioning, and an ngrok tunnel server exposing the GPU as an inference endpoint.
Compute routing	`compute_router.py`	Unifies Colab tunnels, the Colab AI agent, and LMStudio behind one inference interface — tracks per-account quotas and tiers, falls back gracefully.
Account & auth plumbing	`google_account_pool.py`, `github_account_importer.py`, `har_parser.py`, `har_extractor.py`, `rpcid_updater.py`, `rpc_proxy.py`	Round-robin multi-account cookie pool, HAR → pool import, and a live `rpcid` updater so rotated Google RPC IDs self-heal from the YAML registry.
Other	`google_aim_client.py`, `homeassistant.py`, `anythingllm.py`, `artifact_bus.py`	Google AI Mode (`udm=50`) search threads, Home Assistant control, AnythingLLM bridge, and a cross-service artifact bus.
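The compute-routing row describes a "try the preferred backend, fall back gracefully" pattern. A minimal sketch of that idea follows; the `Backend` record, the integer `quota_left` budget, and the `route` function are assumptions for illustration and do not reflect the actual interface of `compute_router.py`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Backend:
    name: str
    infer: Callable[[str], str]
    quota_left: int  # hypothetical per-account request budget


def route(prompt: str, backends: list[Backend]) -> tuple[str, str]:
    """Try backends in preference order; return (backend name, completion)."""
    errors = []
    for backend in backends:
        if backend.quota_left <= 0:
            errors.append(f"{backend.name}: quota exhausted")
            continue
        try:
            result = backend.infer(prompt)
        except Exception as exc:  # tunnel down, auth expired, and so on
            errors.append(f"{backend.name}: {exc}")
            continue
        backend.quota_left -= 1  # only successful calls consume quota
        return backend.name, result
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Listing a local backend last with an effectively unlimited quota gives the behaviour the table describes: remote GPUs are used when available, and local inference remains the floor.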

Secure by construction

Secrets never touch the repo. Clients read keys from os.environ (e.g. appcatalyst_client.py resolves APPCATALYST_API_KEY / GOOGLE_API_KEY, aistudio_client.py loads a rotating key list from GOOGLE_AISTUDIO_KEYS) and cookies from a gitignored pool. The repo ships only structure:

.env.example          # committed — shows the shape, no real values
.env / .env.local     # gitignored
config/secrets.yaml   # gitignored; *.example.* committed
data/accounts/pool.json, data/credentials/, **/client_secret*.json  # gitignored




# .gitignore — v1.61.0: "never commit real values"
.env*
config/secrets.yaml
data/credentials/
**/*credentials*.json
**/client_secret*.json

See docs/CONFIGURATION.md for the full secret layout.
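Reading secrets only from the environment might look roughly like this. The environment variable names are the ones quoted above; the helper names, the precedence order between the two single-key variables, and the comma-separated format for `GOOGLE_AISTUDIO_KEYS` are assumptions made for the sketch.

```python
import itertools
import os
from typing import Iterator, Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key, preferring the service-specific variable."""
    for name in ("APPCATALYST_API_KEY", "GOOGLE_API_KEY"):  # order is assumed
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def rotating_keys(env: Mapping[str, str] = os.environ) -> Iterator[str]:
    """Cycle endlessly through the keys in GOOGLE_AISTUDIO_KEYS."""
    raw = env.get("GOOGLE_AISTUDIO_KEYS", "")
    # Assumption: keys are comma-separated; blanks are ignored.
    keys = [k.strip() for k in raw.split(",") if k.strip()]
    if not keys:
        raise RuntimeError("GOOGLE_AISTUDIO_KEYS is not set")
    return itertools.cycle(keys)
```

Passing the environment as a parameter keeps the helpers testable with a plain dictionary, so no real key ever needs to appear in a test file.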

Standalone Apps (`apps/*.py`)

Every major subsystem has a thin, self-contained CLI entry point. They share apps/_bootstrap.py, which auto-re-execs into .venv/Scripts/python.exe (no manual activation), puts the project root on sys.path, sets the CWD, and then forwards to the engine. The apps are facades: the real logic lives in engine/, so an app and its in-process callers always behave identically.

App	Purpose
`apps/nexus.py`	Nexus KMS — search, ask, add knowledge, sessions, NLM (docs/NEXUS.md)
`apps/argus.py`	Web-app recon — HAR/heap mining, bundle decompile, CDP scripting (docs/ARGUS.md)
`apps/lmstudio.py`	Local LLM status, model list, quick inference, benchmark
`apps/oracle.py`	System diagnostics — health, error aggregation, traces, perf
`apps/ask.py`	Unified query router → Copilot (38 models) / NotebookLM / LMStudio
`apps/filestore.py`	Gemini File Search managed RAG — store CRUD, upload, query
`apps/training.py`	Dataset + fine-tuning pipeline, benchmarks, live-traffic curation
`apps/cdp.py`, `apps/har.py`, `apps/heap.py`	Chrome DevTools, HAR, and V8 heap toolkits
`apps/account.py`, `apps/launch.py`, `apps/cleanup.py`, `apps/test.py`	Account pool, scene launcher, disk cleanup, smart test runner
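The bootstrap step that every app shares (re-exec into the venv interpreter, fix `sys.path`, set the working directory) can be sketched as follows. This is not the contents of `apps/_bootstrap.py`: the function names are hypothetical, and the use of `os.execv` is an assumption, since the real module may relaunch through `subprocess` instead.

```python
import os
import sys
from pathlib import Path


def venv_python(project_root: Path) -> Path:
    """Interpreter path quoted in the text (Windows venv layout)."""
    return project_root / ".venv" / "Scripts" / "python.exe"


def needs_reexec(current_exe: str, target: Path) -> bool:
    """True when a venv interpreter exists and we are not already running it."""
    return target.exists() and Path(current_exe).resolve() != target.resolve()


def bootstrap(project_root: Path) -> None:
    """Relaunch under the venv if needed, then prepare the process for engine imports."""
    target = venv_python(project_root)
    if needs_reexec(sys.executable, target):
        # Replace this process with the venv interpreter, keeping the same arguments.
        os.execv(str(target), [str(target), *sys.argv])
    root = str(project_root)
    if root not in sys.path:
        sys.path.insert(0, root)  # make `import engine...` resolvable
    os.chdir(project_root)        # relative paths now resolve from the project root
```

Comparing resolved paths is what prevents an infinite relaunch loop: once the process is running under the venv interpreter, `needs_reexec` returns False and startup continues.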

Multi-protocol AI gateway

Two proxy servers turn the whole stack into an OpenAI/Anthropic/Gemini-compatible endpoint — point any existing tool at it and get frontier models:

apps/multi_proxy.py → scripts/model_proxy_direct.py on :5801, zero-conversion: each protocol serializes straight to/from the Copilot backend with no intermediate format (≈7× faster). OpenAI, Anthropic, and Gemini request shapes are all served natively, including tool-call parsing.
apps/proxy.py → on :5800 — the original normalized gateway.

python apps/multi_proxy.py --default opus --list-models # serve all 3 protocols on :5801

The Unified CLI (`cli.py`)

cli.py is the front door — 16 commands in four groups, each routing to a script, module, or app via the venv. Run it from anywhere; it handles the environment for you.

  AI & Models:   ask  nlm  nexus  filestore  proxy
  Analysis:      argus  har  heap  cdp
  Operations:    oracle  test  scene  launch  cleanup
  Accounts:      account




python cli.py ask "Explain the interceptor pipeline"     # → Copilot / NLM / local
python cli.py nexus search "economy ticks"               # local knowledge base
python cli.py filestore bootstrap-all                     # Gemini managed RAG over the codebase
python cli.py account import github.har                   # HAR cookies → account pool
python cli.py argus har capture.har --report             # deep API recon
python cli.py oracle --errors                             # what's broken, ranked
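The "front door" routing can be pictured as a lookup table from command name to target script. The sketch below is an assumption about shape only: the command groups are copied from the listing above, the targets are limited to app files named in the apps table, and `build_argv` is a hypothetical helper that constructs the child command line without running it. Commands whose targets the source does not name (`nlm`, `scene`) are deliberately left unmapped.

```python
import sys

COMMAND_GROUPS = {
    "AI & Models": ["ask", "nlm", "nexus", "filestore", "proxy"],
    "Analysis": ["argus", "har", "heap", "cdp"],
    "Operations": ["oracle", "test", "scene", "launch", "cleanup"],
    "Accounts": ["account"],
}

# Only targets that appear in the apps table; the mapping itself is assumed.
TARGETS = {
    "ask": "apps/ask.py",
    "nexus": "apps/nexus.py",
    "filestore": "apps/filestore.py",
    "proxy": "apps/proxy.py",
    "argus": "apps/argus.py",
    "har": "apps/har.py",
    "heap": "apps/heap.py",
    "cdp": "apps/cdp.py",
    "oracle": "apps/oracle.py",
    "test": "apps/test.py",
    "launch": "apps/launch.py",
    "cleanup": "apps/cleanup.py",
    "account": "apps/account.py",
}


def build_argv(argv: list[str], python: str = sys.executable) -> list[str]:
    """Translate `cli.py <command> ...` into the child process command line."""
    if not argv:
        raise SystemExit("usage: cli.py <command> [args...]")
    command, rest = argv[0], argv[1:]
    known = {c for group in COMMAND_GROUPS.values() for c in group}
    if command not in known:
        raise SystemExit(f"unknown command: {command}")
    target = TARGETS.get(command)
    if target is None:
        raise SystemExit(f"{command}: no target mapped in this sketch")
    return [python, target, *rest]
```

Keeping the dispatch as data rather than as a chain of `if` statements means adding a command is a one-line change, and the help text can be generated from the same table.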


The throughline across all three layers: one engine, many faces. A cookie captured by cli.py account, a notebook seeded by apps/nexus.py, and a GPU tunnel opened by compute_router are equally available to a Flask scene, a skill, or your own script — which is exactly what makes this a useful reference implementation for agentic, local-first systems.