Infra Monorepo Migration
Eva Winterschön
May 31, 2026
One fun weekend follows many before it, along with plenty of weekdays and other weekends. It's nice to have a small team manage a lot of resources from a single pane of glass via tmux, RS-232, IPMI, Ansible, and BigNetwork.
More details later... sleepy.
# NeoCortex Standards, Repo Storage, ZFS Dataset, M70 QAT Fast-Track Plan
## Summary
Implement a non-destructive NeoCortex follow-up that locks
these policies into code, tests, and docs:
- Repos live under /opt/repos/<repo>; worktrees live under /opt/repos/worktrees/<repo>/<branch>.
- /root is not a repo or worktree destination; /root/OMGFUCKED is temporary quarantine only.
- New repos must be created from yukon-systems/YukonSYS-Standard-Definitions.
- Every repo must use Git LFS and carry a byte-for-byte copy of the standards repo's .gitattributes.
- ZFS dataset creation for /opt/repos becomes reusable Ansible/policy logic, based on /root/bin/ZFS-M70-Forge_zroot-dataset_opt-repos.sh.
- M70 QAT rollout is canary-first: build OpenSSL+QAT and OpenZFS/QAT binpkgs, validate them on the M70 canary, then promote.
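The placement rules above reduce to a small path check. A minimal sketch, assuming only the three roots the summary names; classify_repo_path and its return labels are hypothetical, not existing NeoCortex tooling, and PurePath.is_relative_to needs Python 3.9+:

```python
from pathlib import PurePosixPath

# Roots taken from the placement policy; the classifier itself is a hypothetical helper.
REPO_ROOT = PurePosixPath("/opt/repos")
WORKTREE_ROOT = REPO_ROOT / "worktrees"
QUARANTINE_ROOT = PurePosixPath("/root/OMGFUCKED")


def classify_repo_path(path: str) -> str:
    """Return 'repo', 'worktree', 'quarantine', or 'forbidden' for a checkout path."""
    p = PurePosixPath(path)
    if not p.is_absolute() or ".." in p.parts:
        return "forbidden"  # relative or traversal paths are never accepted
    # Check worktrees first: /opt/repos/worktrees is itself under /opt/repos.
    if p.is_relative_to(WORKTREE_ROOT):
        # /opt/repos/worktrees/<repo>/<branch>; branch names may contain '/'
        return "worktree" if len(p.relative_to(WORKTREE_ROOT).parts) >= 2 else "forbidden"
    if p.is_relative_to(REPO_ROOT):
        # /opt/repos/<repo>: exactly one component below the repo root
        return "repo" if len(p.relative_to(REPO_ROOT).parts) == 1 else "forbidden"
    if p.is_relative_to(QUARANTINE_ROOT):
        # /root/OMGFUCKED/<repo> is tolerated only as temporary quarantine
        return "quarantine" if len(p.relative_to(QUARANTINE_ROOT).parts) == 1 else "forbidden"
    # Everything else, including any other path under /root or /home, is rejected
    return "forbidden"
```

A path one level below a repo root (for example /opt/repos/NeoCortex/docs) is classified as forbidden on purpose: the check is meant for checkout locations, not files inside them.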
## Key Changes
- Standards and repo-placement policy
  - Update NeoCortex root policy docs and tests to forbid new repo/worktree paths under /root, except the documented temporary quarantine paths under /root/OMGFUCKED.
  - Update tools/import-source-repos.py defaults from /root/<repo> to /opt/repos/<repo>, accepting /root/OMGFUCKED/<repo> only as an explicit transitional override.
  - Add a repo-location audit script/test that fails when active operational docs/scripts use /root/<repo> or /root/.config/superpowers/worktrees as live paths.
- Git LFS and .gitattributes
  - Replace the NeoCortex root .gitattributes with a byte-for-byte copy of the standards repo version.
  - Add a validation script that checks that:
    - git lfs version works,
    - filter.lfs.required=true,
    - origin LFS locksverify=true,
    - the root .gitattributes hash equals the standards source,
    - the LFS pre-push hook exists.
  - Document that non-compliant repos must be retrofitted before migration/import.
- ZFS dataset automation
  - Convert /root/bin/ZFS-M70-Forge_zroot-dataset_opt-repos.sh into reusable NeoCortex Ansible/defaults for managed datasets.
  - First managed dataset profile: zroot/opt/repos mounted at /opt/repos with the script’s options, including compression=zstd-fast-1, dedup=sha256,verify, recordsize=256K, xattr=sa, POSIX ACLs, atime=off, and reservation=32M.
  - Add a second reusable pattern for per-user home datasets, but do not block on the NAS/NFS 0777 investigation; include a permission preflight that creates a test directory under umask 0027 and fails if its mode comes out as 0777.
- M70 QAT binpkg fast-track
  - Extend the existing M70 canary stage4/binpkg lane to build a QAT package set:
    - QAT firmware/kernel module readiness package policy,
    - OpenSSL QAT engine/provider package path,
    - OpenZFS package built with QAT support,
    - stock OpenSSL/OpenZFS binpkgs for rollback.
  - Add package/use/profile fragments under the existing Gentoo stage4 profile area instead of ad hoc host edits.
  - Add canary validation:
    - lspci -nnk sees the QAT device,
    - qat_c3xxx and intel_qat load,
    - OpenZFS module parameters expose QAT controls,
    - openssl speed before/after artifacts are captured,
    - the root pool imports and mounts after reboot,
    - the rollback binpkg path is tested before wider promotion.
  - Do not promote to all M70 nodes until canary evidence is committed.
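The home-dataset permission preflight fits in a few lines. This is a hypothetical helper (permission_preflight is not an existing NeoCortex script): it creates a probe directory with requested mode 0777 under umask 0027, which on a well-behaved filesystem yields 0750, and fails if the mode comes back as 0777. A default POSIX ACL on the parent directory takes precedence over the umask, so the result reflects the effective behavior of the target mount rather than the umask alone.

```python
import os
import stat
import uuid


def permission_preflight(parent: str, umask: int = 0o027) -> int:
    """Create and remove a probe directory under `parent`; return its permission bits.

    Raises RuntimeError if the directory comes out 0777 despite the umask,
    the NAS/NFS failure mode to catch before per-user home rollout.
    """
    probe = os.path.join(parent, f".perm-preflight-{uuid.uuid4().hex}")
    old_umask = os.umask(umask)  # sets the new mask, returns the previous one (process-wide)
    try:
        os.mkdir(probe, 0o777)  # the requested 0777 is filtered through the umask
        perms = stat.S_IMODE(os.stat(probe).st_mode) & 0o777
    finally:
        os.umask(old_umask)  # restore the caller's umask even if mkdir fails
        if os.path.isdir(probe):
            os.rmdir(probe)
    if perms == 0o777:
        raise RuntimeError(f"{parent}: probe directory is 0777 under umask {umask:04o}")
    return perms
```

On a local directory without default ACLs this should return 0o750; pointing it at the NFSv4.2 home mount is what exercises the suspected bug.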
## Test Plan
- Local static checks:
  - tests/shell/run-tests.sh
  - new repo-location standards test
  - new Git LFS standards test
  - new ZFS dataset defaults test
  - new M70 QAT package-policy test
  - YAML parse for all workflows/playbooks
  - git diff --check
- Runtime/dry-run checks:
  - Ansible syntax-check for dataset roles/playbooks.
  - Dry-run dataset plan for /opt/repos confirms the current zroot/opt/repos matches the expected options.
  - M70 canary dry-run emits the QAT binpkg build plan without mutating packages.
- Canary acceptance:
  - QAT binpkgs build successfully.
  - M70 canary reboots with the root pool intact.
  - QAT modules and OpenZFS QAT controls are present.
  - Benchmark and rollback evidence are recorded under NeoCortex audit docs.
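For the Git LFS standards test, the .gitattributes part is a plain byte comparison. A sketch, assuming the standards repo is checked out locally at a path passed in; sha256_of and gitattributes_matches are hypothetical names. Hashing raw bytes means CRLF vs LF or trailing-whitespace differences fail, which is what byte-for-byte requires. The other listed checks (filter.lfs.required, locksverify, the pre-push hook) would read git config and the hooks directory and are not covered here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    # Hash raw bytes: no newline or whitespace normalization, so this is byte-for-byte.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def gitattributes_matches(repo_root: Path, standards_root: Path) -> bool:
    """True if repo_root/.gitattributes is identical to the standards repo copy.

    A missing repo file counts as non-compliant; a missing standards file raises,
    since the check cannot be evaluated without its source of truth.
    """
    repo_file = repo_root / ".gitattributes"
    standards_file = standards_root / ".gitattributes"
    if not standards_file.is_file():
        raise FileNotFoundError(standards_file)
    return repo_file.is_file() and sha256_of(repo_file) == sha256_of(standards_file)
```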
## Assumptions
- Canonical repo clone layout is /opt/repos/<repo>.
- Worktree layout is /opt/repos/worktrees/<repo>/<branch>.
- /root/OMGFUCKED remains temporary quarantine, not an approved steady-state repo location.
- The QAT rollout gate is canary-first, not an immediate rollout to all M70 nodes.
- OpenZFS QAT implementation details must be verified against current OpenZFS/Gentoo package behavior during implementation; the source basis includes the OpenZFS QAT docs and module-parameter docs:
  - https://openzfs.org/wiki/ZFS_Hardware_Acceleration_with_QAT
  - https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html
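One way to check that OpenZFS module parameters expose QAT controls on the canary is to look under /sys/module/zfs/parameters. The three names below are taken from the OpenZFS module-parameter docs linked above and are expected to appear only when the module is built with QAT support; treat both the list and the helper names (qat_controls, qat_controls_exposed) as assumptions to confirm against the OpenZFS version actually shipped in the binpkg.

```python
from pathlib import Path
from typing import Dict, Optional

# Names from the OpenZFS module-parameter docs; confirm against the shipped version.
QAT_PARAMS = (
    "zfs_qat_checksum_disable",
    "zfs_qat_compress_disable",
    "zfs_qat_encrypt_disable",
)
ZFS_PARAMS_DIR = "/sys/module/zfs/parameters"


def qat_controls(params_dir: str = ZFS_PARAMS_DIR) -> Dict[str, Optional[str]]:
    """Map each expected QAT parameter to its current value, or None if not exposed."""
    base = Path(params_dir)
    values: Dict[str, Optional[str]] = {}
    for name in QAT_PARAMS:
        param = base / name
        values[name] = param.read_text().strip() if param.is_file() else None
    return values


def qat_controls_exposed(params_dir: str = ZFS_PARAMS_DIR) -> bool:
    # Presence check only; these are *_disable knobs, so 0 leaves the offload enabled.
    return all(v is not None for v in qat_controls(params_dir).values())
```

Saving the qat_controls() mapping next to the openssl speed artifacts would record which offloads were active when the benchmark evidence was captured.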
─ Worked for 22m 41s
───────────────────────────────────────────────────────────
› 1) We also need to include the control-plane fixes and validations so that multi-worker comms are fixed. 2) We need to leverage the SLURM cluster nodes and additional distcc build hosts. 3) X12again & FMT2 are effectively idle and can be blasted into submission to fast-track all of these changes. 4) Once the M70-canary passes, we need to roll it out to K10 (tear-it-down AAA) and validate that type of OS netbooting. 5) Add those aspects to the plan and we're looking better. 6) Finally, wrap up ALL changes necessary to complete the NeoCortex monorepo migration.
───────────────────────────────────────────────────────────
• I’ll revise the plan around the expanded end-state, but first
I’m doing a read-only pass over NeoCortex’s control-plane,
SLURM/distcc, X12/FMT2/K10, and
migration-status artifacts so the replacement plan is grounded
instead of generic.
───────────────────────────────────────────────────────────
• Proposed Plan
# NeoCortex Standards-Complete Monorepo Migration Plan
## Summary
- Make /opt/repos/<repo> the only canonical repo location; no
new checkouts or random repo files under any home directory,
including /root.
- Finish standards adoption: all new repos must originate from
YukonSYS-Standard-Definitions; NeoCortex gets retrofitted
instead of restarted unless
retrofit verification fails.
- Complete NeoCortex monorepo migration with standards baseline,
Git LFS policy, control-plane comms fixes, ZFS dataset policy,
SLURM/distcc acceleration, M70 QAT canary, FMT2/X12again build
capacity, and K10 netboot AAA rollout validation.
## Key Changes
- Repository standards
  - Merge the standards-policy PR work, then make the NeoCortex root .gitattributes byte-for-byte identical to the standards repo version.
  - Update active NeoCortex import/migration tooling so source paths default to /opt/repos/<repo>.
  - Preserve historical /root/... references only in immutable audit/provenance docs; active scripts, runbooks, tests, and templates must not depend on home-directory repos.
  - Use worktrees under /opt/repos/worktrees/<repo>/<branch>.
- NeoCortex monorepo completion
  - Finish all migration blockers already tracked in NeoCortex: M70 stage4/QCOW validation, Kata/container runtime, FMT2 ephemeral VM lane, Gentoo/OpenRC replacement for the temporary Rocky bootstrap, Thor replacement path, and the optional history graft/freeze decision.
  - Treat NeoCortex as authoritative after completion: source repos become frozen compatibility references or are history-grafted into NeoCortex with documented provenance.
  - Add/repair migration validation so that PR completion proves the standards baseline, Git LFS, repo-location policy, shell tests, YAML parsing, and migration manifests.
- Control-plane and multi-worker comms
  - Bring all control-plane fixes into NeoCortex and validate Forge/FCP worker communication end to end.
  - Add tests for FCP envelope validation, replay/nonce handling, approval ceilings, routing scope, transport policy, ntfy bridge behavior, and multi-worker request/response flows.
  - Replace the temporary Rocky bootstrap control-plane host with the compliant Gentoo/OpenRC image once that image passes validation.
- ZFS, NFS, and storage policy
  - Use the provided /opt/repos ZFS dataset options as the canonical dataset profile for repo storage.
  - Add reusable dataset/playbook logic for repo datasets and per-user home datasets.
  - Add NFS/floating-home validation that detects the NAS NFSv4.2 bug where directories come out 0777 despite the expected 0027 umask, and blocks insecure rollout.
- Build acceleration and hardware rollout
  - Use SLURM plus distcc for M70 stage4/QAT/OpenZFS builds.
  - Treat X12again and FMT2 as available fast-track build/validation capacity.
  - Build the OpenSSL + QAT + OpenZFS binpkg path on the M70 canary first; non-QAT nodes may consume the same binpkgs with QAT unused.
  - After the M70 canary passes, roll the validated path to K10 and run destructive AAA/netboot validation with rollback evidence captured.
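Using the /opt/repos options as the canonical profile implies a drift check: compare zfs get -H -o property,value output against the expected values. A sketch covering only the options the plan lists (the original script may set more); the accepted acltype spellings and the helper names read_props, parse_zfs_get, and profile_drift are assumptions:

```python
import subprocess
from typing import Dict, Tuple

# Expected /opt/repos profile, limited to the options the plan names.
# Assumption: newer OpenZFS reports acltype as "posix", older releases as "posixacl".
EXPECTED: Dict[str, Tuple[str, ...]] = {
    "mountpoint": ("/opt/repos",),
    "compression": ("zstd-fast-1",),
    "dedup": ("sha256,verify",),
    "recordsize": ("256K",),
    "xattr": ("sa",),
    "acltype": ("posix", "posixacl"),
    "atime": ("off",),
    "reservation": ("32M",),
}


def read_props(dataset: str = "zroot/opt/repos") -> str:
    # -H: scripted mode (no header, tab-separated); -o: only property and value columns
    cmd = ["zfs", "get", "-H", "-o", "property,value", ",".join(EXPECTED), dataset]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_zfs_get(output: str) -> Dict[str, str]:
    props: Dict[str, str] = {}
    for line in output.splitlines():
        if line.strip():
            prop, value = line.split("\t", 1)
            props[prop] = value
    return props


def profile_drift(actual: Dict[str, str], expected=EXPECTED):
    """Return {property: (actual value or None, accepted values)} for every mismatch."""
    return {
        prop: (actual.get(prop), accepted)
        for prop, accepted in expected.items()
        if actual.get(prop) not in accepted
    }
```

Failing the dry-run whenever profile_drift(parse_zfs_get(read_props())) is non-empty gives the dataset plan a concrete pass/fail signal without touching the pool.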
## Test Plan
- Run NeoCortex shell/YAML validation, including migration, standards-baseline, Git LFS, .gitattributes, and no-active-home-repo-path checks.
- Run control-plane/FCP tests for envelope schema, replay rejection, worker routing, ntfy bridge, and multi-worker comms.
- Run Ansible/playbook syntax checks for ZFS datasets, NFS home policy, SLURM/distcc build orchestration, M70 canary, FMT2/X12again lanes, and K10 netboot AAA.
- Execute staged hardware validation:
  1. SLURM/distcc dry-run.
  2. M70 canary build and QCOW/stage validation.
  3. FMT2/X12again accelerated build/ephemeral VM smoke test.
  4. K10 netboot/AAA destructive validation after M70 success.
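The staged hardware validation behaves like a dependency gate: a stage runs only if every stage it depends on has passed. A minimal sketch; the stage names, the Stage/run_gated helpers, and every dependency edge except K10 depending on the M70 canary are assumptions, and each check callable stands in for whatever collects that stage's evidence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Stage:
    name: str
    check: Callable[[], bool]        # hypothetical hook that gathers this stage's evidence
    requires: Tuple[str, ...] = ()   # stages that must have passed first


def run_gated(stages: List[Stage]) -> Dict[str, str]:
    """Evaluate stages in order; a stage is skipped unless all its prerequisites passed."""
    status: Dict[str, str] = {}
    for stage in stages:
        # A prerequisite that failed, was skipped, or has not run yet blocks this stage.
        if any(status.get(dep) != "passed" for dep in stage.requires):
            status[stage.name] = "skipped"
        else:
            status[stage.name] = "passed" if stage.check() else "failed"
    return status
```

With a failing M70 canary check, the K10 netboot/AAA stage comes back "skipped" instead of running, while independent lanes such as the FMT2/X12again smoke test can still report their own result.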
## Assumptions
- Retrofit NeoCortex rather than restart the repo unless standards
validation exposes unrecoverable divergence.
- Historical audit files may retain old /root/... paths as provenance;
active operational content may not.
- M70 canary is the release gate for QAT/OpenZFS/binpkg rollout.
- K10 destructive AAA/netboot validation is in scope only after M70
canary passes and rollback evidence is captured.
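The provenance exception above implies an audit that skips immutable audit docs and flags everything else. A sketch, assuming provenance docs live under a hypothetical docs/audit/ prefix and that the caller supplies the migrated repo names; it flags /root/<repo> and the old /root/.config/superpowers/worktrees root, and leaves /root/OMGFUCKED/<repo> alone as the documented transitional override. build_pattern and audit are hypothetical names.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical prefix for immutable audit/provenance docs inside NeoCortex.
PROVENANCE_PREFIXES = ("docs/audit/",)
OLD_WORKTREE_ROOT = "/root/.config/superpowers/worktrees"


def build_pattern(repo_names: Iterable[str]):
    alternatives = [re.escape(OLD_WORKTREE_ROOT)]
    names = sorted(set(repo_names))
    if names:
        joined = "|".join(re.escape(n) for n in names)
        # /root/<repo> not followed by another name character, so /root/Foo-old is not /root/Foo
        alternatives.append(rf"/root/(?:{joined})(?![\w.-])")
    return re.compile("|".join(alternatives))


def audit(files: Dict[str, str], repo_names: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, match) for live home-directory repo references."""
    pattern = build_pattern(repo_names)
    findings: List[Tuple[str, int, str]] = []
    for rel_path, text in sorted(files.items()):
        if rel_path.startswith(PROVENANCE_PREFIXES):
            continue  # historical /root/... paths are allowed as provenance
        for lineno, line in enumerate(text.splitlines(), start=1):
            findings.extend((rel_path, lineno, m.group(0)) for m in pattern.finditer(line))
    return findings
```

Taking a path-to-text mapping rather than walking the tree keeps the matcher testable; a caller would build the mapping from the tracked files in the NeoCortex checkout.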
─ Worked for 2m 18s ───────────────────────────────────────────────────────────