
Infra Monorepo Migration

Eva Winterschön May 31, 2026

Another weekend of fun, following many earlier ones, plus weekdays, plus other weekends. It's nice to have a small team manage a lot of resources through a single pane of glass via tmux, RS-232, IPMI, Ansible, and BigNetwork.

More details later... sleepy.


 # NeoCortex Standards, Repo Storage, ZFS Dataset, M70 QAT Fast-Track Plan

  ## Summary

  Implement a non-destructive NeoCortex follow-up that locks
  these policies into code, tests, and docs:

  - Repos live under /opt/repos/<repo>; worktrees live under
    /opt/repos/worktrees/<repo>/<branch>.
  - /root is not a repo/worktree destination; /root/OMGFUCKED is
    temporary quarantine only.
  - New repos must be created from
    yukon-systems/YukonSYS-Standard-Definitions.
  - Every repo must use Git LFS and the standards repo
    .gitattributes byte-for-byte.
  - ZFS dataset creation for /opt/repos becomes reusable
    Ansible/policy logic, based on
    /root/bin/ZFS-M70-Forge_zroot-dataset_opt-repos.sh.
  - M70 QAT rollout uses canary first: build OpenSSL+QAT and
    OpenZFS/QAT binpkgs, validate on M70 canary, then promote.
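
The path rules in the summary above can be expressed as a small classifier. This is only a sketch under stated assumptions: the function name `classify_repo_path` and the four labels are hypothetical and are not taken from NeoCortex. It treats only the repo root itself as a "repo" and does not resolve symlinks.

```python
from pathlib import PurePosixPath

# Roots taken from the plan text; everything else here is illustrative.
REPO_ROOT = PurePosixPath("/opt/repos")
WORKTREE_ROOT = REPO_ROOT / "worktrees"
QUARANTINE_ROOT = PurePosixPath("/root/OMGFUCKED")


def _relative_parts(path, root):
    """Return the path components below root, or None if path is outside root."""
    try:
        return path.relative_to(root).parts
    except ValueError:
        return None


def classify_repo_path(path):
    """Classify a checkout path as 'repo', 'worktree', 'quarantine', or 'forbidden'.

    Hypothetical helper: only the repo root directory counts as 'repo'.
    """
    p = PurePosixPath(path)
    # Reject relative paths and any '..' component instead of normalising them.
    if not p.is_absolute() or ".." in p.parts:
        return "forbidden"

    # Worktrees must be checked first because they live under REPO_ROOT.
    rel = _relative_parts(p, WORKTREE_ROOT)
    if rel is not None:
        # <repo>/<branch>; branch names containing '/' add further components.
        return "worktree" if len(rel) >= 2 else "forbidden"

    rel = _relative_parts(p, REPO_ROOT)
    if rel is not None:
        return "repo" if len(rel) == 1 else "forbidden"

    rel = _relative_parts(p, QUARANTINE_ROOT)
    if rel is not None:
        return "quarantine" if len(rel) >= 1 else "forbidden"

    return "forbidden"
```

An audit script could treat "quarantine" as a warning and "forbidden" as a failure, which matches the temporary status the plan gives /root/OMGFUCKED.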

  ## Key Changes

  - Standards and repo-placement policy
      - Update NeoCortex root policy docs and tests to forbid
        new repo/worktree paths under /root, except documented
        temporary quarantine paths under /root/OMGFUCKED.
      - Update tools/import-source-repos.py defaults from
        /root/<repo> to /opt/repos/<repo>, with
        /root/OMGFUCKED/<repo> accepted only as an explicit
        transitional override.
      - Add a repo-location audit script/test that fails active
        operational docs/scripts using /root/<repo> or
        /root/.config/superpowers/worktrees as live paths.
  - Git LFS and .gitattributes
      - Replace NeoCortex root .gitattributes with the standards
        repo version byte-for-byte.
      - Add a validation script that checks:
          - git lfs version works,
          - filter.lfs.required=true,
          - origin LFS locksverify=true,
          - root .gitattributes hash equals standards source,
          - LFS pre-push hook exists.
      - Document that non-compliant repos must be retrofitted
        before migration/import.
  - ZFS dataset automation
      - Convert
        /root/bin/ZFS-M70-Forge_zroot-dataset_opt-repos.sh into
        reusable NeoCortex Ansible/defaults for managed
        datasets.
      - First managed dataset profile: zroot/opt/repos mounted
        at /opt/repos with the script’s options, including
        compression=zstd-fast-1, dedup=sha256,verify,
        recordsize=256K, xattr=sa, POSIX ACLs, atime=off, and
        reservation=32M.
      - Add a second reusable pattern for per-user home
        datasets, but do not block on the NAS/NFS 777
        investigation; include a permission preflight that
        creates a test directory with umask 0027 and fails if
        mode becomes 0777.
  - M70 QAT binpkg fast-track
      - Extend the existing M70 canary stage4/binpkg lane to
        build a QAT package set:
          - QAT firmware/kernel module readiness package policy,
          - OpenSSL QAT engine/provider package path,
          - OpenZFS package built with QAT support,
          - rollback stock OpenSSL/OpenZFS binpkgs.
      - Add package/use/profile fragments under the existing
        Gentoo stage4 profile area instead of ad hoc host edits.
      - Add canary validation:
          - lspci -nnk sees QAT device,
          - qat_c3xxx and intel_qat load,
          - OpenZFS module parameters expose QAT controls,
          - openssl speed before/after artifacts are captured,
          - root pool imports and mounts after reboot,
          - rollback binpkg path is tested before wider
            promotion.
      - Do not promote to all M70 nodes until canary evidence is
        committed.
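
As a rough illustration of the first managed dataset profile described above, the options can be held as data and turned into a `zfs create` argument list without running anything. The names `REPO_DATASET_PROFILE`, `zfs_create_argv`, and `property_drift` are hypothetical. Two values are assumptions on my part: `acltype=posixacl` is one reading of "POSIX ACLs", and `mountpoint` is inferred from "mounted at /opt/repos". Both should be checked against the original script.

```python
# Property values are copied from the plan text unless marked as an assumption.
REPO_DATASET_PROFILE = {
    "dataset": "zroot/opt/repos",
    "properties": {
        "mountpoint": "/opt/repos",   # assumption: inferred from "mounted at"
        "compression": "zstd-fast-1",
        "dedup": "sha256,verify",
        "recordsize": "256K",
        "xattr": "sa",
        "acltype": "posixacl",        # assumption: reading of "POSIX ACLs"
        "atime": "off",
        "reservation": "32M",
    },
}


def zfs_create_argv(profile):
    """Build the argument list for `zfs create`; nothing is executed here."""
    argv = ["zfs", "create"]
    for key, value in profile["properties"].items():
        argv += ["-o", f"{key}={value}"]
    argv.append(profile["dataset"])
    return argv


def property_drift(profile, actual):
    """Return {property: (expected, actual)} for every property that differs.

    `actual` is a plain dict, for example one parsed from `zfs get` output.
    """
    return {
        key: (expected, actual.get(key))
        for key, expected in profile["properties"].items()
        if actual.get(key) != expected
    }
```

Keeping the profile as data lets the same dictionary drive both creation and a dry-run comparison against the live dataset.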

  ## Test Plan

  - Local static checks:
      - tests/shell/run-tests.sh
      - new repo-location standards test
      - new Git LFS standards test
      - new ZFS dataset defaults test
      - new M70 QAT package-policy test
      - YAML parse for all workflows/playbooks
      - git diff --check
  - Runtime/dry-run checks:
      - Ansible syntax-check for dataset roles/playbooks.
      - Dry-run dataset plan for /opt/repos confirms current
        zroot/opt/repos matches expected options.
      - M70 canary dry-run emits QAT binpkg build plan without
        mutating packages.
  - Canary acceptance:
      - QAT binpkgs build successfully.
      - M70 canary reboots with root pool intact.
      - QAT modules and OpenZFS QAT controls are present.
      - Benchmark and rollback evidence are recorded under
        NeoCortex audit docs.
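
One way the Git LFS standards test listed above might check the byte-for-byte .gitattributes requirement is a plain hash comparison. This is a minimal sketch: `gitattributes_matches` and its arguments are hypothetical names, and it assumes both repositories are already checked out locally.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash the raw bytes so line-ending or whitespace changes are detected."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def gitattributes_matches(repo_root, standards_root):
    """Return True only if both root .gitattributes files exist and are identical."""
    candidate = Path(repo_root) / ".gitattributes"
    reference = Path(standards_root) / ".gitattributes"
    if not candidate.is_file() or not reference.is_file():
        return False
    return sha256_of(candidate) == sha256_of(reference)
```

Reading bytes rather than text is deliberate: a single differing trailing newline counts as non-compliant under a byte-for-byte rule.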


  ## Assumptions

  - Canonical repo clone layout is /opt/repos/<repo>.
  - Worktree layout is /opt/repos/worktrees/<repo>/<branch>.
  - /root/OMGFUCKED remains temporary quarantine, not an
    approved steady-state repo location.
  - QAT rollout gate is canary first, not all-M70 immediate.
  - OpenZFS QAT implementation details must be verified against
    current OpenZFS/Gentoo package behavior during
    implementation; source basis includes OpenZFS QAT docs and
    module-parameter docs:
      - https://openzfs.org/wiki/ZFS_Hardware_Acceleration_with_QAT
      - https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html

─ Worked for 22m 41s
───────────────────────────────────────────────────────────
  Implement this plan?
  1. Yes, implement this plan          Switch to Default and start coding.
  2. Yes, clear context and implement  Fresh thread. Context: 53% used.
› 3. No, stay in Plan mode             Continue planning with the model.



› 1) We also need to include the control-plane fixes and validations so that
  multi-worker comms are fixed. 2) We need to leverage the SLURM cluster nodes
  and additional distcc build hosts. 3) X12again & FMT2 are effectively idle
  and can be blasted into submission to fast-track all of these changes.
  4) Once the M70 canary passes, we need to roll it out to K10 (tear-it-down
  AAA) and validate that type of OS netbooting. 5) Add those aspects to the
  plans and we're looking better. 6) Finally, wrap up ALL changes necessary to
  complete the NeoCortex monorepo migration.

───────────────────────────────────────────────────────────

• I’ll revise the plan around the expanded end-state, but first I’m doing a
  read-only pass over NeoCortex’s control-plane, SLURM/distcc, X12/FMT2/K10,
  and migration-status artifacts so the replacement plan is grounded instead
  of generic.

───────────────────────────────────────────────────────────

• Proposed Plan

  # NeoCortex Standards-Complete Monorepo Migration Plan

  ## Summary

  - Make /opt/repos/<repo> the only canonical repo location; no
    new checkouts or random repo files under any home directory,
    including /root.
  - Finish standards adoption: all new repos must originate from
    YukonSYS-Standard-Definitions; NeoCortex gets retrofitted
    instead of restarted unless retrofit verification fails.
  - Complete NeoCortex monorepo migration with standards baseline,
    Git LFS policy, control-plane comms fixes, ZFS dataset policy,
    SLURM/distcc acceleration, M70 QAT canary, FMT2/X12again build
    capacity, and K10 netboot AAA rollout validation.

  ## Key Changes

  - Repository standards
      - Merge standards policy PR work, then make NeoCortex
        root .gitattributes byte-for-byte identical to the
        standards repo version.
      - Update active NeoCortex import/migration tooling to
        default source paths to /opt/repos/<repo>.
      - Preserve historical /root/... references only in
        immutable audit/provenance docs; active scripts,
        runbooks, tests, and templates must not depend on
        home-directory repos.
      - Use worktrees under /opt/repos/worktrees/<repo>/<branch>.
  - NeoCortex monorepo completion
      - Finish all migration blockers already tracked in
        NeoCortex: M70 stage4/QCOW validation, Kata/container
        runtime, FMT2 ephemeral VM lane, Gentoo/OpenRC
        replacement for temporary Rocky bootstrap, Thor
        replacement path, and optional history graft/freeze
        decision.
      - Treat NeoCortex as authoritative after completion:
        source repos become frozen compatibility references or
        are history-grafted into NeoCortex with documented
        provenance.
      - Add/repair migration validation so PR completion proves
        standards baseline, Git LFS, repo location policy, shell
        tests, YAML parsing, and migration manifests.
  - Control-plane and multi-worker comms
      - Bring all control-plane fixes into NeoCortex and
        validate Forge/FCP worker communication end to end.
      - Add tests for FCP envelope validation, replay/nonce
        handling, approval ceilings, routing scope, transport
        policy, ntfy bridge behavior, and multi-worker
        request/response flows.
      - Replace temporary Rocky bootstrap control-plane host
        with the compliant Gentoo/OpenRC image once that image
        passes validation.
  - ZFS, NFS, and storage policy
      - Use the provided /opt/repos ZFS dataset options as the
        canonical dataset profile for repo storage.
      - Add reusable dataset/playbook logic for repo datasets
        and per-user home datasets.
      - Add NFS/floating-home validation that detects the NAS
        NFSv4.2 0777 directory-mode bug under expected 0027
        umask and blocks insecure rollout.
  - Build acceleration and hardware rollout
      - Use SLURM plus distcc for M70 stage4/QAT/OpenZFS builds.
      - Treat X12again and FMT2 as available fast-track
        build/validation capacity.
      - Build binpkg path for OpenSSL + QAT + OpenZFS on M70
        canary first; non-QAT nodes may consume the same binpkgs
        with QAT unused.
      - After M70 canary passes, roll the validated path to K10
        and run destructive AAA/netboot validation with rollback
        evidence captured.
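
The 0777 directory-mode detection mentioned in the storage bullets above could be probed roughly as follows. This is a sketch, not the planned implementation: `probe_directory_mode` and `preflight_ok` are hypothetical names, and the probe directory name is arbitrary. It creates one directory under the target mount with umask 0027 and inspects the mode that actually results.

```python
import os
import stat
import tempfile


def probe_directory_mode(parent, umask=0o027):
    """Create and remove a probe directory under parent; return its mode bits."""
    probe = os.path.join(parent, f".umask-preflight-{os.getpid()}")
    previous = os.umask(umask)  # os.umask returns the old mask so it can be restored
    try:
        # Request 0777 so the umask, not the request, decides the final mode.
        os.mkdir(probe, 0o777)
        try:
            return stat.S_IMODE(os.stat(probe).st_mode)
        finally:
            os.rmdir(probe)
    finally:
        os.umask(previous)


def preflight_ok(parent, umask=0o027):
    """Return False if the filesystem ignored the umask and produced 0777."""
    mode = probe_directory_mode(parent, umask)
    # Mask off setuid/setgid/sticky bits before comparing permission bits.
    return (mode & 0o777) != 0o777


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(oct(probe_directory_mode(scratch)), preflight_ok(scratch))
```

On a filesystem that honours the umask the probe should come back as 0750; a result of 0777 is the failure condition the plan describes.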

  ## Test Plan

  - Run NeoCortex shell/YAML validation, including migration,
    standards-baseline, Git LFS, .gitattributes, and
    no-active-home-repo-path checks.
  - Run control-plane/FCP tests for envelope schema, replay
    rejection, worker routing, ntfy bridge, and multi-worker
    comms.
  - Run Ansible/playbook syntax checks for ZFS datasets, NFS
    home policy, SLURM/distcc build orchestration, M70 canary,
    FMT2/X12again lanes, and K10 netboot AAA.
  - Execute staged hardware validation:
      1. SLURM/distcc dry-run.
      2. M70 canary build and QCOW/stage validation.
      3. FMT2/X12again accelerated build/ephemeral VM smoke.
      4. K10 netboot/AAA destructive validation after M70 success.
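
The staged hardware validation above is essentially an ordered gate: each stage runs only if every earlier stage passed. A minimal sketch of that ordering is below. `run_gated_stages` and the stage callables are hypothetical; the real checks would be whatever the playbooks and canary scripts end up providing.

```python
def run_gated_stages(stages):
    """Run (name, check) pairs in order and stop executing after the first failure.

    Each check is a zero-argument callable returning True on success.
    Returns a list of (name, status) with status 'passed', 'failed', or 'skipped'.
    """
    results = []
    blocked = False
    for name, check in stages:
        if blocked:
            # A failed earlier gate means later stages are never executed.
            results.append((name, "skipped"))
            continue
        passed = bool(check())
        results.append((name, "passed" if passed else "failed"))
        if not passed:
            blocked = True
    return results
```

With this shape, a failed M70 canary stage leaves the destructive K10 stage unexecuted, which is the property the plan's assumptions depend on.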

  ## Assumptions

  - Retrofit NeoCortex rather than restart the repo unless
    standards validation exposes unrecoverable divergence.
  - Historical audit files may retain old /root/... paths as
    provenance; active operational content may not.
  - M70 canary is the release gate for QAT/OpenZFS/binpkg
    rollout.
  - K10 destructive AAA/netboot validation is in scope only
    after M70 canary passes and rollback evidence is captured.

─ Worked for 2m 18s ───────────────────────────────────────────────────────────
