{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidz3ghdluuwy72lwisan7e3bvg5735r6644mpankb7pilvrgdnpoi",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mnwuzxvgdkb2"
},
"path": "/t/holo-hsl-a-100m-change-rate-based-multimodal-toy-model-on-a-single-rtx-4070/176599#post_3",
"publishedAt": "2026-06-10T13:04:51.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "Thank you for this review — it is exactly the kind of feedback this project needs, and we are adopting your framing more or less wholesale. The layer separation you propose (byte-native / dense packing / HSL substrate / architecture / binding / reproducibility) is now literally our experiment plan. A few things can be answered today; the rest we are committing to with a concrete roadmap.\n\n## **One factual correction first**\n\n> “the original byte-to-signal encoder/codec is withheld”\n\nThis premise is no longer true (and mostly never was): **the encoder is fully public** — `pip install hsl-embedding`, MIT license, formulas + tests + codec included. The only private artifacts are the _trained HoLo model weights_ , which are work-in-progress. So every encoder-side claim is directly reproducible from PyPI today; no surrogate substrate is needed. (Your “which results require the private encoder” table item resolves to: none of the substrate-side ones.)\n\n## **Shipped today, in direct response to this review (v0.5.0)**\n\nThe package now contains `hsl_embedding/ablation.py` so that anyone — including you — can run the controlled comparisons you asked for with a one-line swap:\n\n * `ControlEmbedding(kind, seed)` with four variants sharing the identical 27-D layout and the same 9 context dims: **`hsl`** (bit-identical to the real encoder, test-enforced) / **`learned`** (trainable byte projection, +4,608 params — your “learned byte projection” arm) / **`random`** (seeded random injective LUT, moment-matched per channel — your “random fixed invertible map” arm) / **`permuted`** (HSL’s own 256 LUT rows shuffled — per-channel distributions _exactly_ identical, only the value-adjacency **geometry** is destroyed; we think this is a sharper control than channel shuffling for the capacity-vs-geometry question).\n * `feature_groups()` / `select_channels()` for the feature-family ablations (drop Δ², drop FFT, …), plus a `value(18)/context(9)` split.\n * 
`value_lut()` — the frozen 256×18 table exported as a tensor (your “exported feature tensors” item).\n\n\n\n## **What we can already state (with honest caveats)**\n\n**Structural facts** (verifiable from the public package):\n\n * The per-byte Δ is exactly the **binary-reflected Gray code** `v ^ (v >> 1)` — adjacent byte values differ in exactly one Δ coordinate (in raw binary, up to 8 bits can differ). This is the mathematical content behind the “change-rate” framing, and it connects the substrate to the minimal-change-encoding literature.\n * The 27-D base decomposes as a **frozen 256×18 value LUT + 9 context dims** (Δ², boundary). So the substrate question reduces cleanly to: does this _particular_ frozen embedding geometry beat a learned/random/permuted one at matched everything? (Linear rank of the value dims is 17/18 — one dependency, dxor0 lies in the FFT span — so we will not claim “18 independent channels”.)\n * One channel-scale caveat we will control for: `fft_re0` (the DC term) spans 0–8 while other channels are ±1–2, so input-normalization placement will be held identical across all ablation arms.\n * The FFT dims are the spectrum of the **bit pattern of each byte**, not a temporal spectrum of the waveform — your documentation-clarity point is correct and the docs now say so.\n\n\n\n**Preliminary measurements** (small scale, single-seed, prior encoder revision — to be re-run multi-seed with the ablation kit before we treat them as findings):\n\n * Same decoder-only architecture, same data/steps: 27-D HSL input 2.058 bpb vs learned byte embedding 2.118 bpb on a byte-level LM task. 
This is the single number that most needs the multi-seed matched-baseline treatment you describe, and it is first in the queue.\n * Architecture axis at matched data/budget: decoder-only prefix-LM (~11M params) outperformed an encoder–decoder twice its size (2.227 vs 2.275 bpb) across all depths tested.\n * Mechanism (not quality) results for the disk-offload tier: with the answer present only in a disk-resident value, retrieval-ON reaches 1.000 task accuracy while ablated retrieval and no-memory controls sit at chance — the read mechanism is load-bearing, not decorative.\n\n\n\n## **Claim table (current, your format)**\n\n**Claim** | **Evidence today** | **Caveat** | **Next test**\n---|---|---|---\nByte-native pipeline runs end-to-end | text/chat/knowledge/video(539B windows) through one trainer | works ≠ quality; generation demo pending trained ckpt | fixed demo + small checkpoint\nHSL substrate is useful beyond learned bytes | 2.058 vs 2.118 bpb (matched arch/data) | single seed, toy scale, prior encoder rev | multi-seed ControlEmbedding A/B (hsl/learned/random/permuted)\nSubstrate geometry (not just invertibility) matters | Gray-code structure; permuted control exists | **unmeasured** | the raw-bits(8) vs Δ(8) minimal pair — identical information/dims/scale, geometry only\nDense prefix + byte-AR decoder helps | dec-only beat 2× enc-dec at matched budget | params/context confounds partially addressed, not fully | same-param/-context/-FLOP grid\nCross-modal binding | matched/mismatched gaps in earlier prototypes | shortcuts not excluded | hard negatives (same-class wrong-instance, entropy-matched) + top-k retrieval\nKnowledge lives on disk, not FFN | ON 1.000 / ablated chance | mechanism proof, synthetic facts | knowledge-mode training on a real 73k-fact store (wired, training next)\n\n## **What is running / next**\n\nA depth sweep on the final wired architecture is finishing now; the first full training run on the pinned public encoder follows, then: (1) the 
multi-seed substrate ablations above, (2) binding probes with your hard-negative list, (3) a small reproducibility packet — fixed tiny split, exact commands, seeds/logs, a small checkpoint, and the claim/evidence/caveat table maintained in the repo.\n\nIf you want to poke at the substrate before any of that lands: `pip install hsl-embedding` (>= 0.5.0), then `from hsl_embedding.ablation import ControlEmbedding` — the four variants are a one-line swap in any byte-LM training loop. A complete runnable comparison (`examples/substrate_ablation.py`) ships in the source distribution on PyPI. Thanks again — the review measurably improved the package within a day of being posted.",
"title": "HoLo/HSL: a 100M change-rate-based multimodal toy model on a single RTX 4070"
}