External Publication

HoLo/HSL: a 100M change-rate-based multimodal toy model on a single RTX 4070

Hugging Face Forums [Unofficial] June 10, 2026

Thank you for this review — it is exactly the kind of feedback this project needs, and we are adopting your framing more or less wholesale. The layer separation you propose (byte-native / dense packing / HSL substrate / architecture / binding / reproducibility) is now literally our experiment plan. A few things can be answered today; the rest we are committing to with a concrete roadmap.

One factual correction first

“the original byte-to-signal encoder/codec is withheld”

This premise is no longer true (and mostly never was). The encoder is fully public: pip install hsl-embedding, MIT license, with formulas, tests, and codec included. The only private artifacts are the trained HoLo model weights, which are work-in-progress. So every encoder-side claim is directly reproducible from PyPI today; no surrogate substrate is needed. (Your “which results require the private encoder” table item resolves to: none of the substrate-side ones.)

Shipped today, in direct response to this review (v0.5.0)

The package now contains hsl_embedding/ablation.py so that anyone — including you — can run the controlled comparisons you asked for with a one-line swap:

ControlEmbedding(kind, seed) with four variants sharing the identical 27-D layout and the same 9 context dims:
    hsl: bit-identical to the real encoder (test-enforced).
    learned: trainable byte projection, +4,608 params (your “learned byte projection” arm).
    random: seeded random injective LUT, moment-matched per channel (your “random fixed invertible map” arm).
    permuted: HSL’s own 256 LUT rows shuffled, so per-channel distributions are exactly identical and only the value-adjacency geometry is destroyed; we think this is a sharper control than channel shuffling for the capacity-vs-geometry question.
feature_groups() / select_channels() for the feature-family ablations (drop Δ², drop FFT, …), plus a value(18)/context(9) split.
value_lut() — the frozen 256×18 table exported as a tensor (your “exported feature tensors” item).
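To make the logic of the permuted control concrete, here is a minimal standalone sketch. It does not use the hsl-embedding package: the 256×8 table of Gray-code bits is only a stand-in for the real 256×18 value LUT, and the helper names (gray_bits, permuted_rows, channel_multisets, max_adjacent_distance) are hypothetical, chosen for illustration. It shows that shuffling whole rows keeps every channel's value distribution and the injectivity of the map, while breaking the adjacency between neighbouring byte values.

```python
import random

def gray_bits(v):
    """Stand-in 8-D value row: the bits of the Gray code of byte v (assumption, not the real LUT)."""
    g = v ^ (v >> 1)
    return [(g >> i) & 1 for i in range(8)]

def permuted_rows(table, seed):
    """Shuffle whole rows with a seeded RNG; the contents of each row stay intact."""
    rng = random.Random(seed)
    order = list(range(len(table)))
    rng.shuffle(order)
    return [table[i] for i in order]

def channel_multisets(table):
    """Sorted values per channel; equal lists mean identical per-channel distributions."""
    return [sorted(col) for col in zip(*table)]

def max_adjacent_distance(table):
    """Largest number of coordinates that differ between the rows for v and v + 1."""
    return max(
        sum(a != b for a, b in zip(table[v], table[v + 1]))
        for v in range(len(table) - 1)
    )

base = [gray_bits(v) for v in range(256)]
perm = permuted_rows(base, seed=0)

same_distributions = channel_multisets(base) == channel_multisets(perm)
still_injective = len({tuple(row) for row in perm}) == 256

print("per-channel distributions identical:", same_distributions)
print("map still injective:", still_injective)
print("max adjacent distance, base vs permuted:",
      max_adjacent_distance(base), max_adjacent_distance(perm))
```

Because the two tables differ only in which byte value maps to which row, any quality gap between them at matched training settings would be attributable to geometry rather than to per-channel statistics.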

What we can already state (with honest caveats)

Structural facts (verifiable from the public package):

The per-byte Δ is exactly the binary-reflected Gray code v ^ (v >> 1) — adjacent byte values differ in exactly one Δ coordinate (raw bits: up to 8). This is the mathematical content behind the “change-rate” framing, and it connects the substrate to the minimal-change-encoding literature.
The 27-D base decomposes as a frozen 256×18 value LUT + 9 context dims (Δ², boundary). So the substrate question reduces cleanly to: does this particular frozen embedding geometry beat a learned/random/permuted one at matched everything? (Linear rank of the value dims is 17/18 — one dependency, dxor0 lies in the FFT span — so we will not claim “18 independent channels”.)
One channel-scale caveat we will control for: fft_re0 (the DC term) spans 0–8 while other channels are ±1–2, so input-normalization placement will be held identical across all ablation arms.
The FFT dims are the spectrum of the bit pattern of each byte, not a temporal spectrum of the waveform. Your documentation-clarity point is correct and the docs now say so.
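The structural facts above can be checked without the package. The sketch below is a plain-Python illustration, not the package's code: it verifies the one-coordinate-per-step property of v ^ (v >> 1), contrasts it with raw bits, confirms the map is invertible, and computes the DC term of an 8-point DFT over a byte's bits. Which bit pattern the real encoder transforms (raw or Δ) and its bit ordering are assumptions here; the DC range comes out the same either way, since it is just a count of set bits.

```python
import cmath

def gray(v):
    """Binary-reflected Gray code of v."""
    return v ^ (v >> 1)

def gray_inverse(g):
    """Invert the Gray code by XOR-ing together all right shifts of g."""
    v = 0
    while g:
        v ^= g
        g >>= 1
    return v

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def dft_bits(x):
    """8-point DFT of the bit pattern of x (least significant bit first, an assumed ordering)."""
    bits = [(x >> i) & 1 for i in range(8)]
    return [
        sum(b * cmath.exp(-2j * cmath.pi * k * n / 8) for n, b in enumerate(bits))
        for k in range(8)
    ]

# Adjacent byte values: the set of Gray-code distances, and the worst case for raw bits.
gray_steps = {hamming(gray(v), gray(v + 1)) for v in range(255)}
raw_max = max(hamming(v, v + 1) for v in range(255))

# The Gray map loses no information.
round_trip = all(gray_inverse(gray(v)) == v for v in range(256))

# The DC term is the number of set bits, so it ranges over 0..8.
dc_values = sorted({round(dft_bits(gray(v))[0].real) for v in range(256)})

print("Gray-code step sizes:", gray_steps)
print("raw-bit worst-case step:", raw_max)
print("round trip ok:", round_trip)
print("DC term values:", dc_values)
```

The raw-bit worst case occurs at the 127 to 128 transition, where all eight bits flip. Raw bits and Gray bits are both invertible 8-dimensional binary encodings of the same byte, which is why the pair isolates geometry from information content.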

Preliminary measurements (small scale, single-seed, prior encoder revision — to be re-run multi-seed with the ablation kit before we treat them as findings):

Same decoder-only architecture, same data/steps: 27-D HSL input 2.058 bpb vs learned byte embedding 2.118 bpb on a byte-level LM task. This is the single number that most needs the multi-seed matched-baseline treatment you describe, and it is first in the queue.
Architecture axis at matched data/budget: decoder-only prefix-LM (~11M params) outperformed an encoder–decoder twice its size (2.227 vs 2.275 bpb) across all depths tested.
Mechanism (not quality) results for the disk-offload tier: with the answer present only in a disk-resident value, retrieval-ON reaches 1.000 task accuracy while ablated retrieval and no-memory controls sit at chance — the read mechanism is load-bearing, not decorative.
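For readers comparing these numbers with their own runs, the following is a minimal sketch of how bits per byte (bpb) is conventionally derived from a summed negative log-likelihood in nats. The function names are hypothetical, and it is an assumption that the figures quoted above were computed this way; the only measured values used are the two bpb numbers already reported.

```python
import math

def bits_per_byte(total_nll_nats, n_bytes):
    """Convert a summed negative log-likelihood (in nats) over n_bytes into bits per byte."""
    return total_nll_nats / (n_bytes * math.log(2))

def relative_gap(bpb_a, bpb_b):
    """Fractional improvement of arm A over arm B (positive means A is better)."""
    return (bpb_b - bpb_a) / bpb_b

# Sanity check: a uniform guess over 256 byte values costs 8 bits per byte.
uniform_bpb = bits_per_byte(1000 * math.log(256), 1000)
print(f"uniform baseline: {uniform_bpb:.3f} bpb")

# The single-seed figures quoted above: HSL input vs learned byte embedding.
gap = relative_gap(2.058, 2.118)
print(f"relative gap: {gap:.1%}")
```

The gap is on the order of a few percent, which is small enough that seed-to-seed variance could plausibly account for it; that is the reason the multi-seed rerun comes first.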

Claim table (current, your format)

Claim	Evidence today	Caveat	Next test
Byte-native pipeline runs end-to-end	text/chat/knowledge/video(539B windows) through one trainer	works ≠ quality; generation demo pending trained ckpt	fixed demo + small checkpoint
HSL substrate is useful beyond learned bytes	2.058 vs 2.118 bpb (matched arch/data)	single seed, toy scale, prior encoder rev	multi-seed ControlEmbedding A/B (hsl/learned/random/permuted)
Substrate geometry (not just invertibility) matters	Gray-code structure; permuted control exists	unmeasured	the raw-bits(8) vs Δ(8) minimal pair — identical information/dims/scale, geometry only
Dense prefix + byte-AR decoder helps	dec-only beat 2× enc-dec at matched budget	params/context confounds partially addressed, not fully	same-param/-context/-FLOP grid
Cross-modal binding	matched/mismatched gaps in earlier prototypes	shortcuts not excluded	hard negatives (same-class wrong-instance, entropy-matched) + top-k retrieval
Knowledge lives on disk, not FFN	ON 1.000 / ablated chance	mechanism proof, synthetic facts	knowledge-mode training on a real 73k-fact store (wired, training next)

What is running / next

A depth sweep on the final wired architecture is finishing now; the first full training run on the pinned public encoder follows, then: (1) the multi-seed substrate ablations above, (2) binding probes with your hard-negative list, (3) a small reproducibility packet — fixed tiny split, exact commands, seeds/logs, a small checkpoint, and the claim/evidence/caveat table maintained in the repo.

If you want to poke at the substrate before any of that lands: pip install hsl-embedding (>= 0.5.0), then from hsl_embedding.ablation import ControlEmbedding — the four variants are a one-line swap in any byte-LM training loop. A complete runnable comparison (examples/substrate_ablation.py) ships in the source distribution on PyPI. Thanks again — the review measurably improved the package within a day of being posted.
