HoLo/HSL: a 100M change-rate-based multimodal toy model on a single RTX 4070

Hugging Face Forums [Unofficial] June 10, 2026

Thank you for this review — it is exactly the kind of feedback this project needs, and we are adopting your framing more or less wholesale. The layer separation you propose (byte-native / dense packing / HSL substrate / architecture / binding / reproducibility) is now literally our experiment plan. A few things can be answered today; the rest we are committing to with a concrete roadmap.

One factual correction first

“the original byte-to-signal encoder/codec is withheld”

This premise is no longer true (and mostly never was). The encoder is fully public: pip install hsl-embedding, MIT license, with formulas, tests, and codec included. The only private artifacts are the trained HoLo model weights, which are work in progress. So every encoder-side claim is directly reproducible from PyPI today; no surrogate substrate is needed. (Your “which results require the private encoder” table item resolves to: none of the substrate-side ones.)

Shipped today, in direct response to this review (v0.5.0)

The package now contains hsl_embedding/ablation.py so that anyone — including you — can run the controlled comparisons you asked for with a one-line swap:

  • ControlEmbedding(kind, seed) with four variants sharing the identical 27-D layout and the same 9 context dims: hsl (bit-identical to the real encoder, test-enforced) / learned (trainable byte projection, +4,608 params — your “learned byte projection” arm) / random (seeded random injective LUT, moment-matched per channel — your “random fixed invertible map” arm) / permuted (HSL’s own 256 LUT rows shuffled — per-channel distributions exactly identical, only the value-adjacency geometry is destroyed; we think this is a sharper control than channel shuffling for the capacity-vs-geometry question).
  • feature_groups() / select_channels() for the feature-family ablations (drop Δ², drop FFT, …), plus a value(18)/context(9) split.
  • value_lut() — the frozen 256×18 table exported as a tensor (your “exported feature tensors” item).
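
To make the idea behind the permuted control concrete, here is a minimal standard-library sketch. It does not use the hsl-embedding package: the table below is a hypothetical 256×8 stand-in built from Gray-code bits, not the real 256×18 value LUT, and the helper names (stand_in_lut, permute_rows, and so on) are illustrative assumptions. It only shows that shuffling LUT rows leaves every per-channel value distribution unchanged while destroying value adjacency.

```python
import random


def gray(v: int) -> int:
    """Binary-reflected Gray code of v."""
    return v ^ (v >> 1)


def stand_in_lut() -> list:
    # Hypothetical stand-in table: 8 Gray-code bits per byte value.
    # This is NOT the package's frozen 256x18 value LUT.
    return [[(gray(v) >> k) & 1 for k in range(8)] for v in range(256)]


def permute_rows(lut: list, seed: int) -> list:
    # Seeded shuffle of whole rows: byte value v now maps to some other row.
    rows = list(lut)
    random.Random(seed).shuffle(rows)
    return rows


def column_multisets(lut: list) -> list:
    # Sorted values of each channel, i.e. the per-channel distribution.
    return [sorted(col) for col in zip(*lut)]


def mean_adjacent_distance(lut: list) -> float:
    # Average number of coordinates that change between rows v and v + 1.
    dists = [
        sum(a != b for a, b in zip(lut[v], lut[v + 1]))
        for v in range(len(lut) - 1)
    ]
    return sum(dists) / len(dists)


lut = stand_in_lut()
permuted = permute_rows(lut, seed=0)

same_marginals = column_multisets(lut) == column_multisets(permuted)
print("per-channel distributions identical:", same_marginals)
print("mean adjacent distance, original:", mean_adjacent_distance(lut))
print("mean adjacent distance, permuted:", mean_adjacent_distance(permuted))
```

On the stand-in table the original rows change in exactly one coordinate per step, while the shuffled rows change in several on average, which is the geometry-only difference the permuted arm is meant to isolate.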

What we can already state (with honest caveats)

Structural facts (verifiable from the public package):

  • The per-byte Δ is exactly the binary-reflected Gray code v ^ (v >> 1) — adjacent byte values differ in exactly one Δ coordinate (raw bits: up to 8). This is the mathematical content behind the “change-rate” framing, and it connects the substrate to the minimal-change-encoding literature.
  • The 27-D base decomposes as a frozen 256×18 value LUT + 9 context dims (Δ², boundary). So the substrate question reduces cleanly to: does this particular frozen embedding geometry beat a learned/random/permuted one at matched everything? (Linear rank of the value dims is 17/18 — one dependency, dxor0 lies in the FFT span — so we will not claim “18 independent channels”.)
  • One channel-scale caveat we will control for: fft_re0 (the DC term) spans 0–8 while other channels are ±1–2, so input-normalization placement will be held identical across all ablation arms.
  • The FFT dims are the spectrum of the bit pattern of each byte, not a temporal spectrum of the waveform. Your documentation-clarity point is correct and the docs now say so.
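
Two of the structural facts above can be checked without the package at all. The sketch below uses only the Python standard library; the function names are illustrative, and the 8-point DFT is an assumption about the general shape of the feature, not the package's actual implementation. It verifies the Gray-code adjacency property and shows why a DC term over an 8-bit pattern must lie between 0 and 8: it is simply the number of set bits.

```python
import cmath


def gray(v: int) -> int:
    """Binary-reflected Gray code of v."""
    return v ^ (v >> 1)


def bits(x: int, width: int = 8) -> list:
    """Little-endian list of the low `width` bits of x."""
    return [(x >> k) & 1 for k in range(width)]


def hamming(a: int, b: int) -> int:
    """Number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")


# Coordinates that change between adjacent byte values v and v + 1.
gray_steps = [hamming(gray(v), gray(v + 1)) for v in range(255)]
raw_steps = [hamming(v, v + 1) for v in range(255)]


def dft(xs: list) -> list:
    """Plain unnormalized discrete Fourier transform of a short sequence."""
    n = len(xs)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(xs))
        for k in range(n)
    ]


# DC term (k = 0) of the bit-pattern spectrum for every byte value.
dc_terms = [dft(bits(v))[0].real for v in range(256)]

print("Gray steps per increment:", set(gray_steps))
print("max raw-bit steps per increment:", max(raw_steps))
print("DC term range:", min(dc_terms), "to", max(dc_terms))
```

The worst raw-bit case is the step from 127 to 128, where all eight bits flip at once; under the Gray code that same step changes a single coordinate.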

Preliminary measurements (small scale, single-seed, prior encoder revision — to be re-run multi-seed with the ablation kit before we treat them as findings):

  • Same decoder-only architecture, same data/steps: 27-D HSL input 2.058 bpb vs learned byte embedding 2.118 bpb on a byte-level LM task. This is the single number that most needs the multi-seed matched-baseline treatment you describe, and it is first in the queue.
  • Architecture axis at matched data/budget: decoder-only prefix-LM (~11M params) outperformed an encoder–decoder twice its size (2.227 vs 2.275 bpb) across all depths tested.
  • Mechanism (not quality) results for the disk-offload tier: with the answer present only in a disk-resident value, retrieval-ON reaches 1.000 task accuracy while ablated retrieval and no-memory controls sit at chance — the read mechanism is load-bearing, not decorative.
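
For readers comparing these numbers against their own runs, here is a small sketch of how bits per byte (bpb) relate to a cross-entropy loss. It assumes the trainer reports mean cross-entropy in nats per byte, which this post does not state, and the helper names are illustrative. The two bpb values are the single-seed figures quoted above.

```python
import math


def nats_to_bpb(loss_nats_per_byte: float) -> float:
    # Convert a natural-log cross-entropy (nats per byte) to bits per byte.
    return loss_nats_per_byte / math.log(2)


def relative_gap(baseline_bpb: float, candidate_bpb: float) -> float:
    # Fractional reduction of the candidate relative to the baseline.
    return (baseline_bpb - candidate_bpb) / baseline_bpb


# Single-seed figures from the post: learned byte embedding vs 27-D HSL input.
learned_bpb = 2.118
hsl_bpb = 2.058

gap = relative_gap(learned_bpb, hsl_bpb)
print(f"absolute gap: {learned_bpb - hsl_bpb:.3f} bpb")
print(f"relative gap: {gap:.2%}")
```

A gap of this size is small enough that seed-to-seed variance could plausibly cover it, which is why the multi-seed rerun comes first.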

Claim table (current, your format)

| Claim | Evidence today | Caveat | Next test |
|---|---|---|---|
| Byte-native pipeline runs end-to-end | text/chat/knowledge/video(539B windows) through one trainer | works ≠ quality; generation demo pending trained ckpt | fixed demo + small checkpoint |
| HSL substrate is useful beyond learned bytes | 2.058 vs 2.118 bpb (matched arch/data) | single seed, toy scale, prior encoder rev | multi-seed ControlEmbedding A/B (hsl/learned/random/permuted) |
| Substrate geometry (not just invertibility) matters | Gray-code structure; permuted control exists | unmeasured | the raw-bits(8) vs Δ(8) minimal pair: identical information/dims/scale, geometry only |
| Dense prefix + byte-AR decoder helps | dec-only beat 2× enc-dec at matched budget | params/context confounds partially addressed, not fully | same-param/-context/-FLOP grid |
| Cross-modal binding | matched/mismatched gaps in earlier prototypes | shortcuts not excluded | hard negatives (same-class wrong-instance, entropy-matched) + top-k retrieval |
| Knowledge lives on disk, not FFN | ON 1.000 / ablated chance | mechanism proof, synthetic facts | knowledge-mode training on a real 73k-fact store (wired, training next) |

What is running / next

A depth sweep on the final wired architecture is finishing now; the first full training run on the pinned public encoder follows, then: (1) the multi-seed substrate ablations above, (2) binding probes with your hard-negative list, (3) a small reproducibility packet — fixed tiny split, exact commands, seeds/logs, a small checkpoint, and the claim/evidence/caveat table maintained in the repo.

If you want to poke at the substrate before any of that lands: pip install hsl-embedding (>= 0.5.0), then from hsl_embedding.ablation import ControlEmbedding — the four variants are a one-line swap in any byte-LM training loop. A complete runnable comparison (examples/substrate_ablation.py) ships in the source distribution on PyPI. Thanks again — the review measurably improved the package within a day of being posted.
