Removing the embedding from my embedding: a byte transformer with a 0-parameter input layer (25M, single RTX 4070)
Hi everyone, a follow-up — and a slightly absurd experiment that worked.
Since the last post, the substrate ablation toolkit has shipped inside the encoder as hsl_embedding.ablation, with capacity-matched hsl / learned / random / permuted arms, as discussed in this thread. While running the full A/B, I got curious about a stranger question:
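For anyone who hasn't seen the thread, here is a minimal sketch of how four capacity-matched input arms could be built. This is my own reading of the setup, not the hsl_embedding.ablation API. It assumes the substrate can be treated as a fixed per-byte lookup table of 27-D features. The placeholder table, the Gaussian "random" arm with matched per-column statistics, and the names make_arms / TRAINABLE are all hypothetical.

```python
import numpy as np

VOCAB = 256        # one row per byte value
FEATURE_DIM = 27   # width of the frozen per-byte features in the post

# Which arms would receive gradient updates during training (hypothetical flags).
TRAINABLE = {"hsl": False, "learned": True, "random": False, "permuted": False}

def make_arms(hsl_table: np.ndarray, seed: int = 0) -> dict:
    """Build four input tables with identical shape (VOCAB, FEATURE_DIM)."""
    if hsl_table.shape != (VOCAB, FEATURE_DIM):
        raise ValueError(f"expected {(VOCAB, FEATURE_DIM)}, got {hsl_table.shape}")
    rng = np.random.default_rng(seed)
    col_mean = hsl_table.mean(axis=0)
    col_std = hsl_table.std(axis=0)
    return {
        # the substrate itself, frozen
        "hsl": hsl_table.copy(),
        # same shape, small random init, meant to be trained
        "learned": rng.normal(0.0, 0.02, size=hsl_table.shape),
        # frozen noise with per-column mean/std matched to the substrate
        "random": rng.normal(col_mean, col_std, size=hsl_table.shape),
        # the substrate's rows reassigned to different bytes, frozen
        "permuted": hsl_table[rng.permutation(VOCAB)],
    }

# Placeholder substrate: real hsl features would come from the encoder.
hsl = np.random.default_rng(42).normal(size=(VOCAB, FEATURE_DIM))
arms = make_arms(hsl)
for name, table in arms.items():
    print(name, table.shape, "trainable" if TRAINABLE[name] else "frozen")
```

In this reading, the permuted arm is the sharpest control: it keeps the exact feature distribution but scrambles which byte gets which vector, so a gap between hsl and permuted points at the byte-to-feature mapping rather than at feature statistics.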
what happens if I remove the embedding from my embedding?
I.e., feed the frozen 27-D signal features straight into the transformer through a fixed zero-pad up to the model width: no tokenizer, no embedding table, no learned input projection. Zero learned parameters at the door.
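A minimal sketch of what a fixed zero-pad front door amounts to, assuming the frozen features arrive as a (positions, 27) array and the model width d_model is at least 27. The names zero_pad_door and linear_door_params and the width 64 are hypothetical, and this is not the hsl-embedding-zero package API. For contrast, the learned-projection count uses the standard weights-plus-bias formula for a dense layer.

```python
import numpy as np

def zero_pad_door(features: np.ndarray, d_model: int) -> np.ndarray:
    """Map (seq_len, f) frozen features to (seq_len, d_model) by appending zeros.

    There are no weights here, so nothing at the door is learned.
    """
    seq_len, f = features.shape
    if f > d_model:
        raise ValueError(f"feature width {f} exceeds d_model {d_model}")
    # Features fill the first f channels; the remaining d_model - f channels are 0.
    return np.pad(features, ((0, 0), (0, d_model - f)), mode="constant")

def linear_door_params(f: int, d_model: int) -> int:
    """Parameter count of a learned dense projection f -> d_model (weights + bias)."""
    return f * d_model + d_model

x = np.random.default_rng(0).normal(size=(10, 27))  # 10 positions of 27-D features
h = zero_pad_door(x, d_model=64)                     # hypothetical model width
print(h.shape)                      # (10, 64)
print(linear_door_params(27, 64))   # 1792
```

The design point: with no weights at the door, all mixing happens inside the body. In a residual transformer the zero channels are not wasted, since later layers can write into them.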
It runs!!
| input front door | text bpb | caption bpb | learned input params |
|---|---|---|---|
| zero (frozen features, zero-pad) | 2.456 ±0.027 | 1.526 | 0 |
| learned projection on same features | 2.443 ±0.014 | 1.402 | ~125k |
| plain learned byte embedding | 2.773 ±0.076 | 2.556 | ~132k |
(2 seeds, same lean ~25M body, same 3-modality byte mix, fixed 3000-step budget; bpb = bits per byte, lower is better. Doubling bytes-per-slot to K=16, which halves the number of prefix positions, holds text bpb at 2.455.)
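For readers newer to the metric: bpb is the average next-byte cross-entropy expressed in bits, so a model that guesses uniformly over 256 byte values scores log2(256) = 8. A small sketch of the conversion from a summed loss in nats, plus the slot arithmetic behind "half the prefix positions" (doubling K from the implied baseline of 8 to 16, assuming a fixed number of prefix bytes). The function names and the 1024-byte prefix are hypothetical.

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a summed next-byte negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (n_bytes * math.log(2))

def n_slots(n_bytes: int, k: int) -> int:
    """Number of prefix positions when k bytes share one slot (last slot may be partial)."""
    return math.ceil(n_bytes / k)

ctx = 1024  # hypothetical prefix length in bytes
print(n_slots(ctx, 8), n_slots(ctx, 16))  # 128 64

# Uniform guessing over 256 byte values costs ln(256) nats per byte.
print(round(bits_per_byte(1000 * math.log(256), 1000), 6))  # 8.0
```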
Read honestly, this is not “embeddings are beaten.” At this small budget, the frozen substrate already carries what a learned front door would otherwise have to learn, and a plain learned byte embedding doesn’t get there in 3k steps; it may well close the gap with a longer schedule. One consumer GPU, a small body: the table is the claim.
So I shipped it as a tiny package plus a live proof model:
- `pip install hsl-embedding-zero`: the zero door as a drop-in module (GitHub, MIT, DOI 10.5281/zenodo.20643551)
- HoLo_ZeRo: a 25M model trained entirely behind the zero door (the casing is the signal: HoLoZeRo = 10101010). Weights · live demo (byte generation plus the 27-D cosmos it literally reads)
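The casing trick in the name decodes mechanically: read each letter as 1 if uppercase and 0 if lowercase. A throwaway check (casing_bits is a hypothetical helper, not part of the package):

```python
def casing_bits(name: str) -> str:
    """Read a name's letter casing as bits: uppercase -> '1', lowercase -> '0'; non-letters skipped."""
    return "".join("1" if c.isupper() else "0" for c in name if c.isalpha())

print(casing_bits("HoLoZeRo"))          # 10101010
print(casing_bits("HoLo_ZeRo"))         # 10101010 (underscore skipped)
print(int(casing_bits("HoLoZeRo"), 2))  # 170
```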
If you’re curious, poke it and tell me where it breaks.
(Also since last time: HoLo 6.5.1 finished its 3-stage curriculum — weights public, knowledge-grounding gap grew 0.001 → 1.835 across training, full numbers in the repo.)