Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreia7ui5zkybmgdzghcyqp6a7fvzbkeafanaajfu7mcodhh34bnfwya",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mo33alfcwg22"
  },
  "path": "/t/removing-the-embedding-from-my-embedding-a-byte-transformer-with-a-0-parameter-input-layer-25m-single-rtx-4070/176731#post_1",
  "publishedAt": "2026-06-12T04:32:07.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "GitHub",
    "weights",
    "live demo",
    "weights public"
  ],
  "textContent": "Hi everyone, a follow-up — and a slightly absurd experiment that worked.\n\nSince the last post, the substrate ablation toolkit shipped inside the encoder (`hsl_embedding.ablation` — capacity-matched hsl / learned / random / permuted arms, as discussed in this thread). While running the full A/B I got curious about a stranger question:\n\n**what happens if I remove the embedding from my embedding?**\n\nI.e. feed the frozen 27-D signal features straight into the transformer through a fixed zero-pad — no tokenizer, no embedding table, no learned input projection. **Zero learned parameters at the door.**\n\nIt runs!!\n\n**input front door** | **text bpb** | **caption bpb** | **learned input params**\n---|---|---|---\nzero (frozen features, zero-pad) | 2.456 ±0.027 | 1.526 | **0**\nlearned projection on same features | 2.443 ±0.014 | 1.402 | ~125k\nplain learned byte embedding | 2.773 ±0.076 | 2.556 | ~132k\n\n(2 seeds, same lean ~25M body, same 3-modality byte mix, fixed 3000-step budget. Doubling bytes-per-slot (K=16, half the prefix positions) holds text bpb at 2.455.)\n\nReading this honestly: **not** “embeddings are beaten.” At this small budget the frozen substrate already carries what a learned front door would have to learn, and a plain learned byte embedding doesn’t get there in 3k steps — it may well close the gap with a longer schedule. One consumer GPU, small body, the table is the claim.\n\nSo I shipped it as a tiny package plus a live proof model:\n\n  * `pip install hsl-embedding-zero` — the zero door as a drop-in module (GitHub, MIT, DOI 10.5281/zenodo.20643551)\n  * **HoLo_ZeRo** — a 25M model trained entirely behind the zero door (the casing is the signal: HoLoZeRo = 10101010): weights · live demo (byte generation + the 27-D cosmos it literally reads)\n\n\n\nIf you’re curious, poke it and tell me where it breaks.\n\n(Also since last time: HoLo 6.5.1 finished its 3-stage curriculum — weights public, knowledge-grounding gap grew 0.001 → 1.835 across training, full numbers in the repo.)",
  "title": "Removing the embedding from my embedding: a byte transformer with a 0-parameter input layer (25M, single RTX 4070)"
}