Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiefasjxmv57wyy2wn5lskhwvwon4ens2kp4kp4cvfouqfj4ks7sjy",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmvr6oga4yp2"
  },
  "path": "/t/attention-is-all-we-had-but-not-what-we-needed-language-generation-without-attention-via-iterative-energy-based-state-refinement/176285#post_1",
  "publishedAt": "2026-05-28T09:40:27.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Attention Is All We Had — But Not What We Needed: Convergent State Machine for Iterative Energy-Based Language Generation"
  ],
  "textContent": "We introduce CSM (Convergent State Machine) — a language model with zero attention layers that uses energy-based iterative state refinement over 16 state vectors. Key results: - 66M and 150M models, zero attention anywhere - 150M matches GPT-2 1.5B on MMLU within 0.3% (10x fewer params, 13x less data) - Perplexity decreases monotonically with more iterations - State dynamics scale with model size (66M → iter 15, 150M → iter 30+) - Total training cost: under $50\nPaper: Attention Is All We Had — But Not What We Needed: Convergent State Machine for Iterative Energy-Based Language Generation",
  "title": "Attention Is All We Had — But Not What We Needed: Language generation without attention via iterative energy-based state refinement"
}
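
The textContent field in the record above mentions iterative energy-based state refinement over 16 state vectors, but it does not give the energy function or the update rule. As a rough illustration only, the Python sketch below assumes a toy quadratic energy and plain gradient descent. The names energy, refine_step and run_refinement, the target vector, the coupling and step values, and the dimension of 8 are all hypothetical and are not taken from the CSM paper. The sketch shows only the general idea that each extra refinement iteration can lower the energy of the state.

```python
import random

NUM_STATES = 16  # from the post: 16 state vectors
DIM = 8          # assumption: toy dimensionality, not from the post


def mean_vector(states):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(states)
    return [sum(s[k] for s in states) / n for k in range(len(states[0]))]


def energy(states, target, coupling=0.5):
    """Toy quadratic energy (an assumption, not the CSM energy).

    Each state is pulled toward a shared target vector and toward
    the mean of all states.
    """
    m = mean_vector(states)
    total = 0.0
    for s in states:
        total += 0.5 * sum((a - t) ** 2 for a, t in zip(s, target))
        total += 0.5 * coupling * sum((a - b) ** 2 for a, b in zip(s, m))
    return total


def refine_step(states, target, coupling=0.5, step=0.1):
    """One gradient-descent update of every state vector on the toy energy."""
    m = mean_vector(states)
    new_states = []
    for s in states:
        # Exact gradient of the toy energy with respect to this state.
        grad = [(a - t) + coupling * (a - b) for a, t, b in zip(s, target, m)]
        new_states.append([a - step * g for a, g in zip(s, grad)])
    return new_states


def run_refinement(num_iters=30, seed=0):
    """Refine randomly initialised states; return the energy after each iteration."""
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    states = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_STATES)]
    energies = [energy(states, target)]
    for _ in range(num_iters):
        states = refine_step(states, target)
        energies.append(energy(states, target))
    return energies


if __name__ == "__main__":
    for i, e in enumerate(run_refinement(num_iters=5)):
        print(f"iteration {i}: energy {e:.4f}")
```

With a small enough step size, gradient descent on a quadratic energy like this one lowers the energy at every iteration. That is loosely analogous to the post's claim that perplexity falls as iterations increase, though the real model's dynamics may differ substantially.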