Attention Is All We Had — But Not What We Needed: Language generation without attention via iterative energy-based state refinement
Hugging Face Forums [Unofficial]
May 29, 2026
New Version
Zenodo
Attention Is All We Had — But Not What We Needed: Convergent State Machine...
We introduce the Convergent State Machine (CSM), a novel architecture for language generation that replaces attention entirely with energy-based iterative state refinement. Three models trained (66M, 150M, 331M), all with zero attention layers. CSM...
Discussion in the ATmosphere