External Publication

Attention Is All We Had — But Not What We Needed: Language generation without attention via iterative energy-based state refinement

Hugging Face Forums [Unofficial] May 28, 2026
Thank you, Jean. This is exactly the kind of cross-validation that strengthens both of our findings. The Δ-κ correspondence is striking: independent metrics, computed on different architectures, capture the same internal dynamic stability. Your finding that the dynamic and semantic layers are partially decoupled explains precisely why our MMLU scores remain flat while perplexity improves monotonically with iteration depth.

I’m very interested in measuring κ on CSM’s iteration dynamics. A cross-architecture comparison would be valuable for both of our projects. I’ll read your 3-paper series carefully. Could you share the DOIs?

I’m currently training a 300M CSM to test whether the useful iteration range extends further with scale. I expect results within 24 hours.

Arunesh Dwivedi
VKD Industries
