{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreigwst7jqcuqtx4pnbglriee2bqginq4rdqqopbnu6yyofs7utcg5m",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmwm24qog6c2"
},
"path": "/t/attention-is-all-we-had-but-not-what-we-needed-language-generation-without-attention-via-iterative-energy-based-state-refinement/176285#post_6",
"publishedAt": "2026-05-28T16:08:41.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "Thank you Jean, this is exactly the kind of cross-validation\nthat strengthens both our findings.\n\nThe Δ-κ correspondence is striking — independent metrics\ncapturing the same internal dynamic stability from different\narchitectures. Your finding that the dynamic-semantic layers\nare partially decoupled explains precisely why our MMLU\nscores remain flat while perplexity improves monotonically\nwith iteration depth.\n\nI’m very interested in measuring κ on CSM’s iteration\ndynamics. A cross-architecture comparison would be\nvaluable for both our work.\n\nI’ll read your 3-paper series carefully. Could you share\nthe DOIs?\n\nCurrently training a 300M CSM to test whether the useful\niteration range extends further with scale. Results within\n24 hours.\n\nArunesh Dwivedi\nVKD Industries",
"title": "Attention Is All We Had — But Not What We Needed: Language generation without attention via iterative energy-based state refinement"
}