{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigwst7jqcuqtx4pnbglriee2bqginq4rdqqopbnu6yyofs7utcg5m",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmwm24qog6c2"
  },
  "path": "/t/attention-is-all-we-had-but-not-what-we-needed-language-generation-without-attention-via-iterative-energy-based-state-refinement/176285#post_6",
  "publishedAt": "2026-05-28T16:08:41.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "Thank you Jean, this is exactly the kind of cross-validation\nthat strengthens both our findings.\n\nThe Δ-κ correspondence is striking — independent metrics\ncapturing the same internal dynamic stability from different\narchitectures. Your finding that the dynamic-semantic layers\nare partially decoupled explains precisely why our MMLU\nscores remain flat while perplexity improves monotonically\nwith iteration depth.\n\nI’m very interested in measuring κ on CSM’s iteration\ndynamics. A cross-architecture comparison would be\nvaluable for both our work.\n\nI’ll read your 3-paper series carefully. Could you share\nthe DOIs?\n\nCurrently training a 300M CSM to test whether the useful\niteration range extends further with scale. Results within\n24 hours.\n\nArunesh Dwivedi\nVKD Industries",
  "title": "Attention Is All We Had — But Not What We Needed: Language generation without attention via iterative energy-based state refinement"
}