Attention Is All We Had — But Not What We Needed: Language generation without attention via iterative energy-based state refinement

Hugging Face Forums [Unofficial] May 28, 2026
Arunesh — glad the Δ-κ correspondence resonated. It’s rare to find independent work converging on the same dynamic stability principle from opposite directions (architecture design vs. empirical measurement).

The DOIs for the 3-paper series:

  1. Four Dynamical Regimes in LLMs: An Empirical Phase Map. DOI: 10.5281/zenodo.20348878

  2. Methodological Audit of Trajectory Instability. DOI: 10.5281/zenodo.20361289

  3. Dynamic-Layer Controllability. DOI: 10.5281/zenodo.20400171

All three are open access on Zenodo. The dataset is on Hugging Face as jeanbatuli/LLM-Interne-Dynamic.

On your 300M CSM: I’d be very interested in whether the Δ→0 convergence point extends as predicted. My perturbation threshold data suggests the relationship isn’t purely linear with parameter count. Qwen05 at 500M shows a ~7.5× higher threshold than GPT-2 at 124M, not the ~4× that the parameter ratio alone (500M / 124M ≈ 4) would predict. Architecture matters alongside scale.
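To make the arithmetic behind those two ratios explicit, here is a small Python check. It only uses the figures quoted above (124M, 500M, ~7.5×); the "implied exponent" is an illustrative quantity I am adding, and with only two models it is a back-of-the-envelope number, not a fitted scaling law.

```python
import math

# Figures quoted in the post (approximate parameter counts).
gpt2_params = 124e6
qwen_params = 500e6
observed_threshold_ratio = 7.5  # Qwen05 threshold / GPT-2 threshold, as reported

# What a purely linear dependence on parameter count would predict.
param_ratio = qwen_params / gpt2_params

# Illustrative only: the exponent p such that
# observed_threshold_ratio == param_ratio ** p.
# Two data points cannot separate scale effects from architecture effects.
implied_exponent = math.log(observed_threshold_ratio) / math.log(param_ratio)

print(f"parameter ratio:  {param_ratio:.2f}x")
print(f"observed ratio:   {observed_threshold_ratio:.2f}x")
print(f"implied exponent: {implied_exponent:.2f}")
```

An exponent of 1 would mean the threshold tracks parameter count linearly; anything clearly above 1 is consistent with the point that something besides scale (here, presumably architecture) is contributing.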

If you can expose iteration-level hidden states during CSM inference, I can run the same κ/readiness pipeline on CSM that I use on Transformers. Same operator, different architecture. That’s the cleanest cross-validation we could ask for.
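For concreteness, here is a minimal sketch of the kind of per-iteration dump that would be enough on my side. Everything in it is hypothetical: `record_trajectory`, `step_norms`, and `toy_step` are illustrative names, the toy update rule is a stand-in and not CSM's actual refinement step, and the step norm shown is just a generic convergence diagnostic, not the κ/readiness operator itself.

```python
import math
from typing import Callable, List

Vector = List[float]


def record_trajectory(
    step_fn: Callable[[Vector], Vector], h0: Vector, n_iters: int
) -> List[Vector]:
    """Return [h_0, h_1, ..., h_n], where h_{t+1} = step_fn(h_t)."""
    states = [list(h0)]
    for _ in range(n_iters):
        states.append(list(step_fn(states[-1])))
    return states


def step_norms(states: List[Vector]) -> List[float]:
    """Euclidean norm of each consecutive difference h_{t+1} - h_t."""
    return [math.dist(a, b) for a, b in zip(states, states[1:])]


# Toy stand-in for a refinement step (an assumption, NOT CSM):
# move the state halfway toward a fixed target on every iteration.
TARGET = [1.0, -2.0, 0.5]


def toy_step(h: Vector) -> Vector:
    return [x + 0.5 * (t - x) for x, t in zip(h, TARGET)]


states = record_trajectory(toy_step, [0.0, 0.0, 0.0], n_iters=8)
norms = step_norms(states)

for t, n in enumerate(norms):
    print(f"iter {t} -> {t + 1}: step norm = {n:.6f}")
```

If the real model can hand back something shaped like `states` (one hidden-state vector per refinement iteration, per token), the downstream analysis does not need to know anything else about the architecture.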

-- Jean-Denis
