External Publication

[Request] arXiv endorsement for new mech interp paper on LLM self-referential circuits

Hugging Face Forums [Unofficial] February 10, 2026

Zenodo

When Models Examine Themselves: Vocabulary-Activation Correspondence in...

Large language models produce rich introspective language when prompted for self-examination, but whether this language reflects internal computation or sophisticated confabulation has remained unclear. We show that self-referential vocabulary tracks...

Update: a more concise version with formatting adjustments is now available. Feedback and discussion are welcome.
