External Publication

[Request] arXiv endorsement for new mech interp paper on LLM self-referential circuits

Hugging Face Forums [Unofficial] February 10, 2026

Zenodo

When Models Examine Themselves: Vocabulary-Activation Correspondence in...

Large language models produce rich introspective language when prompted for self-examination, but whether this language reflects internal computation or sophisticated confabulation has remained unclear. We show that self-referential vocabulary tracks...

Update: a more concise version with formatting adjustments is now available. Feedback and discussion are welcome.

When Models Examine Themselves: Vocabulary-Activation Correspondence in...
