External Publication

Grokking Beyond Addition

Hugging Face Forums [Unofficial] April 6, 2026

Hi everyone,

I’m excited to share my research paper: “Grokking Beyond Addition: Circuit-Level Analysis of Algebraic Learning in Transformers”

Paper: https://zenodo.org/records/19256207

This work explores grokking across multiple algebraic structures and shows a clear result: at small model scale (d_model = 64), transformers reliably grok abelian tasks but fail to generalize on non-abelian groups, even with 100% training accuracy.
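For readers new to the distinction, here is a minimal sketch of how an abelian and a non-abelian task can be posed as the same kind of lookup problem. It is illustrative only and not taken from the paper: the modulus 7, the choice of S_3, and all function names are assumptions for the example.

```python
from itertools import permutations

def cyclic_addition_table(p):
    """Cayley table for addition mod p: entry [a][b] is (a + b) % p."""
    return [[(a + b) % p for b in range(p)] for a in range(p)]

def symmetric_group_table(n):
    """Cayley table for S_n under composition, elements indexed 0..n!-1."""
    elems = list(permutations(range(n)))
    index = {perm: i for i, perm in enumerate(elems)}

    def compose(f, g):
        # (f o g)(x) = f(g(x))
        return tuple(f[g[x]] for x in range(n))

    return [[index[compose(f, g)] for g in elems] for f in elems]

def is_abelian(table):
    """A group is abelian exactly when its Cayley table is symmetric."""
    n = len(table)
    return all(table[a][b] == table[b][a] for a in range(n) for b in range(n))

def to_examples(table):
    """Flatten a Cayley table into (a, b, a*b) triples, one per example."""
    n = len(table)
    return [(a, b, table[a][b]) for a in range(n) for b in range(n)]

add_table = cyclic_addition_table(7)   # assumed modulus, for illustration
s3_table = symmetric_group_table(3)    # smallest non-abelian group

print(is_abelian(add_table), is_abelian(s3_table))   # True False
print(len(to_examples(add_table)), len(to_examples(s3_table)))  # 49 36
```

Both tasks reduce to predicting the third element of each triple from the first two, so the only thing that differs between them is the algebraic structure of the table itself.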

It also highlights:

Early circuit formation before generalization
Evidence for discrete-log structure in multiplication
Strong embedding similarity across different tasks (CKA)
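To make the discrete-log point concrete, here is a small sketch of the underlying identity, assuming multiplication is taken over the nonzero residues modulo a prime p. It shows only the math, not the paper's analysis; the modulus 7 and the helper names are hypothetical.

```python
def find_generator(p):
    """Return the smallest g whose powers cover all nonzero residues mod prime p."""
    for g in range(2, p):
        seen = set()
        x = 1
        for _ in range(p - 1):
            x = (x * g) % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no generator found; is p an odd prime?")

def discrete_log_table(p, g):
    """Map each nonzero residue a to the exponent k with g**k % p == a."""
    return {pow(g, k, p): k for k in range(p - 1)}

p = 7                      # assumed small prime, for illustration
g = find_generator(p)      # 3 is the smallest generator mod 7
log = discrete_log_table(p, g)

# Multiplication mod p becomes addition mod (p - 1) in log space.
for a in range(1, p):
    for b in range(1, p):
        assert log[(a * b) % p] == (log[a] + log[b]) % (p - 1)

print(g, log)
```

A network that learns an internal representation resembling `log` could in principle reuse an addition-style circuit for multiplication, which is one way to read the "discrete-log structure" finding.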

I’m opening this project for collaboration and contributions:

Scaling experiments (d_model = 128 / 256)
Extending to more algebraic structures
Interpretability improvements
Reproduction and benchmarking

If you’re interested in mechanistic interpretability, grokking, or theory-driven ML, feel free to contribute, open issues, or reach out. Let’s build this together.
