{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreihqouyk4fhxss6z7rsqmluwbyr5xombi2zrxuj3gk77cgjecwzwue",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3misslpfhxnz2"
},
"path": "/t/grokking-beyond-addition/175009#post_1",
"publishedAt": "2026-04-06T07:42:00.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"https://zenodo.org/records/19256207"
],
"textContent": "Hi everyone,\n\nI’m excited to share my research paper:\n**“Grokking Beyond Addition: Circuit-Level Analysis of Algebraic Learning in Transformers”**\n\nPaper: https://zenodo.org/records/19256207\n\nThis work explores grokking across multiple algebraic structures and shows a clear result:\n**At small model scale (d_model = 64), transformers reliably grok abelian tasks but fail to generalize on non-abelian groups** , even with 100% training accuracy.\n\nIt also highlights:\n\n * Early **circuit formation before generalization**\n\n * Evidence for **discrete-log structure in multiplication**\n\n * Strong **embedding similarity across different tasks (CKA)**\n\n\n\n\n* * *\n\nI’m opening this project for collaboration and contributions:\n\n * Scaling experiments (d_model = 128 / 256)\n\n * Extending to more algebraic structures\n\n * Interpretability improvements\n\n * Reproduction and benchmarking\n\n\n\n\nIf you’re interested in mechanistic interpretability, grokking, or theory-driven ML, feel free to contribute, open issues, or reach out. Let’s build this together.",
"title": "Grokking Beyond Addition"
}