Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihqouyk4fhxss6z7rsqmluwbyr5xombi2zrxuj3gk77cgjecwzwue",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mita2ig2jf22"
  },
  "path": "/t/grokking-beyond-addition/175009#post_1",
  "publishedAt": "2026-04-06T07:42:00.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "https://zenodo.org/records/19256207"
  ],
  "textContent": "Hi everyone,\n\nI’m excited to share my research paper:\n**“Grokking Beyond Addition: Circuit-Level Analysis of Algebraic Learning in Transformers”**\n\nPaper: https://zenodo.org/records/19256207\n\nThis work explores grokking across multiple algebraic structures and shows a clear result:\n**At small model scale (d_model = 64), transformers reliably grok abelian tasks but fail to generalize on non-abelian groups**, even with 100% training accuracy.\n\nIt also highlights:\n\n  * Early **circuit formation before generalization**\n\n  * Evidence for **discrete-log structure in multiplication**\n\n  * Strong **embedding similarity across different tasks (CKA)**\n\n* * *\n\nI’m opening this project for collaboration and contributions:\n\n  * Scaling experiments (d_model = 128 / 256)\n\n  * Extending to more algebraic structures\n\n  * Interpretability improvements\n\n  * Reproduction and benchmarking\n\nIf you’re interested in mechanistic interpretability, grokking, or theory-driven ML, feel free to contribute, open issues, or reach out. Let’s build this together.",
  "title": "Grokking Beyond Addition"
}