Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibdfvrgasb6weelgchd6ngh674imliq3qji7uolrvpsswsxd5ztxa",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mlcc4hbbbf52"
  },
  "path": "/t/integrating-deepseeks-engram-into-olmo-core-proof-of-concept-complete-looking-for-compute-advice/175836#post_1",
  "publishedAt": "2026-05-07T21:43:22.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "GitHub",
    "here"
  ],
  "textContent": "Hi all,\n\nI’ve been independently integrating DeepSeek’s Engram conditional memory module into AI2’s OLMo-core as an optional architectural component.\n\n**What I built:**\n\n  * Native integration via single config flag\n\n  * All 4 architecture configurations verified (Attention + Dense FFN, Attention + MoE, GDN + Dense FFN, GDN + MoE)\n\n  * First training run completed last night — loss going down, clean completion on 4×A40s\n\n\n\n\n**The research question:** The original paper only benchmarks against MoE. My hypothesis is Engram’s gain is largest in dense FFN, where every token pays full compute with no sparsity escape valve. I’ve designed a 2×2 ablation to test this.\n\n**The ask:** As an independent researcher without institutional affiliation, I’m looking for advice on compute access — grants, programs, or anything others have found useful for running training experiments at this scale.\n\nGitHub\n\nFull writeup and training run details here and here.\n\nAny pointers appreciated.",
  "title": "Integrating DeepSeek's Engram into OLMo-core — proof of concept complete, looking for compute advice"
}