{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreibdfvrgasb6weelgchd6ngh674imliq3qji7uolrvpsswsxd5ztxa",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mlcc4hbbbf52"
},
"path": "/t/integrating-deepseeks-engram-into-olmo-core-proof-of-concept-complete-looking-for-compute-advice/175836#post_1",
"publishedAt": "2026-05-07T21:43:22.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"GitHub",
"here"
],
"textContent": "Hi all,\n\nI’ve been independently integrating DeepSeek’s Engram conditional memory module into AI2’s OLMo-core as an optional architectural component.\n\n**What I built:**\n\n * Native integration via a single config flag\n\n * All 4 architecture configurations verified (Attention + Dense FFN, Attention + MoE, GDN + Dense FFN, GDN + MoE)\n\n * First training run completed last night: loss decreased and the run finished cleanly on 4×A40s\n\n**The research question:** The original paper benchmarks only against MoE. My hypothesis is that Engram’s gain is largest with a dense FFN, where every token pays full compute with no sparsity escape valve. I’ve designed a 2×2 ablation to test this.\n\n**The ask:** As an independent researcher without institutional affiliation, I’m looking for advice on compute access: grants, programs, or anything else others have found useful for running training experiments at this scale.\n\nGitHub\n\nFull writeup and training run details here and here.\n\nAny pointers appreciated.",
"title": "Integrating DeepSeek's Engram into OLMo-core — proof of concept complete, looking for compute advice"
}