
Integrating DeepSeek's Engram into OLMo-core — proof of concept complete, looking for compute advice

Hugging Face Forums [Unofficial] May 7, 2026

Hi all,

I’ve been independently integrating DeepSeek’s Engram conditional memory module into AI2’s OLMo-core as an optional architectural component.

What I built:

Native integration via single config flag
All 4 architecture configurations verified (Attention + Dense FFN, Attention + MoE, GDN + Dense FFN, GDN + MoE)
First training run completed last night on 4×A40s: loss decreased over the run and it finished cleanly
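
For anyone trying to picture the verification matrix, here is a minimal sketch of the four combinations with Engram switched on. All names here (`BlockConfig`, `sequence_mixer`, `ffn_type`, `use_engram`) are hypothetical and chosen for illustration; they are not OLMo-core identifiers, and the real flag may look different.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical labels for illustration only; not OLMo-core identifiers.
SEQUENCE_MIXERS = ("attention", "gdn")
FFN_TYPES = ("dense", "moe")


@dataclass(frozen=True)
class BlockConfig:
    sequence_mixer: str
    ffn_type: str
    use_engram: bool = False  # stand-in for the single opt-in flag (assumed name)


def engram_configs():
    """Return one Engram-enabled config per (mixer, FFN) combination."""
    return [
        BlockConfig(mixer, ffn, use_engram=True)
        for mixer, ffn in product(SEQUENCE_MIXERS, FFN_TYPES)
    ]


if __name__ == "__main__":
    for cfg in engram_configs():
        print(cfg)
```

Keeping the flag off by default mirrors the "optional component" idea: existing configs would behave as before unless they opt in.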

The research question: The original paper only benchmarks against MoE. My hypothesis is that Engram's gain is largest in dense FFN models, where every token pays full compute with no sparsity escape valve. I've designed a 2×2 ablation to test this.
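
One way to read a 2×2 result like this is as an interaction contrast. The sketch below assumes the two factors are FFN type (dense vs. MoE) and Engram (off vs. on), and that the metric is a final validation loss where lower is better; both are assumptions made for illustration, and the function names are hypothetical.

```python
def engram_gain(loss_without: float, loss_with: float) -> float:
    """Loss reduction attributable to Engram (positive means Engram helped)."""
    return loss_without - loss_with


def interaction(dense_off: float, dense_on: float,
                moe_off: float, moe_on: float) -> float:
    """Difference between Engram's gain in dense FFN and its gain in MoE.

    A positive value would be consistent with the hypothesis that the gain
    is larger for dense FFN; a value near zero would suggest the gain does
    not depend on FFN type.
    """
    return engram_gain(dense_off, dense_on) - engram_gain(moe_off, moe_on)
```

A single run per cell gives only a point estimate, so repeating each cell across seeds would be needed before reading much into the sign.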

The ask: As an independent researcher without institutional affiliation, I’m looking for advice on compute access — grants, programs, or anything others have found useful for running training experiments at this scale.

GitHub

Full writeup and training run details here and here.

Any pointers appreciated.
