Integrating DeepSeek's Engram into OLMo-core — proof of concept complete, looking for compute advice

Hugging Face Forums [Unofficial] May 7, 2026

Hi all,

I’ve been independently integrating DeepSeek’s Engram conditional memory module into AI2’s OLMo-core as an optional architectural component.

What I built:

  • Native integration via a single config flag

  • All 4 architecture configurations verified (Attention + Dense FFN, Attention + MoE, GDN + Dense FFN, GDN + MoE)

  • First training run completed last night on 4×A40s: loss decreased and the run finished cleanly

The research question: The original paper only benchmarks against MoE. My hypothesis is that Engram’s gain is largest with a dense FFN, where every token pays full compute with no sparsity escape valve. I’ve designed a 2×2 ablation to test this.
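
A minimal sketch of how the 2×2 could be laid out and scored, assuming the two axes are FFN type (dense vs. MoE) and Engram (off vs. on). The post does not spell out the axes, and every name here (`AblationCell`, `ablation_grid`, `engram_gain`, `interaction`) is hypothetical and not part of OLMo-core or the Engram code. The `losses` mapping is something you would fill in from your own eval runs; no numbers are assumed.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AblationCell:
    """One cell of the assumed 2x2 grid (hypothetical naming)."""
    ffn: str       # "dense" or "moe"
    engram: bool   # whether the Engram module is enabled


def ablation_grid():
    """Enumerate the four cells: {dense, moe} x {Engram off, Engram on}."""
    return [AblationCell(ffn, engram)
            for ffn, engram in product(("dense", "moe"), (False, True))]


def engram_gain(losses, ffn):
    """Loss reduction from enabling Engram for one FFN type.

    `losses` maps AblationCell -> final eval loss. Positive means Engram helped.
    """
    return losses[AblationCell(ffn, False)] - losses[AblationCell(ffn, True)]


def interaction(losses):
    """Difference-in-differences across FFN types.

    Positive means the Engram gain is larger with a dense FFN than with MoE,
    which is the direction the hypothesis predicts.
    """
    return engram_gain(losses, "dense") - engram_gain(losses, "moe")
```

The interaction term is the quantity the hypothesis is really about: two separate gains can both be positive while the difference between them is still within seed-to-seed noise, so it would be worth running more than one seed per cell if compute allows.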

The ask: As an independent researcher without institutional affiliation, I’m looking for advice on compute access — grants, programs, or anything others have found useful for running training experiments at this scale.

GitHub

Full writeup and training run details here and here.

Any pointers appreciated.
