AIOS — First Ground Truth Baseline (CPU DRAM Measurement)

Hugging Face Forums [Unofficial] March 29, 2026

Following up on my earlier post introducing AIOS (CPU-native LLM inference architecture), we now have the first validated baseline measurement using hardware memory controller counters.

Setup

  • Model: Falcon 7B (GGUF Q4_K_M)

  • CPU: Intel Core Ultra 7 265K (20 cores)

  • OS: Arch Linux (kernel 6.19.10-zen1-1-zen)

  • Method: perf uncore IMC counters (uncore_imc_free_running_0/data_read/)
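A minimal sketch of how a measurement like this could be scripted, assuming `perf` is installed, the event named in the Method line exists on the machine, and the workload is started by a command you supply. The helper names (`parse_perf_csv`, `measure`) are illustrative and are not taken from the AIOS repository. The parser also assumes perf echoes the event name back exactly as requested.

```python
import csv
import io
import subprocess
import tempfile

# Event name as given in the post; other machines may expose different PMU names.
EVENT = "uncore_imc_free_running_0/data_read/"


def parse_perf_csv(text, event=EVENT):
    """Sum the counts reported for `event` in `perf stat -x,` output.

    Each CSV row starts with: counter value, unit, event name, ...
    Header comments, other events, and rows such as '<not counted>'
    are skipped.
    """
    total = 0.0
    unit = None
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or row[2].strip() != event:
            continue
        try:
            total += float(row[0])
        except ValueError:
            continue  # non-numeric value, e.g. '<not counted>'
        unit = row[1]
    return total, unit


def measure(cmd, event=EVENT):
    """Run `cmd` under system-wide `perf stat` and return (count, unit).

    Uncore counters are not per-process, so DRAM traffic from everything
    else running on the machine is included. Usually requires root.
    """
    with tempfile.NamedTemporaryFile("r", suffix=".csv") as out:
        subprocess.run(
            ["perf", "stat", "-a", "-x,", "-o", out.name, "-e", event, "--", *cmd],
            check=True,
        )
        return parse_perf_csv(out.read(), event)


def per_token(total, tokens):
    """Divide a total counter reading by the number of generated tokens."""
    return total / tokens
```

Usage would look like `total, unit = measure(["./my-inference-command", "..."])` followed by `per_token(total, 200)`, where the command is whatever launches your 200-token generation. If a system exposes more than one IMC PMU, their readings would likely need to be summed; this sketch reads only the one named above.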

Results (5 runs × 200 tokens)

  • MB/token: 2340 ± 4 (mean ± standard deviation across runs)

  • Coefficient of Variation: 0.17%

  • Tokens/sec: 11.43 ± 0.05
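For anyone reproducing these statistics, here is a small sketch of how the mean, standard deviation, and coefficient of variation can be computed from per-run readings. The per-run values are not listed in this post, so the function takes them as input. It uses the sample standard deviation (n − 1), which is an assumption about how the reported ± was computed.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation, CV in percent)."""
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)  # sample std (n - 1); an assumption
    return mean, std, 100.0 * std / mean


# Consistency check on the figures reported above: 4 / 2340 as a percentage.
reported_cv = 100.0 * 4 / 2340
print(f"CV implied by 2340 ± 4: {reported_cv:.2f}%")
```

The implied value agrees with the reported coefficient of variation, which suggests the ± figure is a standard deviation rather than a range.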

Key Takeaways

  • The measurement is highly stable (CV = 0.17%, well under 1%), which supports treating DRAM reads per token as a repeatable physical metric.

  • ~456–459 GB of DRAM reads for 200 tokens (2340 MB × 200 ≈ 457 GB at 1024 MB per GB) highlights the memory bandwidth wall in CPU inference.

  • This establishes a ground truth baseline for AIOS evaluation.
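The per-run total in the takeaways follows directly from the per-token figure. A quick arithmetic check, which also shows that the quoted range lines up with a binary (1024-based) conversion rather than a decimal one:

```python
MB_PER_TOKEN = 2340   # reported mean
TOKENS = 200          # tokens per run

total_mb = MB_PER_TOKEN * TOKENS        # 468000 MB per run
total_gb_binary = total_mb / 1024       # conversion at 1024 MB per GB
total_gb_decimal = total_mb / 1000      # conversion at 1000 MB per GB

print(f"binary:  {total_gb_binary:.1f} GB")
print(f"decimal: {total_gb_decimal:.1f} GB")
```

Only the binary conversion falls inside the 456–459 GB range, so the MB and GB figures in this post are most likely binary units (MiB and GiB).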

Why this matters

Most inference discussions optimize for tokens/sec.

AIOS instead treats MB/token as the primary constraint, because on CPUs token generation is limited by memory movement rather than compute.
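To make the link between the two metrics concrete, here is a rough sketch under the simplifying assumption that generation is fully bandwidth-bound and DRAM reads dominate the traffic. In that model, tokens/sec is the sustained read bandwidth divided by MB/token. The function names are illustrative, and the halved MB/token value is a hypothetical input, not an AIOS projection.

```python
def implied_bandwidth(mb_per_token, tokens_per_sec):
    """Sustained DRAM read bandwidth (MB/s) implied by the two metrics."""
    return mb_per_token * tokens_per_sec


def bandwidth_bound_tps(bandwidth_mb_s, mb_per_token):
    """Tokens/sec ceiling if throughput is limited purely by DRAM reads."""
    return bandwidth_mb_s / mb_per_token


# Reported baseline: 2340 MB/token at 11.43 tokens/sec.
bw = implied_bandwidth(2340, 11.43)
print(f"implied read bandwidth: {bw / 1000:.1f} thousand MB/s")

# Hypothetical: same bandwidth, half the data moved per token.
print(f"ceiling at 1170 MB/token: {bandwidth_bound_tps(bw, 1170):.2f} tokens/sec")
```

Under this model, any reduction in MB/token at a fixed bandwidth raises the tokens/sec ceiling proportionally, which is why the post treats MB/token as the quantity to optimize.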

What’s next

  • Issue #1: Falcon 7B “relufication” (R1 compliance)

  • Headroom analysis (validation/headroom.py)

  • Additional baselines across models / quantizations

Call for contributors

If you can run perf on bare-metal Linux, your contributions would be very valuable:

  • Run baseline measurements on your hardware

  • Validate different models / quantizations

  • Help quantify headroom vs AIOS projections

Repo: acasavaraju/AIOS on GitHub. CPU-native LLM inference architecture: a memory residency controller that reduces DRAM data movement per generated token through weight aliasing, sparsity maps, KV cache tiering, and activation chunking. Includes a Model Contract spec for architecture co-design. Framework and validation tooling; runtime contributions welcome.

Paper: SSRN 6467298

Acknowledgment

Huge thanks to @reimorster for running the first full validation and helping establish this baseline.

This is the first step toward making memory movement a first-class metric for LLM inference.
