AIOS — First Ground Truth Baseline (CPU DRAM Measurement)
Following up on my earlier post introducing AIOS (CPU-native LLM inference architecture), we now have the first validated baseline measurement using hardware memory controller counters.
Setup
Model: Falcon 7B (GGUF Q4_K_M)
CPU: Intel Core Ultra 7 265K (20 cores)
OS: Arch Linux (kernel 6.19.10-zen1-1-zen)
Method: perf uncore IMC counters (uncore_imc_free_running_0/data_read/)
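A minimal sketch of how a reading from this counter could be turned into MB/token. It assumes the counter was collected with perf stat in CSV mode (-x,) and that each output line starts with value, unit, event name. The helper names and the sample line are illustrative only; they are not taken from the AIOS repo or from the actual runs.

```python
def parse_dram_read_mb(perf_csv: str) -> float:
    """Sum every data_read value found in `perf stat -x,` style output.

    Assumption: each CSV line starts with value, unit, event name, and the
    event is reported in MiB. Adjust the unit check if your perf differs.
    """
    total = 0.0
    for line in perf_csv.splitlines():
        fields = line.strip().split(",")
        if len(fields) < 3 or "data_read" not in fields[2]:
            continue  # skip blank lines, comments, and other events
        value, unit = fields[0], fields[1]
        if value.startswith("<"):  # e.g. "<not supported>" or "<not counted>"
            raise ValueError(f"counter unavailable: {line!r}")
        if unit != "MiB":
            raise ValueError(f"unexpected unit {unit!r}")
        total += float(value)
    return total


def mb_per_token(total_read_mb: float, tokens: int) -> float:
    """DRAM read volume divided by the number of generated tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return total_read_mb / tokens


# Made-up sample line, chosen so that 200 tokens give 2340 MB/token.
sample = "468000.00,MiB,uncore_imc_free_running_0/data_read/,17500000000,100.00,,"
print(mb_per_token(parse_dram_read_mb(sample), 200))  # 2340.0
```

The counter is system-wide, so anything else reading DRAM during the run is included in the total; an idle machine matters for a clean number.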
Results (5 runs × 200 tokens)
MB/token: 2340 ± 4
Coefficient of variation (MB/token): 0.17%
Tokens/sec: 11.43 ± 0.05
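As a quick check on these figures, the coefficient of variation is the standard deviation divided by the mean. The sketch below recomputes it from the reported mean ± spread. It assumes the ± values are standard deviations; whether they are sample or population standard deviations is not stated here. The summarize helper is a hypothetical name for doing the same from raw per-run values.

```python
import statistics


def cv_percent(mean: float, std: float) -> float:
    """Coefficient of variation as a percentage of the mean."""
    return 100.0 * std / mean


def summarize(samples):
    """Return (mean, sample std, CV%) for a list of per-run measurements."""
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)  # n-1 denominator; an assumption
    return mean, std, cv_percent(mean, std)


print(round(cv_percent(2340, 4), 2))      # MB/token   -> 0.17
print(round(cv_percent(11.43, 0.05), 2))  # tokens/sec -> 0.44
```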
Key Takeaways
The measurement is highly stable (CV of 0.17%, well under 1%), which supports treating DRAM reads as a reliable physical metric.
~456–459 GB of DRAM read per 200-token run (consistent with 2340 MB/token × 200 = 468,000 MB, or about 457 GB at 1024 MB per GB) highlights the memory bandwidth wall in CPU inference.
This establishes a ground truth baseline for AIOS evaluation.
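The arithmetic behind these takeaways can be sketched from the reported means alone. The function names are illustrative, and the sustained-bandwidth figure is derived here from MB/token × tokens/sec rather than measured directly.

```python
MB_PER_TOKEN = 2340.0    # reported mean
TOKENS_PER_RUN = 200     # reported run length
TOKENS_PER_SEC = 11.43   # reported mean


def total_read_mb(mb_per_token: float, tokens: int) -> float:
    """Total DRAM read volume for one run."""
    return mb_per_token * tokens


def sustained_read_mb_per_sec(mb_per_token: float, tokens_per_sec: float) -> float:
    """Average DRAM read rate implied by the two reported means."""
    return mb_per_token * tokens_per_sec


total = total_read_mb(MB_PER_TOKEN, TOKENS_PER_RUN)
print(total)                    # 468000.0 MB per run
print(round(total / 1024, 1))   # 457.0 GB per run, at 1024 MB per GB
print(round(sustained_read_mb_per_sec(MB_PER_TOKEN, TOKENS_PER_SEC), 1))  # 26746.2 MB/s
```

An implied read rate of roughly 26.7 thousand MB/s is the number to compare against the platform's memory bandwidth when judging how close the run sits to the wall.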
Why this matters
Most inference discussions optimize for tokens/sec.
AIOS instead treats MB/token as the primary constraint, because on CPUs, memory movement—not compute—is the bottleneck.
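One way to see why MB/token works as a constraint: if decoding is limited by DRAM reads, throughput cannot exceed read bandwidth divided by MB/token. The sketch below holds bandwidth fixed at the rate implied by this baseline and shows how the ceiling moves as MB/token shrinks. The reduction factors are arbitrary illustrations, not AIOS projections, and the model assumes decode stays memory-bound throughout.

```python
def max_tokens_per_sec(read_bandwidth_mb_s: float, mb_per_token: float) -> float:
    """Upper bound on tokens/sec when DRAM reads are the only limit."""
    if mb_per_token <= 0:
        raise ValueError("mb_per_token must be positive")
    return read_bandwidth_mb_s / mb_per_token


baseline_mb_per_token = 2340.0
bandwidth = baseline_mb_per_token * 11.43  # MB/s implied by the baseline

# Hypothetical fractions of the baseline MB/token, for illustration only.
for fraction in (1.0, 0.5, 0.25):
    ceiling = max_tokens_per_sec(bandwidth, baseline_mb_per_token * fraction)
    print(fraction, round(ceiling, 2))
# 1.0 11.43
# 0.5 22.86
# 0.25 45.72
```

Under this simple model, tokens/sec is a consequence of MB/token at a given bandwidth, which is the sense in which MB/token is the more fundamental number.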
What’s next
Issue #1: Falcon 7B “relufication” (R1 compliance)
Headroom analysis (validation/headroom.py)
Additional baselines across models / quantizations
Call for contributors
If you can run perf on bare-metal Linux, your contributions would be very valuable:
Run baseline measurements on your hardware
Validate different models / quantizations
Help quantify headroom vs AIOS projections
Repo: acasavaraju/AIOS on GitHub. CPU-native LLM inference architecture: a memory residency controller that reduces DRAM data movement per generated token through weight aliasing, sparsity maps, KV cache tiering, and activation chunking. Includes a Model Contract spec for architecture co-design. Framework and validation tooling; runtime contributions welcome.
Paper: SSRN 6467298
Acknowledgment
Huge thanks to @reimorster for running the first full validation and helping establish this baseline.
This is the first step toward making memory movement a first-class metric for LLM inference.