Controlled study: AI operational experience improves performance by 1.07 SD (open data + code)
Hugging Face Forums [Unofficial]
April 13, 2026
Hi everyone,
We just published a controlled experiment measuring the effect of accumulated operational experience on AI assistant performance.
Quick summary:
* An AI assistant (ARIA) that has been operating for months, accumulating experience fragments and operational memory, was compared against the same base model (Claude Opus 4.6) without that accumulated experience
* 50 real-world questions, 1,200 blind judgments from 3 independent judges
* Result: Cohen’s d = 1.07, Friedman p < 10^-25
* The effect is domain-specific — strong on operational tasks, near zero on algorithmic controls
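For anyone less familiar with the effect-size measure, here is a minimal Python sketch of one common way to compute Cohen's d, using a pooled standard deviation. The scores and the names `cohens_d`, `with_experience`, and `without_experience` are hypothetical and made up for illustration. They are not data from the study, and the paper may use a different variant of d (for example, one suited to paired designs), so check the paper and repository for the actual computation.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d using a pooled standard deviation (one common variant)."""
    n1, n2 = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical judge scores, invented for illustration only (NOT study data).
with_experience = [7.0, 8.0, 9.0, 8.0, 7.0, 9.0]
without_experience = [6.0, 7.0, 8.0, 7.0, 6.0, 8.0]

d = cohens_d(with_experience, without_experience)
print(f"Cohen's d = {d:.2f}")
```

A d of about 1 means the two group means differ by roughly one pooled standard deviation, which is how the "1.07 SD" in the title should be read.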
This builds on ExpeL, MemGPT, Generative Agents, and Reflexion, but measures experience effects in a production system rather than a sandbox.
Everything is open:
* Paper: https://zenodo.org/records/19533311
* Data + code: patechlabs/aria-experience-study on GitHub (data and code for "Operational Experience as a Performance Multiplier in AI Assistants", Nuraliev & Rychkova, 2026)
We would love feedback from this community. We are also seeking an arXiv cs.AI endorser, if anyone is qualified (endorsement code MJLELZ).
Thanks!
Ravshan Nuraliev, PaTech Labs