
Controlled study: AI operational experience improves performance by 1.07 SD (open data + code)

Hugging Face Forums [Unofficial] April 13, 2026
Hi everyone,

We just published a controlled experiment measuring the effect of accumulated operational experience on AI assistant performance.

Quick summary:

* An AI assistant (ARIA) that has been operating for months, accumulating experience fragments and operational memory, was compared against the same base model (Claude Opus 4.6) without experience
* 50 real-world questions, 1,200 blind judgments from 3 independent judges
* Result: Cohen's d = 1.07, Friedman p < 10^-25
* The effect is domain-specific: strong on operational tasks, near zero on algorithmic controls

This builds on work by ExpeL, MemGPT, Generative Agents, and Reflexion, but measures experience effects in a production system rather than a sandbox.

Everything is open:

* Paper: https://zenodo.org/records/19533311
* Data + code: GitHub, patechlabs/aria-experience-study (Data and code for: Operational Experience as a Performance Multiplier in AI Assistants, Nuraliev & Rychkova, 2026)

Would love feedback from this community. Also seeking an arXiv cs.AI endorser if anyone is qualified: endorsement code MJLELZ.

Thanks!

Ravshan Nuraliev, PaTech Labs
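For anyone less familiar with the two statistics quoted in the summary, here is a minimal, standard-library-only sketch of how Cohen's d and the Friedman chi-square statistic are commonly computed. It is not taken from the study's repository. The function names and the toy scores are hypothetical, the pooled-SD form of Cohen's d is an assumption (the paper may use a paired variant), and the Friedman statistic below omits the tie correction and the p-value lookup.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Cohen's d with a pooled sample standard deviation (assumed variant)."""
    n1, n2 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * variance(treated) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic (no tie correction).

    Each row holds the k related scores for one block (e.g. one question),
    one score per condition.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the study.
    with_experience = [2, 4, 6]
    without_experience = [1, 3, 5]
    print(cohens_d(with_experience, without_experience))

    # Three blocks, three conditions, identical ordering in every block.
    print(friedman_statistic([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))
```

The Friedman statistic is compared against a chi-square distribution with k - 1 degrees of freedom to obtain a p-value; a statistics library can handle that step along with the tie correction.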
