External Publication

ORCA: A Cognitive Runtime Layer for Agent Systems (paper + open source)

Hugging Face Forums [Unofficial] April 16, 2026

Since the last round of discussion, we ran a controlled experiment comparing single-prompt execution against ORCA’s multi-step skill orchestration on two tasks: structured decision-making and multi-step text processing. Each task used 10 inputs, the same model (gpt-4o-mini), and a fixed seed.

The results, reported honestly:

Dimension	Prompt-based	ORCA Structured
Latency	Lower (1 LLM call)	Higher (N sequential calls)
Traceability	None	Full step-level trace
Reusability	None	Full capability reuse
Maintainability	Low (monolithic)	High (declarative YAML)
Variability	Low	Low-moderate

ORCA is not faster for simple one-off tasks. That’s not the point.

The point is what happens when you need to audit what your agent did, swap a backend without rewriting the workflow, reuse a step across 15 different skills, or resume a failed run from a checkpoint.

Prompt-based execution gives you none of that. Not because the prompt was bad, but because the architecture doesn’t support it.
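To make the architectural difference concrete, here is a minimal sketch of step-level orchestration with a trace, a swappable backend, and resume-from-checkpoint. It is not ORCA's actual API: `run_skill`, `echo_backend`, `failing_backend`, the step names, and the checkpoint layout are all hypothetical, chosen only for illustration, and the "backend" is a stand-in function rather than a real LLM call.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical names throughout; this is an illustration, not ORCA's API.
Backend = Callable[[str, str], str]  # (step_name, input_text) -> output_text


def echo_backend(step: str, text: str) -> str:
    """Stand-in for an LLM call: wraps the text in the step name."""
    return f"{step}({text})"


def failing_backend(step: str, text: str) -> str:
    """Same as echo_backend, but simulates an outage on one step."""
    if step == "summarize":
        raise RuntimeError("backend timeout")
    return echo_backend(step, text)


def run_skill(
    steps: List[str],
    text: str,
    backend: Backend,
    checkpoint: Optional[Dict] = None,
) -> Dict:
    """Run steps in order, recording a trace entry per step.

    If a checkpoint from an earlier run is given, execution resumes at the
    step that failed and `text` is ignored in favour of the saved value.
    """
    if checkpoint is None:
        state = {"next": 0, "value": text, "trace": []}
    else:
        # Copy so the caller's checkpoint is left untouched.
        state = {**checkpoint, "trace": list(checkpoint["trace"])}

    for i in range(state["next"], len(steps)):
        name = steps[i]
        try:
            out = backend(name, state["value"])
        except Exception as exc:
            state["trace"].append(
                {"step": name, "status": "failed", "error": str(exc)}
            )
            state["next"] = i  # resume from this step next time
            return state
        state["trace"].append(
            {"step": name, "status": "ok", "input": state["value"], "output": out}
        )
        state["value"] = out
        state["next"] = i + 1
    return state


steps = ["extract", "summarize", "format"]

# First run fails at "summarize"; the returned state doubles as a checkpoint.
first = run_skill(steps, "doc", failing_backend)

# Resume with a different backend; "extract" is not executed again.
resumed = run_skill(steps, "doc", echo_backend, checkpoint=first)

for entry in resumed["trace"]:
    print(entry["step"], entry["status"])
```

With a single monolithic prompt there is no intermediate state to record or resume from, so none of this is possible; here the workflow definition (`steps`) stays unchanged when the backend is swapped.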

Full benchmark code and results are in the repo: run_benchmark.py
