ORCA: A Cognitive Runtime Layer for Agent Systems (paper + open source)

Hugging Face Forums [Unofficial] April 16, 2026

Since the last round of discussion, we ran a controlled experiment comparing single-prompt execution against ORCA’s multi-step skill orchestration on two tasks: structured decision-making and multi-step text processing. Each task used 10 inputs, the same model (gpt-4o-mini), and a fixed seed.

Here is the honest summary:

| Dimension | Prompt-based | ORCA Structured |
|---|---|---|
| Latency | Lower (1 LLM call) | Higher (N sequential calls) |
| Traceability | None | Full step-level trace |
| Reusability | None | Full capability reuse |
| Maintainability | Low (monolithic) | High (declarative YAML) |
| Variability | Low | Low-moderate |

ORCA is not faster for simple one-off tasks. That’s not the point.

The point is what happens when you need to audit what your agent did, swap a backend without rewriting the workflow, reuse a step across 15 different skills, or resume a failed run from a checkpoint.

Prompt-based execution gives you none of that. Not because the prompt was bad, but because the architecture doesn’t support it.
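To make that concrete, here is a minimal Python sketch of the general pattern. It is not ORCA's actual API: `RunState`, `run_skill`, `normalize`, `summarize`, and both backends are hypothetical names made up for illustration, and the "backend" is a plain function standing in for an LLM call. It only shows how a step-level trace, a swappable backend, reusable steps, and checkpoint resume fall out of running a skill as a list of steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; these are not ORCA's real interfaces.
Backend = Callable[[str], str]        # stands in for one LLM call
Step = Callable[[Backend, str], str]  # one reusable unit of work


@dataclass
class RunState:
    """Checkpoint: index of the next step, the current value, and the trace so far."""
    next_step: int = 0
    value: str = ""
    trace: List[Dict[str, str]] = field(default_factory=list)


def run_skill(steps: List[Step], backend: Backend, state: RunState) -> RunState:
    """Run steps in order starting at state.next_step, recording each one."""
    for i in range(state.next_step, len(steps)):
        step = steps[i]
        # If this raises, state still holds the last successful checkpoint.
        output = step(backend, state.value)
        state.trace.append(
            {"step": step.__name__, "input": state.value, "output": output}
        )
        state.value = output
        state.next_step = i + 1
    return state


def normalize(backend: Backend, text: str) -> str:
    # Pure text step: collapses whitespace and lowercases, no backend call.
    return " ".join(text.split()).lower()


def summarize(backend: Backend, text: str) -> str:
    # Delegates to whichever backend the caller supplies.
    return backend(f"summarize: {text}")


def failing_backend(prompt: str) -> str:
    raise RuntimeError("backend unavailable")


def echo_backend(prompt: str) -> str:
    return f"[echo] {prompt}"


skill = [normalize, summarize]
state = RunState(value="  Hello   WORLD ")

try:
    run_skill(skill, failing_backend, state)
except RuntimeError:
    pass  # normalize succeeded and was checkpointed; summarize failed

# Swap the backend and resume from the checkpoint; normalize is not re-run.
state = run_skill(skill, echo_backend, state)
```

After the second call, `state.trace` holds one entry per step with its input and output, which is the audit record. The same `normalize` function could appear in any number of other step lists unchanged. A single monolithic prompt has no equivalent of `state.next_step` to resume from.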

Full benchmark code and results are in the repo: run_benchmark.py
