{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreig2yqwpavixcpmse7y3uvwzdlsxpq7juo2e64dwdcvk44xexglxqy",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjmpz3phkqr2"
},
"path": "/t/orca-a-cognitive-runtime-layer-for-agent-systems-paper-open-source/175055#post_10",
"publishedAt": "2026-04-16T14:31:55.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "Since the last round of discussion, we ran a controlled experiment comparing single-prompt execution against ORCA’s multi-step skill orchestration on two tasks (structured decision-making and multi-step text processing). 10 inputs per task, same model (gpt-4o-mini), fixed seed.\n\nThe numbers are honest:\n\nDimension | Prompt-based | ORCA Structured\n---|---|---\nLatency | Lower (1 LLM call) | Higher (N sequential calls)\nTraceability | None | Full step-level trace\nReusability | None | Full capability reuse\nMaintainability | Low (monolithic) | High (declarative YAML)\nVariability | Low | Low-moderate\n\nORCA is **not faster** for simple one-off tasks. That’s not the point.\n\nThe point is what happens when you need to audit what your agent did, swap a backend without rewriting the workflow, reuse a step across 15 different skills, or resume a failed run from a checkpoint.\n\nPrompt-based execution gives you none of that. Not because the prompt was bad — because the architecture doesn’t support it.\n\nFull benchmark code and results are in the repo: run_benchmark.py",
"title": "ORCA: A Cognitive Runtime Layer for Agent Systems (paper + open source)"
}