{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreia6fklpqdguqriovk7p7qhfv26hqxdzdwiioeuwywlwsapgxut2ci",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mivnjpwasmp2"
},
"path": "/t/project-qitos-a-research-first-framework-for-building-and-evaluating-llm-agents/175037#post_1",
"publishedAt": "2026-04-07T05:47:14.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"GitHub - Qitor/qitos: Let's Qitos! A torch-like agent-native framework for researchers. · GitHub",
"Home - QitOS",
"@dataclass"
],
"textContent": "# [Project] QitOS: A research-first framework for building and evaluating LLM agents\n\nHey everyone,\n\nI wanted to share QitOS, a new framework I’ve been working on, built specifically for LLM agent researchers.\n\nAfter working on several agent projects, I found that most existing frameworks didn’t fit the research workflow:\n\n * Iterating quickly on a new agent architecture meant rewriting the entire execution stack\n * Strategy (how the agent thinks) and execution (tool calling, tracing, evaluation) were always tangled together\n * Getting set up to evaluate on standard benchmarks took far longer than the actual research\n * Debugging agent trajectories was a mess without proper tooling\n\nQitOS was built to address these problems.\n\n## Key Features\n\n * **Clean architecture**: `AgentModule` (your strategy/innovation) is separated from `Engine` (orchestration, tool execution, tracing). You focus on the research, and the framework handles the rest.\n * **Research-friendly**: Supports common agent patterns out of the box (ReAct, Plan-Act, Tree-of-Thought, Reflexion) and makes it easy to implement custom scaffolds.\n * **Benchmark-native**: Built-in adapters for GAIA, Tau-Bench, and CyBench, so you can get an evaluation up and running in minutes.\n * **Great observability**: The `qita` CLI lets you browse, inspect, replay, and export full agent trajectories, so you no longer have to dig through raw log files.\n * **Ecosystem compatible**: Works with any OpenAI-compatible model API, so you can use whatever models you prefer.\n\n## Minimal Example\n\nHere’s what a minimal SWE agent looks like in QitOS:\n\n    from dataclasses import dataclass, field\n    # StateSchema is assumed to be a top-level qitos export; check the docs if this import fails\n    from qitos import AgentModule, Engine, StateSchema, Task, ToolRegistry\n    from qitos.kit.parser import ReActTextParser\n    from qitos.kit.tool import EditorToolSet, RunCommand\n\n    @dataclass\n    class SWEState(StateSchema):\n        # Free-form notes the agent accumulates across steps\n        scratchpad: list[str] = field(default_factory=list)\n\n    class MySWEAgent(AgentModule[SWEState, ...]):\n        def __init__(self, llm, workspace_root):\n            # Register the editing and shell tools the agent may call\n            reg = ToolRegistry()\n            reg.include(EditorToolSet(workspace_root))\n            reg.register(RunCommand(workspace_root))\n            super().__init__(\n                tool_registry=reg,\n                llm=llm,\n                model_parser=ReActTextParser()\n            )\n\n        # Implement your strategy logic here...\n\n    # Run it (my_llm and my_task are placeholders for your model client and Task instance)\n    agent = MySWEAgent(llm=my_llm, workspace_root=\"./playground\")\n    result = Engine(agent=agent).run(my_task)\n    print(result.state.final_result)\n\n## Get Started\n\n * GitHub: Qitor/qitos\n * Documentation: QitOS home page\n * Install: `pip install qitos`\n\nI’m really interested to hear what the community thinks. What do you find most frustrating about building LLM agents for research? Are there any features you’d like to see added to QitOS?\n\nAll feedback and contributions are very welcome!",
"title": "[Project] QitOS: A research-first framework for building and evaluating LLM agents"
}