Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreia6fklpqdguqriovk7p7qhfv26hqxdzdwiioeuwywlwsapgxut2ci",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mivnjpwasmp2"
  },
  "path": "/t/project-qitos-a-research-first-framework-for-building-and-evaluating-llm-agents/175037#post_1",
  "publishedAt": "2026-04-07T05:47:14.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "GitHub - Qitor/qitos: Let's Qitos! A torch-like agent-native framework for researchers. · GitHub",
    "Home - QitOS",
    "@dataclass"
  ],
  "textContent": "# [Project] QitOS: A research-first framework for building and evaluating LLM agents\n\nHey everyone,\n\nI wanted to share QitOS, a new framework I’ve been working on that’s built specifically for LLM agent researchers.\n\nAfter working on several agent projects, I found that most existing frameworks didn’t really fit the research workflow:\n\n  * It was too hard to quickly iterate on new agent architectures without rewriting the entire execution stack\n  * Strategy (how the agent thinks) and execution (tool calling, tracing, evaluation) were always tangled together\n  * Getting set up to evaluate on standard benchmarks took way longer than the actual research\n  * Debugging agent trajectories was a mess without proper tooling\n\n\n\nQitOS was built to solve all these problems:\n\n## Key Features\n\n  * **Clean architecture** : Separation between `AgentModule` (your strategy/innovation) and `Engine` (orchestration, tool execution, tracing). You focus on the research, the framework handles the rest.\n  * **Research-friendly** : Supports all common agent patterns out of the box: ReAct, Plan-Act, Tree-of-Thought, Reflexion, and makes it extremely easy to implement custom scaffolds.\n  * **Benchmark-native** : Built-in adapters for GAIA, Tau-Bench, and CyBench so you can get your evaluation up and running in minutes.\n  * **Great observability** : The `qita` CLI lets you browse, inspect, replay, and export full agent trajectories — no more digging through raw log files.\n  * **Ecosystem compatible** : Works naturally with any OpenAI-compatible model API, so you can use whatever models you prefer.\n\n\n\n## Minimal Example\n\nHere’s what a minimal SWE agent looks like in QitOS:\n\n\n    from dataclasses import dataclass, field\n    from qitos import AgentModule, Engine, Task, ToolRegistry\n    from qitos.kit.parser import ReActTextParser\n    from qitos.kit.tool import EditorToolSet, RunCommand\n\n    @dataclass\n    class SWEState(StateSchema):\n      
  scratchpad: list[str] = field(default_factory=list)\n\n    class MySWEAgent(AgentModule[SWEState, ...]):\n        def __init__(self, llm, workspace_root):\n            reg = ToolRegistry()\n            reg.include(EditorToolSet(workspace_root))\n            reg.register(RunCommand(workspace_root))\n            super().__init__(\n                tool_registry=reg,\n                llm=llm,\n                model_parser=ReActTextParser()\n            )\n\n        # Implement your strategy logic here...\n\n    # Run it\n    agent = MySWEAgent(llm=my_llm, workspace_root=\"./playground\")\n    result = Engine(agent=agent).run(my_task)\n    print(result.state.final_result)\n\n\n## Get Started\n\n  * GitHub: Qitor/qitos (Let's Qitos! A torch-like agent-native framework for researchers.)\n  * Documentation: Home - QitOS\n  * Install: `pip install qitos`\n\n\n\nI’m really interested to hear what the community thinks. What do you find most frustrating about building LLM agents for research? Are there any features you’d like to see added to QitOS?\n\nAll feedback and contributions are very welcome!",
  "title": "[Project] QitOS: A research-first framework for building and evaluating LLM agents"
}