
[Project] QitOS: A research-first framework for building and evaluating LLM agents

Hugging Face Forums [Unofficial] April 7, 2026

Hey everyone,

I wanted to share QitOS, a new framework I’ve been working on that’s built specifically for LLM agent researchers.

After working on several agent projects, I found that most existing frameworks didn’t really fit the research workflow:

  • It was too hard to quickly iterate on new agent architectures without rewriting the entire execution stack
  • Strategy (how the agent thinks) and execution (tool calling, tracing, evaluation) were always tangled together
  • Getting set up to evaluate on standard benchmarks took way longer than the actual research
  • Debugging agent trajectories was a mess without proper tooling

QitOS was built to solve all these problems:

Key Features

  • Clean architecture: AgentModule (your strategy/innovation) is kept separate from Engine (orchestration, tool execution, tracing). You focus on the research; the framework handles the rest.
  • Research-friendly: Supports common agent patterns out of the box (ReAct, Plan-Act, Tree-of-Thought, Reflexion) and is designed to make custom scaffolds easy to implement.
  • Benchmark-native: Built-in adapters for GAIA, Tau-Bench, and CyBench so you can get your evaluation up and running in minutes.
  • Great observability: The qita CLI lets you browse, inspect, replay, and export full agent trajectories, so there's no more digging through raw log files.
  • Ecosystem compatible: Works with any OpenAI-compatible model API, so you can use whatever models you prefer.
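
To make the strategy/execution split a bit more concrete, here's a toy sketch in plain Python. It does not use QitOS at all: Strategy, CountdownStrategy, ToyEngine, and Step are hypothetical names made up for illustration, and the real AgentModule/Engine interfaces may look quite different. The idea is only that the strategy decides the next action, while the engine owns the tools, the loop, and the trace.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Step:
    """One recorded step of a trajectory (hypothetical structure)."""
    action: str
    argument: str
    observation: str


class Strategy(Protocol):
    """Hypothetical strategy interface: it only decides the next action."""

    def decide(self, task: str, history: list[Step]) -> tuple[str, str]: ...


class CountdownStrategy:
    """Toy strategy: call the 'echo' tool twice, then finish."""

    def decide(self, task: str, history: list[Step]) -> tuple[str, str]:
        if len(history) < 2:
            return "echo", f"{task} #{len(history) + 1}"
        # Finish with the last observation as the answer.
        return "finish", history[-1].observation


@dataclass
class ToyEngine:
    """Hypothetical engine: owns the tools, the loop, and the trace."""
    tools: dict[str, Callable[[str], str]]
    max_steps: int = 10
    trace: list[Step] = field(default_factory=list)

    def run(self, strategy: Strategy, task: str) -> str:
        self.trace.clear()
        for _ in range(self.max_steps):
            action, argument = strategy.decide(task, self.trace)
            if action == "finish":
                return argument
            # Tool execution and tracing live here, not in the strategy.
            observation = self.tools[action](argument)
            self.trace.append(Step(action, argument, observation))
        raise RuntimeError("step budget exhausted")


engine = ToyEngine(tools={"echo": lambda text: text.upper()})
answer = engine.run(CountdownStrategy(), "ping")
print(answer)
```

Swapping in a different strategy class leaves ToyEngine untouched, which is the property the first bullet is describing.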

Minimal Example

Here’s what a minimal SWE agent looks like in QitOS:

from dataclasses import dataclass, field
# StateSchema is used below; assuming it is exported from the top-level package.
from qitos import AgentModule, Engine, StateSchema, Task, ToolRegistry
from qitos.kit.parser import ReActTextParser
from qitos.kit.tool import EditorToolSet, RunCommand

@dataclass
class SWEState(StateSchema):
    scratchpad: list[str] = field(default_factory=list)

class MySWEAgent(AgentModule[SWEState, ...]):
    def __init__(self, llm, workspace_root):
        reg = ToolRegistry()
        reg.include(EditorToolSet(workspace_root))
        reg.register(RunCommand(workspace_root))
        super().__init__(
            tool_registry=reg,
            llm=llm,
            model_parser=ReActTextParser()
        )

    # Implement your strategy logic here...

# Run it (my_llm and my_task are placeholders for your model client and a Task instance)
agent = MySWEAgent(llm=my_llm, workspace_root="./playground")
result = Engine(agent=agent).run(my_task)
print(result.state.final_result)
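
The example above plugs in ReActTextParser without showing what a ReAct-style parser does. As a rough, standalone illustration (not QitOS's implementation), here's a sketch that splits a model reply into thought, action, and action input. The "Thought: / Action: / Action Input: / Final Answer:" line format and the names ParsedStep and parse_react are assumptions for this sketch only.

```python
import re
from typing import NamedTuple, Optional


class ParsedStep(NamedTuple):
    """Hypothetical container for one parsed model reply."""
    thought: str
    action: Optional[str]
    action_input: Optional[str]
    final_answer: Optional[str]


# Assumed reply format: one labelled field per line.
_THOUGHT = re.compile(r"Thought:\s*(.*?)(?=\n(?:Action|Final Answer):|\Z)", re.S)
_ACTION = re.compile(r"^Action:\s*(.+?)\s*$", re.M)
_INPUT = re.compile(r"^Action Input:\s*(.*)", re.M | re.S)
_FINAL = re.compile(r"^Final Answer:\s*(.*)", re.M | re.S)


def parse_react(text: str) -> ParsedStep:
    """Parse one ReAct-style reply into either a tool call or a final answer."""
    thought_match = _THOUGHT.search(text)
    thought = thought_match.group(1).strip() if thought_match else ""

    # A final answer ends the episode, so it takes priority over any action.
    final_match = _FINAL.search(text)
    if final_match:
        return ParsedStep(thought, None, None, final_match.group(1).strip())

    action_match = _ACTION.search(text)
    if action_match is None:
        raise ValueError("no Action or Final Answer found in model output")

    input_match = _INPUT.search(text)
    action_input = input_match.group(1).strip() if input_match else ""
    return ParsedStep(thought, action_match.group(1), action_input, None)


reply = "Thought: I should run the tests first.\nAction: run_command\nAction Input: pytest -q"
step = parse_react(reply)
print(step.action, "|", step.action_input)
```

A parser like this sits between the raw model text and the tool registry: the action name selects a tool and the action input becomes its argument.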

Get Started

  • GitHub: Qitor/qitos ("Let's Qitos! A torch-like agent-native framework for researchers.")
  • Documentation: QitOS docs home page
  • Install: pip install qitos

I’m really interested to hear what the community thinks — what do you find most frustrating about building LLM agents for research? Are there any features you’d like to see added to QitOS?

All feedback and contributions are very welcome!
