Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidxuvsreqf3krfvvnp2bblfytyyfshtcxgkks4rv75oed6jh5l4du",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mkjynrjrpik2"
  },
  "path": "/t/built-a-pytest-style-behavioral-testing-package-for-agents-it-checks-tool-usage-tool-order-step-limits-and-regression-against-baselines-validated-on-live-openai-agents-sdk-tests/1379920#post_1",
  "publishedAt": "2026-04-28T06:16:57.000Z",
  "site": "https://community.openai.com",
  "textContent": "I built AgentCheck(**pygent-test** on PyPI), a pytest-style behavioral testing package for AI agents. Instead of checking exact text, it checks behavior: tool usage, tool order, step limits, unsupported success claims, and baseline regressions. I’ve validated it on live OpenAI Agents SDK examples, including a single-tool and multi-tool workflow. I’d love feedback from people testing real agent systems.",
  "title": "Built a pytest-style behavioral testing package for agents. It checks tool usage, tool order, step limits, and regression against baselines. Validated on live OpenAI Agents SDK tests"
}