{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidxuvsreqf3krfvvnp2bblfytyyfshtcxgkks4rv75oed6jh5l4du",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mkjynrjrpik2"
},
"path": "/t/built-a-pytest-style-behavioral-testing-package-for-agents-it-checks-tool-usage-tool-order-step-limits-and-regression-against-baselines-validated-on-live-openai-agents-sdk-tests/1379920#post_1",
"publishedAt": "2026-04-28T06:16:57.000Z",
"site": "https://community.openai.com",
"textContent": "I built AgentCheck (**pygent-test** on PyPI), a pytest-style behavioral testing package for AI agents. Instead of checking exact text, it checks behavior: tool usage, tool order, step limits, unsupported success claims, and regressions against baselines. I’ve validated it on live OpenAI Agents SDK examples, including a single-tool workflow and a multi-tool workflow. I’d love feedback from people testing real agent systems.",
"title": "Built a pytest-style behavioral testing package for agents. It checks tool usage, tool order, step limits, and regression against baselines. Validated on live OpenAI Agents SDK tests"
}