{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreifhdvaurknuyp7nzneaqwz5f2spwphbak53uv5ztfekz2ymnovhka",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mohf5vmsbgp2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreih6anp76lxibfsdthvkyd4g36uz2y7vvj63w3hplhpxu436puripm"
},
"mimeType": "image/webp",
"size": 67476
},
"path": "/marvin_ma_597e184518c2221/ai-made-development-faster-testing-needs-to-stop-living-in-spreadsheets-4ap0",
"publishedAt": "2026-06-17T03:29:24.000Z",
"site": "https://dev.to",
"tags": [
"ai",
"testing",
"devtools",
"productivity",
"https://github.com/lijma/testboat",
"https://lijma.github.io/testboat/"
],
"textContent": "AI agents are making software development faster.\n\nThat is great.\n\nBut there is a problem I do not think we are talking about enough:\n\n**testing is not speeding up in the same way.**\n\nIn many teams, testing is still held together by spreadsheets, meeting notes, screenshots, chat messages, and the memory of a few experienced QA engineers.\n\nThat worked when delivery was slower.\n\nIt becomes fragile when one developer can use multiple agents to change code across several modules in a single afternoon.\n\nThe bottleneck is no longer \"can we write more test cases?\"\n\nThe bottleneck is:\n\n> Can the team prove what was tested, why it was tested, what failed, what was fixed, and whether the release is safe?\n\nThat is the problem I built `testboat` for.\n\n## The Most Dangerous Sentence Before A Release\n\nThe sentence I worry about most is not:\n\n> We did not test this.\n\nAt least that is honest.\n\nThe dangerous sentence is:\n\n> I think we tested this.\n\nThat sentence usually means the team has test artifacts, but they are disconnected:\n\n * requirements live in a doc\n * test cases live in a spreadsheet\n * automation scripts live somewhere in the repo\n * execution results live in CI logs or chat\n * bugs live in an issue tracker\n * release reports are written manually before sign-off\n\n\n\nEach piece may be useful on its own.\n\nBut when a Tech Lead asks, \"Which requirements are not covered?\" or a founder asks, \"Can we release today?\", the team has to reconstruct the answer manually.\n\nThat is not a testing process.\n\nThat is institutional memory under pressure.\n\n## AI Makes This Gap Worse\n\nAI agents are very good at increasing throughput.\n\nThey can:\n\n * implement a feature faster\n * refactor code faster\n * generate UI faster\n * write automation faster\n * fix bugs faster\n\n\n\nBut faster change creates more testing uncertainty.\n\nIf an agent changes the authentication module, what should be rerun?\n\nIf a test fails, is it a product bug, a flaky automation script, or an environment issue?\n\nIf a developer says \"fixed\", has the failed test actually been rerun?\n\nIf a release report says \"main flows passed\", where is the evidence?\n\nWithout a structured system, QA becomes the human buffer. Tech Leads become risk translators. Founders buy uncertainty with every release.\n\nThat is not sustainable.\n\n## Testing Needs To Become An Engineering System\n\n`testboat` treats test artifacts like code.\n\nIt creates a `.testboat/` directory in your project:\n\n\n\n .testboat/\n .active\n draft/\n strategy.yaml\n tags.yaml\n cases/\n TC-001.yaml\n bugs/\n BUG-001.yaml\n executions/\n plans/\n results/\n execution-matrix.yaml\n automate/\n reports/\n\n\nThe important part is not \"YAML is nice.\"\n\nThe important part is **connection**.\n\nA requirement connects to a test case through `req_id`.\n\nA test case connects to an execution plan.\n\nAn execution plan connects to an automation script.\n\nA result connects back to the test case.\n\nA bug can connect to both the test case and the failing result.\n\nThe latest execution state is summarized in an execution matrix.\n\nReports are generated from the same artifacts, not written from memory.\n\nThat changes the conversation.\n\nInstead of asking:\n\n> Did we test login?\n\nYou can ask:\n\n> Show me every auth test case, its latest result, open bugs, and whether the release exit criteria passed.\n\n## What QA Gets\n\nQA should not have to be the team's memory database.\n\nWith `testboat`, a test case is a structured file:\n\n\n\n id: TC-001\n title: Login with wrong password returns 401\n status: ready\n priority: P1\n automation: to-automate\n tags:\n sprint: v1.0\n type: functional\n module: auth\n req_id: STORY-001\n steps:\n - action: Enter wrong password\n expected: API returns 401\n expected_result: User sees a clear error message\n\n\nIt is diffable.\n\nIt is reviewable.\n\nIt has a state:\n\n\n\n draft -> ready -> pass / fail / blocked / skipped\n\n\nThat means QA can maintain testing facts instead of constantly answering questions from memory.\n\n## What Tech Leads Get\n\nTech Leads need quality gates, not just good intentions.\n\n`testboat validate` runs pre-report checks:\n\n 1. format validation\n 2. requirements coverage\n 3. execution completeness\n 4. exit criteria compliance\n\n\n\nThat last part matters.\n\nYour `strategy.yaml` can define severity rules and exit criteria. For example, P0 and P1 bugs must be zero before release.\n\nSo the report is not just a nice HTML page.\n\nIt is generated after the system checks whether the release evidence is healthy enough.\n\nThis is the kind of thing that can eventually belong in CI.\n\n## What Founders And Managers Get\n\nFounders do not need to read every test case.\n\nBut they do need release confidence.\n\n\"Main flows passed\" is not enough.\n\nThe useful questions are:\n\n * how many test cases exist?\n * how many passed?\n * how many were not executed?\n * which requirements are uncovered?\n * how many open P0/P1 bugs remain?\n * did the release satisfy exit criteria?\n * can we trace failures back to bugs and fixes?\n\n\n\n`testboat` generates strategy, sprint, and closure reports from the actual test artifacts.\n\nThat gives leadership evidence instead of vibes.\n\n## Where AI Agents Fit\n\nThe goal is not to replace QA.\n\nThe goal is to give AI agents a testing workflow they can follow.\n\n`testboat enable` creates agent-specific instructions for tools like Claude, Copilot, Cursor, Kiro, and others.\n\nAn agent can then follow a repeatable SOP:\n\n 1. check the active test version\n 2. read the strategy\n 3. inspect registered tags\n 4. create or update test cases\n 5. validate test cases\n 6. create execution plans\n 7. run automation or guide manual testing\n 8. record results\n 9. file bugs on failures\n 10. rerun affected tests after fixes\n 11. validate before reporting\n\n\n\nThat is the difference between \"AI wrote some tests\" and \"AI participated in the testing lifecycle.\"\n\n## A Small Example\n\nIf the auth module changed, you should not ask:\n\n> Can someone test login?\n\nYou should be able to do this:\n\n\n\n testboat case list --module auth\n testboat matrix show\n\n\nThen rerun the affected tests and record results:\n\n\n\n testboat result record TC-001 pass --type automated --by \"AI\"\n\n\nIf a failure appears:\n\n\n\n testboat bug add \\\n --title \"Wrong password returns 500 instead of 401\" \\\n --tc TC-001 \\\n --severity major \\\n --priority P1\n\n\nAnd after the fix, the bug should not jump straight to \"closed.\"\n\nIt should move through retest:\n\n\n\n fixed -> pending-retest -> verified -> closed\n\n\nThat is the loop teams need when development is moving faster.\n\n## Why I Think This Matters Now\n\nAI is making code cheaper to produce.\n\nThat does not automatically make releases safer.\n\nIf anything, it makes weak testing systems more visible.\n\nThe next layer of AI engineering is not just faster code generation.\n\nIt is turning the surrounding engineering practices into systems that agents can participate in.\n\nTesting is one of those practices.\n\nThat is why I built `testboat`.\n\nNot to generate more test cases.\n\nTo make testing traceable, reviewable, versioned, validated, and reportable.\n\n## Try It\n\n\n pip install testboat\n testboat init\n testboat enable cursor\n testboat strategy create\n\n\nProject:\n\nhttps://github.com/lijma/testboat\n\nDocs:\n\nhttps://lijma.github.io/testboat/\n\n## Question\n\nHow does your team know a release is actually ready?\n\nIs that answer stored in a system, or mostly in people's heads?",
"title": "AI Made Development Faster. Testing Needs to Stop Living in Spreadsheets."
}