API Error Codes: A Test Suite Pattern I Stole from Stripe
Read Stripe's API reference for an hour and you'll notice every endpoint has a complete enumerated list of error codes with example payloads. Then look at your own API.
That comparison can be a little uncomfortable.
Stripe doesn't treat errors as an afterthought. Their documentation gives error responses nearly as much attention as successful ones. Every endpoint explains not only what can go right, but exactly what can go wrong, including structured error codes, HTTP status codes, descriptions, and example payloads.
Most APIs aren't like that.
They document 200 OK responses in detail while reducing failures to a short table:
- 400 Bad Request
- 401 Unauthorized
- 404 Not Found
- 500 Internal Server Error
The business errors—the ones your users actually encounter—are often hidden inside controller code, scattered across wiki pages, or not documented at all.
The same imbalance usually exists in the test suite.
Hundreds of happy-path tests.
Very few negative tests.
A few years ago, I borrowed a simple idea from Stripe and turned it into one of the most valuable testing patterns we've adopted.
Instead of writing negative tests one by one, we created a centralized error-code catalog and generated much of the negative test suite directly from it.
The result wasn't just better API error code testing.
It also improved documentation, reduced maintenance, and made breaking changes much harder to introduce accidentally.
Here's how the pattern works.
Why Error Responses Are Part of Your API Contract
Many teams unconsciously treat successful responses as the "real" API.
Everything else is considered an exception.
That mindset creates fragile systems.
Imagine a payment API.
A successful request returns:
{
  "paymentId": "PAY-1001",
  "status": "Succeeded"
}
Easy enough.
Now consider all the legitimate failure cases:
- Card expired
- Insufficient funds
- Duplicate transaction
- Currency not supported
- Merchant suspended
- Fraud detected
- Payment amount exceeds limit
Those aren't bugs.
They're expected business outcomes.
If your consumers must handle them, then those responses are every bit as much a part of the API contract as the successful response.
Once you accept that idea, testing strategy changes dramatically.
The Error-Code Catalog as a Test Input
The foundation of this approach is maintaining a single catalog of every business error the API intentionally exposes.
For example:
errors:
  USER_NOT_FOUND:
    httpStatus: 404
    message: User not found
  EMAIL_ALREADY_EXISTS:
    httpStatus: 409
    message: Email already exists
  INVALID_TOKEN:
    httpStatus: 401
    message: Invalid authentication token
  PAYMENT_DECLINED:
    httpStatus: 402
    message: Payment declined
  ORDER_ALREADY_SHIPPED:
    httpStatus: 409
    message: Order cannot be modified
Notice what's happening here.
We're no longer documenting HTTP status codes alone.
We're documenting business outcomes.
Every new error introduced by engineering must first appear inside this catalog.
That simple rule creates a surprising amount of consistency.
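One way to enforce that rule is to hold the catalog in code and lint it before anything is generated from it. The following is a minimal sketch, not the loader we actually run: `CatalogEntry`, `ErrorCatalog`, and `validateCatalog` are hypothetical names, and both the UPPER_SNAKE_CASE convention and the "business errors are always 4xx" check are assumptions made for illustration.

```typescript
// Hypothetical in-code form of the catalog shown above.
interface CatalogEntry {
  httpStatus: number;
  message: string;
}

type ErrorCatalog = Record<string, CatalogEntry>;

const errorCatalog: ErrorCatalog = {
  USER_NOT_FOUND: { httpStatus: 404, message: "User not found" },
  EMAIL_ALREADY_EXISTS: { httpStatus: 409, message: "Email already exists" },
  INVALID_TOKEN: { httpStatus: 401, message: "Invalid authentication token" },
  PAYMENT_DECLINED: { httpStatus: 402, message: "Payment declined" },
  ORDER_ALREADY_SHIPPED: { httpStatus: 409, message: "Order cannot be modified" },
};

// Returns a list of problems; an empty list means the catalog passed.
function validateCatalog(catalog: ErrorCatalog): string[] {
  const problems: string[] = [];
  for (const [code, entry] of Object.entries(catalog)) {
    // Assumed naming convention: UPPER_SNAKE_CASE codes.
    if (!/^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$/.test(code)) {
      problems.push(`${code}: code is not UPPER_SNAKE_CASE`);
    }
    // Assumption: business errors are client-facing 4xx responses.
    if (entry.httpStatus < 400 || entry.httpStatus > 499) {
      problems.push(`${code}: httpStatus ${entry.httpStatus} is not 4xx`);
    }
    if (entry.message.trim() === "") {
      problems.push(`${code}: message is empty`);
    }
  }
  return problems;
}
```

Running `validateCatalog(errorCatalog)` as a CI step would reject a malformed entry before any documentation or tests are generated from it.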
Why a Catalog Matters
Without one:
- Developers invent new response formats.
- Documentation drifts.
- QA forgets edge cases.
- Frontend developers discover failures during integration.
With one:
- Documentation stays centralized.
- Consumers understand every supported failure.
- Every error becomes automatically testable.
The catalog becomes an executable specification.
One Test Per Error Code, Generated from the Catalog
Once every supported error exists in one place, generating baseline negative tests becomes straightforward.
Instead of manually writing dozens of nearly identical scenarios, a generator simply walks through the catalog.
Conceptually:
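// The catalog is a map keyed by error code, so iterate over its entries.
for (const [code, entry] of Object.entries(errorCatalog)) {
  generateNegativeTest(code, entry);
}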
Each generated test validates:
- Expected HTTP status
- Business error code
- Error message
- Response schema
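To make the loop more concrete, here is a rough sketch of what a generator and its checker could look like. Everything in it is illustrative: `NegativeTestCase`, `generateNegativeTests`, `checkResponse`, and `ApiResponse` are hypothetical names, and the schema check is deliberately minimal (it only confirms the body is an object carrying the expected code and message).

```typescript
interface CatalogEntry {
  httpStatus: number;
  message: string;
}
type ErrorCatalog = Record<string, CatalogEntry>;

// Hypothetical view of an HTTP response as seen by the test harness.
interface ApiResponse {
  status: number;
  body: unknown;
}

interface NegativeTestCase {
  name: string;
  expectedStatus: number;
  expectedCode: string;
  expectedMessage: string;
}

// Produces one baseline test case per catalog entry.
function generateNegativeTests(catalog: ErrorCatalog): NegativeTestCase[] {
  return Object.entries(catalog).map(([code, entry]) => ({
    name: `returns ${code} with HTTP ${entry.httpStatus}`,
    expectedStatus: entry.httpStatus,
    expectedCode: code,
    expectedMessage: entry.message,
  }));
}

// Compares a response with a test case and returns the list of mismatches.
function checkResponse(testCase: NegativeTestCase, response: ApiResponse): string[] {
  const failures: string[] = [];
  if (response.status !== testCase.expectedStatus) {
    failures.push(`status: expected ${testCase.expectedStatus}, got ${response.status}`);
  }
  const body = response.body;
  if (typeof body !== "object" || body === null) {
    failures.push("schema: body is not an object");
    return failures;
  }
  const fields = body as Record<string, unknown>;
  if (fields.code !== testCase.expectedCode) {
    failures.push(`code: expected ${testCase.expectedCode}`);
  }
  if (fields.message !== testCase.expectedMessage) {
    failures.push(`message: expected "${testCase.expectedMessage}"`);
  }
  return failures;
}
```

Returning a list of mismatches rather than throwing on the first one means a single run reports every field that drifted.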
For example, suppose the catalog contains:
EMAIL_ALREADY_EXISTS
The generated scenario becomes:
- Create a customer.
- Attempt to create the same customer again.
- Verify:
{
  "code": "EMAIL_ALREADY_EXISTS",
  "message": "Email already exists"
}
No engineer needed to remember to write that negative test.
Adding a new business error automatically creates a new baseline test.
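One practical detail: a generator cannot guess how to provoke each error, so something has to describe the setup steps. A possible arrangement, sketched below under assumptions, is a registry of triggers keyed by error code plus a check that fails when a catalog code has no trigger. `FakeCustomerApi`, `Trigger`, `triggers`, and `missingTriggers` are all hypothetical, and the in-memory fake stands in for a real HTTP call.

```typescript
interface ApiResponse {
  status: number;
  body: Record<string, unknown>;
}

// Hypothetical in-memory stand-in for the real customer endpoint.
class FakeCustomerApi {
  private emails = new Set<string>();

  createCustomer(email: string): ApiResponse {
    if (this.emails.has(email)) {
      return {
        status: 409,
        body: { code: "EMAIL_ALREADY_EXISTS", message: "Email already exists" },
      };
    }
    this.emails.add(email);
    return { status: 201, body: { email } };
  }
}

// A trigger performs the setup steps and returns the failing response.
type Trigger = (api: FakeCustomerApi) => ApiResponse;

const triggers: Record<string, Trigger> = {
  EMAIL_ALREADY_EXISTS: (api) => {
    api.createCustomer("ada@example.com"); // step 1: create a customer
    return api.createCustomer("ada@example.com"); // step 2: create the same one again
  },
};

// Lists catalog codes that have no trigger; a non-empty result should fail the build.
function missingTriggers(catalogCodes: string[]): string[] {
  return catalogCodes.filter(
    (code) => !Object.prototype.hasOwnProperty.call(triggers, code)
  );
}
```

With this split, adding a code to the catalog without a trigger fails loudly instead of silently shipping an untested error.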
Why This Scales
Imagine:
- 180 endpoints
- 95 business error codes
Without automation, every additional error increases maintenance.
With generation:
- Documentation grows.
- Test coverage grows.
- Maintenance barely changes.
Engineers can spend their time writing meaningful business scenarios instead of repetitive validation tests.
The Shape Assertion That Prevents Silent Error Drift
Checking only the HTTP status code is one of the biggest mistakes in error response testing.
Consider this response:
{
  "code": "USER_NOT_FOUND",
  "message": "User not found",
  "requestId": "abc123"
}
Months later, someone simplifies the global exception handler.
Now the API returns:
{
  "error": "User not found"
}
The endpoint still returns:
404
Most tests still pass.
Meanwhile:
- Mobile applications break.
- Frontend parsing fails.
- Monitoring dashboards stop correlating request IDs.
Nobody notices until production.
Shape Assertions
To prevent this, every generated negative test also validates the structure of the response.
For example:
expect(response.body).toEqual({
  code: expect.any(String),
  message: expect.any(String),
  requestId: expect.any(String)
});
Notice we're validating more than values.
We're validating the response contract itself.
That single assertion catches:
- Missing fields
- Renamed properties
- Structural changes
- Serialization mistakes
before consumers experience them.
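If the suite does not run on Jest, the same idea can be written as a plain function. This is a sketch under the assumption that the error envelope is exactly the three string fields used in this post; `REQUIRED_FIELDS` and `checkErrorShape` are hypothetical names.

```typescript
// Assumed error envelope: exactly these three string fields.
const REQUIRED_FIELDS = ["code", "message", "requestId"] as const;

// Returns shape violations; an empty array means the body matches the contract.
function checkErrorShape(body: unknown): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["body is not a JSON object"];
  }
  const fields = body as Record<string, unknown>;
  const violations: string[] = [];

  for (const name of REQUIRED_FIELDS) {
    if (!(name in fields)) {
      violations.push(`missing field: ${name}`);
    } else if (typeof fields[name] !== "string") {
      violations.push(`field ${name} is not a string`);
    }
  }

  // Like the strict equality check, treat unexpected fields as a contract change.
  for (const name of Object.keys(fields)) {
    if (!(REQUIRED_FIELDS as readonly string[]).includes(name)) {
      violations.push(`unexpected field: ${name}`);
    }
  }
  return violations;
}
```

Applied to the simplified payload from earlier, `checkErrorShape({ error: "User not found" })` reports the three missing fields and the unexpected `error` property, even though the status code is still 404.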
Why It Matters
Many API consumers depend on fields such as:
- Error code
- Message
- Correlation ID
- Documentation URL
- Retry hint
Changing any of those silently becomes a breaking API change.
Shape assertions make those changes hard to miss: the build fails before the change ships.
Keeping the Catalog in Sync With the Code (Code Generation)
The obvious concern is maintenance.
Nobody wants to update:
- Source code
- Documentation
- Tests
- Error catalog
manually every time a new error appears.
Fortunately, most modern applications already define business errors centrally.
Example:
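// String values keep the codes readable in JSON and in generated artifacts.
export enum ErrorCode {
  USER_NOT_FOUND = "USER_NOT_FOUND",
  PAYMENT_DECLINED = "PAYMENT_DECLINED",
  INVALID_TOKEN = "INVALID_TOKEN",
  EMAIL_ALREADY_EXISTS = "EMAIL_ALREADY_EXISTS"
}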
From this single definition, it's possible to generate:
- Markdown documentation
- OpenAPI components
- Error catalogs
- Client SDK constants
- Negative tests
Everything derives from one source.
That dramatically reduces maintenance.
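As a rough illustration of "everything derives from one source", the sketch below pairs a string-valued enum with a metadata map and renders a Markdown table from it. The `ErrorMeta` type, the `errorMeta` map, and `renderMarkdown` are hypothetical; the statuses and messages are copied from the catalog example earlier in this post.

```typescript
// Assumes a string-valued enum so codes serialize as readable names.
enum ErrorCode {
  USER_NOT_FOUND = "USER_NOT_FOUND",
  PAYMENT_DECLINED = "PAYMENT_DECLINED",
  INVALID_TOKEN = "INVALID_TOKEN",
  EMAIL_ALREADY_EXISTS = "EMAIL_ALREADY_EXISTS",
}

interface ErrorMeta {
  httpStatus: number;
  message: string;
}

// Record<ErrorCode, ...> makes the compiler reject a missing enum member.
const errorMeta: Record<ErrorCode, ErrorMeta> = {
  [ErrorCode.USER_NOT_FOUND]: { httpStatus: 404, message: "User not found" },
  [ErrorCode.PAYMENT_DECLINED]: { httpStatus: 402, message: "Payment declined" },
  [ErrorCode.INVALID_TOKEN]: { httpStatus: 401, message: "Invalid authentication token" },
  [ErrorCode.EMAIL_ALREADY_EXISTS]: { httpStatus: 409, message: "Email already exists" },
};

// Renders the catalog as a Markdown table, one row per enum member.
function renderMarkdown(meta: Record<ErrorCode, ErrorMeta>): string {
  const rows = Object.values(ErrorCode).map(
    (code) => `| ${code} | ${meta[code].httpStatus} | ${meta[code].message} |`
  );
  return ["| Code | HTTP status | Message |", "| --- | --- | --- |", ...rows].join("\n");
}
```

The useful property here is the `Record<ErrorCode, ErrorMeta>` type: adding an enum member without metadata is a compile error, so the generated outputs cannot quietly fall out of step with the enum.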
Additional Benefits
Once generation is introduced:
Documentation Never Falls Behind
The documentation updates whenever the enum changes.
Generated Tests Stay Current
Every new business error immediately receives baseline coverage.
SDKs Stay Consistent
Frontend applications can reference generated constants rather than string literals.
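A sketch of what that might look like on the client side follows. The `ErrorCodes` object stands in for whatever the generator would emit, and the user-facing strings, `isKnownErrorCode`, and `userFacingText` are invented for illustration.

```typescript
// Hypothetical generated constants a client SDK might ship.
const ErrorCodes = {
  USER_NOT_FOUND: "USER_NOT_FOUND",
  PAYMENT_DECLINED: "PAYMENT_DECLINED",
  INVALID_TOKEN: "INVALID_TOKEN",
  EMAIL_ALREADY_EXISTS: "EMAIL_ALREADY_EXISTS",
} as const;

type KnownErrorCode = (typeof ErrorCodes)[keyof typeof ErrorCodes];

// Narrows an arbitrary value from a response body to a known code.
function isKnownErrorCode(value: unknown): value is KnownErrorCode {
  return (
    typeof value === "string" &&
    Object.values(ErrorCodes).includes(value as KnownErrorCode)
  );
}

// Maps each code to display text (the wording here is illustrative only).
function userFacingText(code: KnownErrorCode): string {
  switch (code) {
    case ErrorCodes.USER_NOT_FOUND:
      return "We couldn't find that account.";
    case ErrorCodes.PAYMENT_DECLINED:
      return "Your payment was declined.";
    case ErrorCodes.INVALID_TOKEN:
      return "Please sign in again.";
    case ErrorCodes.EMAIL_ALREADY_EXISTS:
      return "That email is already registered.";
    default: {
      // Compile-time exhaustiveness check: fails to build if a code is unhandled.
      const unreachable: never = code;
      return unreachable;
    }
  }
}
```

Because the `default` branch assigns to `never`, regenerating the constants with a new code turns every unhandled `switch` into a compile error rather than a runtime surprise.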
Reviews Improve
Adding a new business error becomes visible during pull request review.
Instead of hiding inside controller code, it's now part of the API contract.
The Two Error Categories We Deliberately Don't Test
Although the catalog covers almost every business failure, there are two categories we intentionally exclude.
1. Generic Internal Server Errors
Example:
500 Internal Server Error
These represent unexpected failures.
They're not business behavior.
Instead of attempting to trigger every possible server crash, we simply verify:
- Sensitive stack traces aren't exposed.
- Generic messages are returned.
- Request IDs are included.
- Logging works correctly.
Testing every internal failure path produces little value.
Testing the response contract provides much more.
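A contract test for the 500 path can be small. The sketch below is an assumption-heavy illustration: `toInternalErrorResponse` is a hypothetical handler, the `INTERNAL_ERROR` code and its message are placeholders, and the leak patterns in `findLeaks` are heuristics, not a complete detector.

```typescript
interface InternalErrorBody {
  code: string;
  message: string;
  requestId: string;
}

// Hypothetical handler: maps any unexpected error to a generic 500 body.
function toInternalErrorResponse(
  error: unknown,
  requestId: string
): { status: number; body: InternalErrorBody } {
  // The original error is intentionally not copied into the response;
  // a real handler would log it alongside requestId instead.
  void error;
  return {
    status: 500,
    body: { code: "INTERNAL_ERROR", message: "Something went wrong", requestId },
  };
}

// Heuristic leak detector over a serialized body (patterns are assumptions).
function findLeaks(serializedBody: string): string[] {
  const patterns: Array<[string, RegExp]> = [
    ["stack frame", /\bat\s+\S+\s+\(.*:\d+:\d+\)/],
    ["source path", /[\\/](src|node_modules)[\\/]/],
    ["exception class", /\b\w*(Error|Exception):/],
  ];
  return patterns
    .filter(([, pattern]) => pattern.test(serializedBody))
    .map(([label]) => label);
}
```

A single test can then pass an arbitrary `Error` through the handler and assert that `findLeaks(JSON.stringify(response.body))` is empty and that `requestId` is present, without trying to reproduce any particular crash.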
2. Infrastructure Failures
Examples include:
- Database unavailable
- Kafka offline
- Redis unreachable
- DNS failure
- Cloud storage outage
These aren't business errors.
They're infrastructure events.
We test them separately using:
- Chaos engineering
- Fault injection
- Resilience testing
- Disaster recovery exercises
Keeping them outside the regular API negative tests avoids unnecessary instability in CI pipelines.
Unexpected Benefits
After adopting this approach, several improvements appeared that we hadn't anticipated.
Better Documentation
Engineers could browse every supported business error in one place.
Cleaner APIs
Every endpoint returned a consistent error structure.
Faster Reviews
New business errors became obvious during pull requests.
Happier Frontend Teams
Consumers no longer guessed which failures might occur.
Stronger Regression Protection
Structural changes to error responses surfaced immediately.
Final Thoughts
Most API teams invest enormous effort in testing successful requests while giving comparatively little attention to failures.
Stripe's documentation demonstrates a different philosophy.
Errors are part of the public API contract.
They deserve documentation.
They deserve consistency.
And they deserve automated tests.
By maintaining an error-code catalog, generating one baseline test per error, validating response shapes, and deriving documentation from code, you can significantly reduce maintenance while increasing confidence that your API behaves consistently—even as it evolves.
The best part is that this approach scales naturally.
As new business errors appear, your documentation and test suite grow automatically rather than relying on engineers to remember yet another negative test.
If you're looking to automate this style of contract-driven testing, you can spin up a free trial to try this catalog pattern and explore how generated negative tests, schema validation, and API contracts can work together to keep your error handling consistent over time.