Where is the line between heavy API usage and systematic model extraction?
OpenAI Developer Community
February 28, 2026
As API-based foundation models scale, I’ve been thinking about the boundary between normal high-volume usage (benchmarks, evaluation runs, synthetic data generation) and structured querying designed to approximate or distill capabilities.
At what point does usage meaningfully become “model extraction,” and is that even a technically enforceable distinction?
It seems like:
* Call count alone isn’t meaningful
* Token volume matters
* Structured prompt variation (systematically sweeping one template across many inputs) might matter
* Intent is almost impossible to prove
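To make those signals concrete, here is a rough sketch of how the first three might be computed from a usage log. Everything in it is my own assumption for illustration: the `Call` record, the `skeleton` masking rule, and the `template_share` metric are hypothetical and not taken from any provider's actual policy or tooling.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Call:
    """One logged API call (hypothetical record shape)."""
    prompt: str
    tokens: int  # prompt + completion tokens for this call


def skeleton(prompt: str) -> str:
    """Mask quoted spans and numbers so templated prompts collapse together."""
    masked = re.sub(r'"[^"]*"', "<STR>", prompt)
    masked = re.sub(r"\d+", "<NUM>", masked)
    # Lowercase and normalize whitespace so trivial edits do not split a template.
    return " ".join(masked.lower().split())


def usage_signals(calls: list[Call]) -> dict[str, float]:
    """Return call count, token volume, and the share of the most common template."""
    if not calls:
        return {"call_count": 0, "total_tokens": 0, "template_share": 0.0}
    counts = Counter(skeleton(c.prompt) for c in calls)
    top_count = counts.most_common(1)[0][1]
    return {
        "call_count": len(calls),
        "total_tokens": sum(c.tokens for c in calls),
        "template_share": top_count / len(calls),
    }


calls = [
    Call('Translate "hello" to French, variant 1', 50),
    Call('Translate "goodbye" to French, variant 2', 60),
    Call("What is the capital of Peru?", 20),
]
signals = usage_signals(calls)
```

Note that a legitimate benchmark or evaluation run would also score high on `template_share`, which is exactly why none of these numbers says anything about intent.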
I’m curious how people here think about this from both a technical and governance perspective.