Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidyvfd2v5yucxin5cbqyuac4sbo4kl3dbygmn5dzshiffwqiik3oy",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mfxjnmgsez52"
  },
  "path": "/t/where-is-the-line-between-heavy-api-usage-and-systematic-model-extraction/1375387#post_1",
  "publishedAt": "2026-02-28T23:14:34.000Z",
  "site": "https://community.openai.com",
  "textContent": "As API-based foundation models scale, I’ve been thinking about the boundary between normal high-volume usage (benchmarks, evaluation runs, synthetic data generation) and structured querying designed to approximate or distill capabilities.\n\nAt what point does usage meaningfully become “model extraction,” and is that even a technically enforceable distinction?\n\nIt seems like:\n\n  * Call count alone isn’t meaningful\n\n  * Token volume matters\n\n  * Structured prompt variation might matter\n\n  * Intent is almost impossible to prove\n\n\n\n\nI’m curious how people here think about this from both a technical and governance perspective.",
  "title": "Where is the line between heavy API usage and systematic model extraction?"
}