Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreib5qewfeoanhatinxar6374j66ewxhclrmrwcvvb722fto6oenkke",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mkomcxv26532"
  },
  "path": "/t/rate-limits-are-incorrectly-calculated-when-using-many-custom-mcp-function-calls/1380074#post_1",
  "publishedAt": "2026-04-30T00:02:56.000Z",
  "site": "https://community.openai.com",
  "tags": [
    "github.com",
    "GitHub - CallumFerguson/openai-rate-limit-mcp-repro"
  ],
  "textContent": "I am having a problem with incorrectly being rate limited when using many custom mcp server function calls.\n\nExample request id:\n`“x-request-id”: “req_25ad032454394ec8ba16d6eced0b417c”`\n\nOpenAI error body:\n\n\n    {\n      \"type\": \"tokens\",\n      \"code\": \"rate_limit_exceeded\",\n      \"message\": \"Rate limit reached for gpt-5.4-nano in organization org-Lwa5FRv8Oa6u89TPBar0bFF5 on tokens per min (TPM): Limit 200000, Used 189899, Requested 11394. Please try again in 387ms. Visit https://platform.openai.com/account/rate-limits to learn more.\",\n      \"param\": null\n    }\n\n\nThis response, which only used about 11k tokens, was done on its own after waiting a minute for the rate limits to reset.\n\nI have created a project which minimally reproduces the problem. The example project has a simple mcp server with an echo mcp function. The prompt asks it to use the echo mcp function 30 times. It then appends “lorem ipsum” to the prompt until it is about 10k tokens:\n\ngithub.com\n\n### GitHub - CallumFerguson/openai-rate-limit-mcp-repro\n\nContribute to CallumFerguson/openai-rate-limit-mcp-repro development by creating an account on GitHub.\n\nIt seems like while an api request is being processed, every time an mcp function is called, it counts the entire input tokens again for the rate limits. So if the api request has 10k input tokens, and it calls an mcp function 30 times, I will get rate limited because 10k * 30 > 200k tokens which is my account’s tokens per minute limit for the model I am using.\n\nIf I instead have the prompt be 10k tokens and have it call the mcp function 15 times, I do not get rate limited, and after the request finishes, it correctly states that I only used about 11k tokens (10k input, 1k output).\n\nIn all cases, my account is charged for the correct token usage, and if the api request finishes with no errors, the usage tokens is correct. It only seems to be a problem if a single request input tokens * num mcp calls is greater than the rate limits for the account.",
  "title": "Rate limits are incorrectly calculated when using many custom mcp function calls"
}