{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreihxuwjox7c5rnhazuus7fqloiazxdxraiciuc67gt44sxzc4ailui",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mi4ktmdsaxf2"
},
"path": "/t/google-gemini-clis-rate-limiting-crisis-when-paying-customers-get-the-same-treatment-as-free-users/174697#post_3",
"publishedAt": "2026-03-28T10:29:54.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"GitHub",
"Gemini CLI",
"Google for Developers",
"Google AI for Developers"
],
"textContent": "Hmm…\n\n* * *\n\nHere is the clearest way to think about it.\n\n## The short version\n\nThis issue usually happens because **Gemini CLI is not a single, simple service path**. Depending on how you sign in, your request may go through different backends, quotas, and routing rules. The most common pattern in recent reports is this:\n\n * **Google-login users on paid plans** hit `429 RESOURCE_EXHAUSTED` or `MODEL_CAPACITY_EXHAUSTED` in Gemini CLI. (GitHub)\n * The **same account** may still work in **Google AI Studio** or via a **Gemini API key**. (GitHub)\n * That means the failure is often **not** “your account has no access at all.” It is more often a problem with the **CLI’s auth/routing path** , **shared service capacity** , or **how the CLI handles retries and fallback**. (GitHub)\n\n\n\n## Background\n\n### 1. Gemini CLI has multiple auth paths\n\nGoogle’s docs show three main ways to use Gemini CLI:\n\n * **Sign in with Google**\n * **Use a Gemini API key**\n * **Use Vertex AI** (Gemini CLI)\n\n\n\nThese are not just different login screens. They can mean different quota behavior, different routing, and different operational failure modes. (Gemini CLI)\n\n### 2. Paid does not mean dedicated capacity\n\nGoogle’s quota docs say the limits are **not identical** across tiers. For Google-login usage, the published maximums are **1,000/day** for individual free use, **1,500/day** for Google AI Pro, and **2,000/day** for Google AI Ultra. The docs also say requests are limited **per user per minute** and are **subject to service availability in times of high demand**. (Gemini CLI)\n\nThat distinction matters. A paid plan can give you **higher published limits** and still fail when the shared service path is overloaded. (Gemini CLI)\n\n### 3. One prompt is not always one backend request\n\nGoogle also says Gemini Code Assist agent mode and Gemini CLI share quotas, and **one prompt may result in multiple model requests**. 
(Google for Developers)\n\nSo some users really do burn quota faster than expected. But that alone does **not** explain cases where the **first prompt** fails immediately after reinstalling or clearing local state. (Google for Developers)\n\n* * *\n\n## The main causes\n\n## Cause 1: Shared service capacity on the Google-login path\n\nGoogle posted a service update saying that, starting **March 25, 2026**, Gemini CLI traffic routing would give higher priority based on **license type** and **account standing**, and that some customers could encounter **capacity-related limitations** during high traffic. (GitHub)\n\nThis lines up with recent bug reports showing:\n\n * `429 RESOURCE_EXHAUSTED`\n * `MODEL_CAPACITY_EXHAUSTED`\n * backend messages like **“No capacity available for model gemini-3.1-pro-preview on the server”** (GitHub)\n\n\n\n### What this means in plain English\n\nYou may still have a valid paid account. You may still be under your daily limit. But the **specific serving lane** used by Gemini CLI with Google login can still be congested or deprioritized. (GitHub)\n\n* * *\n\n## Cause 2: Google-login auth and API-key auth are not behaving the same\n\nThe strongest recurring pattern is this:\n\n * **Google login fails in Gemini CLI**\n * **API key works**\n * sometimes **AI Studio also works** with the same account (GitHub)\n\n\n\nThat strongly suggests the problem is often in the **Gemini CLI Google-login / Code Assist route**, not in the user’s general entitlement to Gemini models. (GitHub)\n\n### Why this happens\n\nWhen you sign in with Google, Gemini CLI goes through the **Gemini Code Assist / CLI entitlement path**. When you use an API key, you are using a different auth and billing path documented separately. (Gemini CLI)\n\nSo “same account, different result” is very plausible here. 
(Gemini CLI)\n\n* * *\n\n## Cause 3: Entitlement routing can choose the wrong tier\n\nThere are reports that Gemini CLI can bind the session to the wrong entitlement when one Google account has multiple overlapping subscriptions. Examples include:\n\n * a user with both **consumer Google One AI Pro/Ultra** and **Enterprise Gemini Code Assist Standard** getting routed to the consumer entitlement instead of the enterprise one\n * a Workspace user whose CLI session still behaves like `oauth-personal` and falls back incorrectly (GitHub)\n\n\n\n### Why this matters\n\nIf the wrong entitlement is selected, several things can go wrong:\n\n * the wrong quota bucket may be used\n * the wrong serving priority may be applied\n * the wrong terms or data-governance lane may be used for enterprise-sensitive work (GitHub)\n\n\n\nFor Standard and Enterprise, Google says prompts and responses are **not used to train** those models and are handled under Google Cloud terms and the Cloud Data Processing Addendum. (Google for Developers)\n\nSo this is not only a rate-limit issue. For some users, it is also a **policy and data-handling** issue. (GitHub)\n\n* * *\n\n## Cause 4: CLI fallback and retry behavior can make the problem look worse\n\nGemini CLI officially has a **model fallback** mechanism. If the default Pro model is rate-limited, the CLI can fall back to Flash for the session. Google’s docs also describe capacity errors and say the CLI may offer retry behavior with backoff. 
(Gemini CLI)\n\nIn practice, issue reports show this can go badly:\n\n * fallback logic for 429s has been called fragile\n * some users report the CLI **hangs** instead of surfacing the real error\n * one recent issue reports a single message being resent **30–50 times** during Pro → Flash fallback (GitHub)\n\n\n\n### Why this matters\n\nEven if the original problem is “only” capacity pressure, the CLI can turn it into a bigger mess by:\n\n * hiding the real error\n * retrying too much\n * switching models in confusing ways\n * multiplying request count unexpectedly (GitHub)\n\n\n\n* * *\n\n## Cause 5: Preview-model pressure can amplify everything\n\nGoogle’s Gemini CLI docs explicitly call out **capacity errors** for Gemini 3 Pro and explain that preview access and routing behavior can vary. Google’s Gemini API model docs also say **Gemini 3 Pro Preview was shut down on March 9, 2026**, with migration to **Gemini 3.1 Pro Preview** needed to avoid disruption. (Gemini CLI)\n\n### What this means\n\nIf a user is pinned to a preview model, or is heavily relying on Pro-preview routing during a busy period, they are more exposed to:\n\n * capacity exhaustion\n * fallback behavior\n * version churn\n * confusing interruptions (Gemini CLI)\n\n\n\nThis is usually a **multiplier**, not the only root cause. (Gemini CLI)\n\n* * *\n\n## How to diagnose the case quickly\n\nUse this simple mental checklist.\n\n### Case A: AI Studio works, Gemini CLI with Google login fails\n\nThis usually points to a **CLI auth/routing issue** or **Code Assist backend issue**, not a total account outage. (GitHub)\n\n### Case B: Google login fails, API key works\n\nThis strongly points to a problem in the **Google-login entitlement path**, not the model family in general. 
(GitHub)\n\n### Case C: Error explicitly says capacity or no capacity available\n\nThis is usually **shared-server overload or priority gating**, not just “you used too much.” (GitHub)\n\n### Case D: You have both personal and enterprise licenses on one account\n\nSuspect **entitlement misrouting** first. (GitHub)\n\n### Case E: The CLI hangs, loops, or repeats the same message\n\nSuspect **retry/fallback bugs** in addition to any real backend limit. (GitHub)\n\n* * *\n\n## The best solutions\n\n## Solution 1: For reliability, switch away from Google-login auth\n\nThis is the most practical fix.\n\nUse either:\n\n * **Gemini API key**\n * **Vertex AI** (Gemini CLI)\n\n\n\nWhy this helps:\n\n * it avoids the flaky Google-login / Code Assist path seen in many reports\n * it gives you a more direct auth and billing path\n * it often works when the Google-login path does not (GitHub)\n\n\n\nFor API key usage, Google documents creating and managing keys in Google AI Studio and using `GEMINI_API_KEY`. (Google AI for Developers)\n\n### Best fit\n\nUse this when you want the fastest path back to working CLI sessions. (Gemini CLI)\n\n* * *\n\n## Solution 2: If you need enterprise controls, use a clean enterprise path\n\nIf your concern is not only uptime but also data handling, use a setup that clearly lands in **Standard / Enterprise / Vertex** rather than a mixed personal account flow. Google says Standard and Enterprise prompts and responses are not used to train the models. (Google for Developers)\n\n### Best fit\n\nUse this when you are working with private company code or need strong governance guarantees. (Google for Developers)\n\n* * *\n\n## Solution 3: Avoid mixed entitlements on one Google account\n\nIf one account has both:\n\n * consumer Google AI Pro/Ultra\n * enterprise Gemini Code Assist licensing\n\n\n\nthen the safest move is to avoid that mixed setup until the entitlement selection problem is clearer. 
(GitHub)\n\nPractical options:\n\n * use a separate account for personal use\n * use a separate account for enterprise use\n * avoid assuming `GOOGLE_CLOUD_PROJECT` will force the desired consumer vs enterprise choice when using Google login, because users reported that it did not solve the routing problem (GitHub)\n\n\n\n* * *\n\n## Solution 4: Reduce exposure to preview-model capacity trouble\n\nIf you are seeing capacity errors on Pro-preview models, move to a simpler routing choice temporarily:\n\n * use auto routing\n * or use a more stable non-preview option if available\n * do not insist on a pinned Pro-preview model during a live capacity incident (Gemini CLI)\n\n\n\nWhy this helps:\n\n * preview models are more exposed to churn and capacity pressure\n * the docs explicitly discuss capacity errors for Gemini 3 Pro usage (Gemini CLI)\n\n\n\n* * *\n\n## Solution 5: Update the CLI, but do not expect that alone to fix it\n\nKeeping the CLI current is still sensible because model routing and auth behavior are actively changing. Gemini CLI’s changelog shows ongoing changes to model routing and release behavior. (Gemini CLI)\n\nBut if the underlying issue is:\n\n * server capacity\n * auth-path routing\n * entitlement misclassification\n\n\n\nthen updating alone may not fix it. (GitHub)\n\n* * *\n\n## Solution 6: Treat hanging or repeated retries as a bug signal, not user error\n\nIf the CLI:\n\n * hangs on a simple prompt\n * keeps saying it is retrying\n * duplicates your message\n * silently falls back in a way that breaks your workflow\n\n\n\nthen stop interpreting that as normal quota exhaustion. That behavior is consistent with known issue reports. (GitHub)\n\n### Practical response\n\nSwitch auth method first. That is usually higher leverage than repeatedly wiping caches or reinstalling. 
(GitHub)\n\n* * *\n\n## Solution 7: Do not waste time over-debugging local state if the pattern matches\n\nIf all of these are true:\n\n * paid plan is active\n * AI Studio works\n * Gemini CLI with Google login fails\n * failures persist after clearing local state\n\n\n\nthen the evidence points much more toward a **service-path problem** than a local machine problem. (GitHub)\n\nThat means:\n\n * reinstalling again is unlikely to help much\n * switching auth path is the better move\n * watching service updates and issue tracker activity matters more than local tweaks (GitHub)\n\n\n\n* * *\n\n## What is the most likely root cause in plain language\n\nThe most likely root cause is **not** “paid users are literally given the same quota as free users.” The docs do not say that. The more accurate explanation is:\n\n 1. Google-login Gemini CLI traffic uses a **shared service path** with capacity controls. (Gemini CLI)\n 2. Recent traffic-priority changes increased the chance of **capacity-related limitations** on that path. (GitHub)\n 3. Some accounts appear to be getting **misrouted or misclassified** by entitlement. (GitHub)\n 4. The CLI’s retry and fallback behavior can make the failure look even worse than it is. (GitHub)\n\n\n\nThat combination produces the real-world symptom: **a paid user gets little or no practical benefit from the Google-login path at the moment they need it.** (GitHub)\n\n* * *\n\n## The simplest recommendation\n\nIf you want the cleanest practical answer:\n\n### Use this order\n\n 1. **Try Gemini CLI with a Gemini API key.** (Google AI for Developers)\n 2. If you need enterprise controls, move to **Vertex AI or a clearly enterprise-managed Code Assist path**. (Gemini CLI)\n 3. Avoid mixed personal + enterprise entitlements on one Google account. (GitHub)\n 4. Avoid pinned preview Pro models during capacity incidents. (Gemini CLI)\n 5. 
Treat CLI hangs and duplicate retries as a **known failure pattern**, not as proof that your setup is wrong. (GitHub)\n\n",
"title": "Google Gemini CLI's Rate Limiting Crisis: When Paying Customers Get the Same Treatment as Free Users"
}