Google Gemini CLI's Rate Limiting Crisis: When Paying Customers Get the Same Treatment as Free Users
Here is the clearest way to think about it.
The short version
This issue usually happens because Gemini CLI is not a single, simple service path. Depending on how you sign in, your request may go through different backends, quotas, and routing rules. The most common pattern in recent reports is this:
- Google-login users on paid plans hit 429 RESOURCE_EXHAUSTED or MODEL_CAPACITY_EXHAUSTED in Gemini CLI. (GitHub)
- The same account may still work in Google AI Studio or via a Gemini API key. (GitHub)
- That means the failure is often not “your account has no access at all.” It is more often a problem with the CLI’s auth/routing path, shared service capacity, or how the CLI handles retries and fallback. (GitHub)
Background
1. Gemini CLI has multiple auth paths
Google’s docs show three main ways to use Gemini CLI:
- Sign in with Google
- Use a Gemini API key
- Use Vertex AI (Gemini CLI)
These are not just different login screens. They can mean different quota behavior, different routing, and different operational failure modes. (Gemini CLI)
2. Paid does not mean dedicated capacity
Google’s quota docs say the limits are not identical across tiers. For Google-login usage, the published maximums are 1,000/day for individual free use, 1,500/day for Google AI Pro, and 2,000/day for Google AI Ultra. The docs also say requests are limited per user per minute and are subject to service availability in times of high demand. (Gemini CLI)
That distinction matters. A paid plan can give you higher published limits and still fail when the shared service path is overloaded. (Gemini CLI)
3. One prompt is not always one backend request
Google also says Gemini Code Assist agent mode and Gemini CLI share quotas, and one prompt may result in multiple model requests. (Google for Developers)
So some users really do burn quota faster than expected. But that alone does not explain cases where the first prompt fails immediately after reinstalling or clearing local state. (Google for Developers)
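To make the quota arithmetic concrete, here is a minimal sketch that divides the published daily maximums quoted above by an assumed number of model requests per prompt. The function name and the requests-per-prompt figure are hypothetical; the real fan-out varies by task and is not published.

```python
# Published daily maximums for Google-login usage, as quoted in this article.
DAILY_REQUEST_LIMITS = {
    "free": 1000,
    "google_ai_pro": 1500,
    "google_ai_ultra": 2000,
}


def prompts_per_day(tier: str, requests_per_prompt: int) -> int:
    """Return how many whole prompts fit in a tier's daily request limit.

    requests_per_prompt is an assumed average, not a documented value.
    """
    if requests_per_prompt < 1:
        raise ValueError("requests_per_prompt must be at least 1")
    return DAILY_REQUEST_LIMITS[tier] // requests_per_prompt
```

Under an assumed fan-out of 10 requests per prompt, a Google AI Pro user would fit about 150 prompts into a day rather than 1,500. That is a real effect, but it still cannot explain a failure on the very first prompt.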
The main causes
Cause 1: Shared service capacity on the Google-login path
Google posted a service update saying that, starting March 25, 2026, Gemini CLI traffic routing would give higher priority based on license type and account standing, and that some customers could encounter capacity-related limitations during high traffic. (GitHub)
This lines up with recent bug reports showing:
- 429 RESOURCE_EXHAUSTED
- MODEL_CAPACITY_EXHAUSTED
- backend messages like “No capacity available for model gemini-3.1-pro-preview on the server” (GitHub)
What this means in plain English
You may still have a valid paid account. You may still be under your daily limit. But the specific serving lane used by Gemini CLI with Google login can still be congested or deprioritized. (GitHub)
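One way to keep these cases apart when reading logs is to sort error text by the strings that appear in the reports. The sketch below is illustrative only: the category names are made up here, and a bare 429 does not by itself prove that you exceeded your own limit.

```python
def classify_error(message: str) -> str:
    """Roughly sort an error message into a category.

    The matched strings come from the reports quoted in this article.
    The returned labels are illustrative, not official error classes.
    """
    text = message.upper()
    # Explicit capacity wording is checked first because it is the more
    # specific signal and may appear alongside a 429.
    if "MODEL_CAPACITY_EXHAUSTED" in text or "NO CAPACITY AVAILABLE" in text:
        return "capacity"
    if "RESOURCE_EXHAUSTED" in text or "429" in text:
        return "rate_limit_or_quota"
    return "other"
```

A message classified as capacity points at the serving side. One classified as rate_limit_or_quota is ambiguous and needs the other checks in this article.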
Cause 2: Google-login auth and API-key auth are not behaving the same
The strongest recurring pattern is this:
- Google login fails in Gemini CLI
- API key works
- sometimes AI Studio also works with the same account (GitHub)
That strongly suggests the problem is often in the Gemini CLI Google-login / Code Assist route, not in the user’s general entitlement to Gemini models. (GitHub)
Why this happens
When you sign in with Google, Gemini CLI goes through the Gemini Code Assist / CLI entitlement path. When you use an API key, you are using a different auth and billing path documented separately. (Gemini CLI)
So “same account, different result” is very plausible here. (Gemini CLI)
Cause 3: Entitlement routing can choose the wrong tier
There are reports that Gemini CLI can bind the session to the wrong entitlement when one Google account has multiple overlapping subscriptions. Examples include:
- a user with both consumer Google One AI Pro/Ultra and Enterprise Gemini Code Assist Standard getting routed to the consumer entitlement instead of the enterprise one
- a Workspace user whose CLI session still behaves like oauth-personal and falls back incorrectly (GitHub)
Why this matters
If the wrong entitlement is selected, several things can go wrong:
- the wrong quota bucket may be used
- the wrong serving priority may be applied
- the wrong terms or data-governance lane may be used for enterprise-sensitive work (GitHub)
For Standard and Enterprise, Google says prompts and responses are not used to train those models and are handled under Google Cloud terms and the Cloud Data Processing Addendum. (Google for Developers)
So this is not only a rate-limit issue. For some users, it is also a policy and data-handling issue. (GitHub)
Cause 4: CLI fallback and retry behavior can make the problem look worse
Gemini CLI officially has a model fallback mechanism. If the default Pro model is rate-limited, the CLI can fall back to Flash for the session. Google’s docs also describe capacity errors and say the CLI may offer retry behavior with backoff. (Gemini CLI)
In practice, issue reports show this can go badly:
- fallback logic for 429s has been called fragile
- some users report the CLI hangs instead of surfacing the real error
- one recent issue reports a single message being resent 30–50 times during Pro → Flash fallback (GitHub)
Why this matters
Even if the original problem is “only” capacity pressure, the CLI can turn it into a bigger mess by:
- hiding the real error
- retrying too much
- switching models in confusing ways
- multiplying request count unexpectedly (GitHub)
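For contrast, here is a minimal sketch of what bounded retry behavior generally looks like: a fixed cap on attempts, exponential backoff with jitter, and the original error surfaced once the cap is reached. This is not Gemini CLI's implementation. CapacityError and call_with_backoff are hypothetical names used only for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CapacityError(Exception):
    """Hypothetical stand-in for a 429 or capacity error from a backend."""


def call_with_backoff(
    send: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send() at most max_attempts times, backing off between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except CapacityError:
            if attempt == max_attempts:
                # Surface the real error instead of hanging or looping.
                raise
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

With a cap like this, one user message can never turn into more than max_attempts backend calls. A report of 30 to 50 resends is therefore better read as a bug than as ordinary backoff.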
Cause 5: Preview-model pressure can amplify everything
Google’s Gemini CLI docs explicitly call out capacity errors for Gemini 3 Pro and explain that preview access and routing behavior can vary. Google’s Gemini API model docs also say Gemini 3 Pro Preview was shut down on March 9, 2026, with migration to Gemini 3.1 Pro Preview needed to avoid disruption. (Gemini CLI)
What this means
If a user is pinned to a preview model, or is heavily relying on Pro-preview routing during a busy period, they are more exposed to:
- capacity exhaustion
- fallback behavior
- version churn
- confusing interruptions (Gemini CLI)
This is usually a multiplier, not the only root cause. (Gemini CLI)
How to diagnose the case quickly
Use this simple mental checklist.
Case A: AI Studio works, Gemini CLI with Google login fails
This usually points to a CLI auth/routing issue or Code Assist backend issue, not a total account outage. (GitHub)
Case B: Google login fails, API key works
This strongly points to a problem in the Google-login entitlement path, not the model family in general. (GitHub)
Case C: Error explicitly says capacity or no capacity available
This is usually shared-server overload or priority gating, not just “you used too much.” (GitHub)
Case D: You have both personal and enterprise licenses on one account
Suspect entitlement misrouting first. (GitHub)
Case E: The CLI hangs, loops, or repeats the same message
Suspect retry/fallback bugs in addition to any real backend limit. (GitHub)
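The checklist above can be sketched as a small lookup from observed symptoms to suspected causes. The field names and cause labels are hypothetical and simply restate Cases A to E. Treat the output as a list of things to investigate, not a diagnosis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symptoms:
    ai_studio_works: bool = False
    google_login_fails: bool = False
    api_key_works: bool = False
    error_mentions_capacity: bool = False
    mixed_entitlements: bool = False
    cli_hangs_or_repeats: bool = False


def likely_causes(s: Symptoms) -> List[str]:
    """Map symptoms to the suspects named in Cases A to E."""
    causes: List[str] = []
    if s.mixed_entitlements:
        # Case D: the checklist says to suspect this first.
        causes.append("entitlement misrouting")
    if s.google_login_fails and s.api_key_works:
        causes.append("Google-login entitlement path")  # Case B
    if s.google_login_fails and s.ai_studio_works:
        causes.append("CLI auth/routing or Code Assist backend")  # Case A
    if s.error_mentions_capacity:
        causes.append("shared-server overload or priority gating")  # Case C
    if s.cli_hangs_or_repeats:
        causes.append("retry/fallback bug")  # Case E
    return causes
```

Several cases can apply at once, which is why the function returns a list rather than a single answer.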
The best solutions
Solution 1: For reliability, switch away from Google-login auth
This is the most practical fix.
Use either:
- Gemini API key
- Vertex AI (Gemini CLI)
Why this helps:
- it avoids the flaky Google-login / Code Assist path seen in many reports
- it gives you a more direct auth and billing path
- it often works when the Google-login path does not (GitHub)
For API key usage, Google documents creating and managing keys in Google AI Studio and using GEMINI_API_KEY. (Google AI for Developers)
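Before switching, it can help to confirm that the key is actually visible to the process that launches the CLI. The sketch below only checks whether GEMINI_API_KEY is set to a non-empty value. The function name is hypothetical, and the check says nothing about which auth method the CLI itself will select, since that depends on the CLI's own settings.

```python
import os
from typing import Mapping, Optional


def api_key_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if GEMINI_API_KEY is set to a non-empty value.

    Pass a mapping to check something other than the current process
    environment. The key itself is never printed or returned.
    """
    env = os.environ if env is None else env
    return bool(env.get("GEMINI_API_KEY", "").strip())
```

A False result in the shell where you run the CLI usually means the variable was exported in a different session or profile.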
Best fit
Use this when you want the fastest path back to working CLI sessions. (Gemini CLI)
Solution 2: If you need enterprise controls, use a clean enterprise path
If your concern is not only uptime but also data handling, use a setup that clearly lands in Standard / Enterprise / Vertex rather than a mixed personal account flow. Google says Standard and Enterprise prompts and responses are not used to train the models. (Google for Developers)
Best fit
Use this when you are working with private company code or need strong governance guarantees. (Google for Developers)
Solution 3: Avoid mixed entitlements on one Google account
If one account has both:
- consumer Google AI Pro/Ultra
- enterprise Gemini Code Assist licensing
then the safest move is to avoid that mixed setup until the entitlement selection problem is clearer. (GitHub)
Practical options:
- use a separate account for personal use
- use a separate account for enterprise use
- avoid assuming GOOGLE_CLOUD_PROJECT will force the desired consumer vs enterprise choice when using Google login, because users reported that it did not solve the routing problem (GitHub)
Solution 4: Reduce exposure to preview-model capacity trouble
If you are seeing capacity errors on Pro-preview models, move to a simpler routing choice temporarily:
- use auto routing
- or use a more stable non-preview option if available
- do not insist on a pinned Pro-preview model during a live capacity incident (Gemini CLI)
Why this helps:
- preview models are more exposed to churn and capacity pressure
- the docs explicitly discuss capacity errors for Gemini 3 Pro usage (Gemini CLI)
Solution 5: Update the CLI, but do not expect that alone to fix it
Keeping the CLI current is still sensible because model routing and auth behavior are actively changing. Gemini CLI’s changelog shows ongoing changes to model routing and release behavior. (Gemini CLI)
But if the underlying issue is:
- server capacity
- auth-path routing
- entitlement misclassification
then updating alone may not fix it. (GitHub)
Solution 6: Treat hanging or repeated retries as a bug signal, not user error
If the CLI:
- hangs on a simple prompt
- keeps saying it is retrying
- duplicates your message
- silently falls back in a way that breaks your workflow
then stop interpreting that as normal quota exhaustion. That behavior is consistent with known issue reports. (GitHub)
Practical response
Switch auth method first. That is usually higher leverage than repeatedly wiping caches or reinstalling. (GitHub)
Solution 7: Do not waste time over-debugging local state if the pattern matches
If all of these are true:
- paid plan is active
- AI Studio works
- Gemini CLI with Google login fails
- failures persist after clearing local state
then the evidence points much more toward a service-path problem than a local machine problem. (GitHub)
That means:
- reinstalling again is unlikely to help much
- switching auth path is the better move
- watching service updates and issue tracker activity matters more than local tweaks (GitHub)
What is the most likely root cause in plain language
The most likely root cause is not “paid users are literally given the same quota as free users.” The docs do not say that. The more accurate explanation is:
- Google-login Gemini CLI traffic uses a shared service path with capacity controls. (Gemini CLI)
- Recent traffic-priority changes increased the chance of capacity-related limitations on that path. (GitHub)
- Some accounts appear to be getting misrouted or misclassified by entitlement. (GitHub)
- The CLI’s retry and fallback behavior can make the failure look even worse than it is. (GitHub)
That combination produces the real-world symptom: a paid user gets little or no practical benefit from the Google-login path at the moment they need it. (GitHub)
The simplest recommendation
If you want the cleanest practical answer:
Use this order
- Try Gemini CLI with a Gemini API key. (Google AI for Developers)
- If you need enterprise controls, move to Vertex AI or a clearly enterprise-managed Code Assist path. (Gemini CLI)
- Avoid mixed personal + enterprise entitlements on one Google account. (GitHub)
- Avoid pinned preview Pro models during capacity incidents. (Gemini CLI)
- Treat CLI hangs and duplicate retries as a known failure pattern, not as proof that your setup is wrong. (GitHub)