{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreicdw76x3h2tuqm7madndv5faue4a3m7io33uagmkzwkizyofykt5a",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3miq3djl3x762"
},
"path": "/t/i-cant-use-the-service-anymore/174970#post_2",
"publishedAt": "2026-04-04T23:24:16.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"Hugging Face",
"Hugging Face Forums"
],
"textContent": "Maybe you’ve just used up your monthly allowance ($2.00)… It will be replenished after one month.\n\n* * *\n\n# Guide: Fixing Hugging Face’s “You have depleted your monthly included credits” error when you already have PRO\n\n## The short version\n\nA **Hugging Face PRO subscription does not include unlimited Inference Providers usage**. The current Hugging Face docs say PRO includes **$2.00 per month** in Inference Providers credits, and **compute usage is billed separately** from the PRO subscription itself. That means you can be a valid PRO subscriber and still get blocked once the included routed-inference credits are gone. (Hugging Face)\n\n## What this error usually means\n\nWhen you see:\n\n> **Failed to perform inference: You have depleted your monthly included credits. Purchase pre-paid credits to continue using Inference Providers.**\n\nthe platform is usually telling you something about the **Inference Providers billing layer** , not about whether your PRO plan exists. In Hugging Face’s current billing model, routed requests through Inference Providers use a monthly credit pool first, then require additional paid usage after that. (Hugging Face)\n\n## Why this is confusing\n\nHugging Face now uses **Inference Providers** as the unified routed-inference system, and the docs note that **`hf-inference` used to be called “Inference API (serverless)”**. That transition makes older expectations misleading. Many users still think “I have PRO” should mean “my API calls should keep working,” but the current model is really **subscription + included credits + separate compute billing**. (Hugging Face)\n\n## The most common root causes\n\n### 1. You really did use up the included PRO credits\n\nThis is the most common explanation. Hugging Face’s pricing docs currently say **Free users get $0.10/month** , **PRO users get $2.00/month** , and **Team or Enterprise organizations get $2.00 per seat per month** for Inference Providers. 
After that, continued usage requires additional purchased credits. (Hugging Face)\n\nA key detail: your credits may have been consumed by more than just your own API code. Hugging Face’s docs say **model-page widgets**, the **Inference Playground**, and **Data Studio AI** also use Inference Providers and count against the same monthly credits. (Hugging Face)\n\n### 2. Your PRO account is active, but your account is not ready for paid continuation\n\nHugging Face’s billing docs say **compute services are billed separately** from PRO, and **credit cards** are the only supported payment method for compute services. Public Hugging Face support replies also say this same **402-style** failure often happens when **there is no payment method on the account**. (Hugging Face)\n\nSo the hidden problem may be:\n\n * PRO is active\n * the included credits are exhausted\n * but Hugging Face cannot continue charging usage because the compute-billing path is not set up correctly. (Hugging Face)\n\n\n\n### 3. Your token is wrong, stale, or missing the required permission\n\nHugging Face’s Inference Providers docs say you should use a **fine-grained token** with the **“Make calls to Inference Providers”** permission. Their `InferenceClient` docs also say that if you do **not** explicitly pass a token, the client will default to the **locally saved token**. (Hugging Face)\n\nThat creates a common failure mode:\n\n * you generated a new token\n * but your code is still using an older token saved on your machine\n * or the active token does not have the Inference Providers permission enabled. (Hugging Face)\n\n\n\nPublic Hugging Face forum replies support this too. In similar 402 reports, staff and experienced users explicitly pointed to **missing payment methods** and **wrong token permissions** as common causes. (Hugging Face Forums)\n\n### 4. Your requests are being billed to the wrong account\n\nIf you belong to a Team or Enterprise organization, this matters a lot. 
Hugging Face’s pricing docs say requests are billed to the **user account by default**, and org billing only applies if you explicitly set the billing target, such as with `bill_to=\"my-org-name\"` or the `X-HF-Bill-To` header. (Hugging Face)\n\nSo a person can be part of a paid organization, use a valid token, and still hit the personal monthly limit because the request is not actually being billed to the organization. Public reports show this is a real pattern. (Hugging Face)\n\n* * *\n\n# Step-by-step fix\n\n## Step 1: Check whether the credits are actually gone\n\nOpen your **Inference Providers usage** page and your **Billing** page. Hugging Face says the usage view shows the past month’s usage broken down by **model** and **provider**. That is the fastest way to confirm whether this is true credit exhaustion or a lookalike configuration problem. (Hugging Face)\n\n### What to look for\n\n * If usage is clearly nontrivial and the credits are spent, the message is probably accurate. (Hugging Face)\n * If usage looks very low or inconsistent, move to billing and token checks. The same error can appear in those cases too. (Hugging Face Forums)\n\n\n\n## Step 2: Verify billing is set up for compute usage\n\nGo to **Settings → Billing** and confirm your account has a valid **credit card** and is ready for compute billing. Hugging Face’s billing docs state that **compute services are usage-based**, **separate from PRO**, and **credit cards are the supported payment method** for compute services. (Hugging Face)\n\n### Why this matters\n\nYour PRO renewal and your inference spend are not the same billing stream. A paid PRO badge does not automatically prove that your account is ready to continue beyond the included monthly credits. (Hugging Face)\n\n## Step 3: Create a fresh token with the correct permission\n\nCreate a new **fine-grained token** and enable **Make calls to Inference Providers**. 
Hugging Face explicitly documents that this permission is required for Inference Providers requests. (Hugging Face)\n\nThen replace the token everywhere:\n\n * shell environment variables\n * notebook secrets\n * `.env` files\n * CI secrets\n * local cached login state. (Hugging Face)\n\n\n\n## Step 4: Retry with the token passed explicitly\n\nDo not rely on the client’s default token behavior during debugging. Hugging Face’s `InferenceClient` docs state that if you do not pass a token, it will use the **locally saved token** by default. (Hugging Face)\n\nA clean Python test looks like this:\n\n\n    from huggingface_hub import InferenceClient\n\n    client = InferenceClient(\n        token=\"hf_your_new_token_here\"\n    )\n\n    resp = client.chat.completions.create(\n        model=\"deepseek-ai/DeepSeek-V3-0324\",\n        messages=[{\"role\": \"user\", \"content\": \"Reply with the word OK only.\"}],\n    )\n\n    print(resp.choices[0].message)\n\n\nThat example is using the documented `InferenceClient` flow and the recommended explicit token pattern. (Hugging Face)\n\n## Step 5: If you use an organization, explicitly bill the org\n\nIf you are supposed to use Team or Enterprise credits, set the billing target explicitly. Hugging Face’s docs show `bill_to=\"my-org-name\"` for the Python client and `X-HF-Bill-To: my-org-name` for HTTP requests. 
(Hugging Face)\n\nPython example:\n\n\n    from huggingface_hub import InferenceClient\n\n    client = InferenceClient(\n        token=\"hf_your_token_here\",\n        bill_to=\"my-org-name\"\n    )\n\n    resp = client.chat.completions.create(\n        model=\"deepseek-ai/DeepSeek-V3-0324\",\n        messages=[{\"role\": \"user\", \"content\": \"Reply with the word OK only.\"}],\n    )\n\n    print(resp.choices[0].message)\n\n\nRaw HTTP example:\n\n\n    curl https://router.huggingface.co/v1/chat/completions \\\n      -H \"Authorization: Bearer hf_your_token_here\" \\\n      -H \"X-HF-Bill-To: my-org-name\" \\\n      -H \"Content-Type: application/json\" \\\n      -d '{\n        \"model\": \"deepseek-ai/DeepSeek-V3-0324\",\n        \"messages\": [{\"role\": \"user\", \"content\": \"Reply with the word OK only.\"}]\n      }'\n\n\nThose patterns match Hugging Face’s current organization-billing documentation. (Hugging Face)\n\n## Step 6: If you need continued usage, choose the right billing path\n\nHugging Face documents two different ways to run inference:\n\n 1. **Routed by Hugging Face**\nYour request is routed through Hugging Face. Monthly credits apply. Extra usage is billed on your HF account. (Hugging Face)\n\n 2. **Custom provider key**\nYou supply your own provider key. Hugging Face routes the request, but the provider bills you directly. HF monthly credits do **not** apply. (Hugging Face)\n\n\n\n\n### When to stay with Hugging Face billing\n\nStay with HF billing if you want:\n\n * one bill\n * easy provider switching\n * to use the included monthly credits first. (Hugging Face)\n\n\n\n### When to switch to your own provider key\n\nSwitch if you:\n\n * already have an account with a provider\n * want more direct billing control\n * do not want the HF included-credit pool to be the limiting factor. (Hugging Face)\n\n\n\nHugging Face says you can set a **custom provider key** in Hub settings or in `InferenceClient`, while keeping the same integration surface. 
(Hugging Face)\n\n## Step 7: Consider avoiding HF-routed inference entirely\n\nIf your real goal is “I want inference to work without this credits system,” Hugging Face’s own inference guide says the client can also connect to **local endpoints**, including **llama.cpp**, **Ollama**, **vLLM**, **LiteLLM**, and **TGI**. That shifts you away from HF-routed Inference Providers billing. (Hugging Face)\n\nThis is often the cleanest long-term fix for users who want predictable local control rather than monthly hosted credits. (Hugging Face)\n\n* * *\n\n# A practical decision tree\n\n## Case A: You only use the Hugging Face website\n\nFollow this order:\n\n 1. Check **Inference Providers usage**. (Hugging Face)\n 2. Check **Billing** and confirm a valid compute payment method exists. (Hugging Face)\n 3. Remember that **widgets**, **Playground**, and **Data Studio AI** also spend the same credits. (Hugging Face)\n 4. If credits are gone, either purchase pre-paid credits or switch to a custom provider key. (Hugging Face)\n\n\n\n## Case B: You use Python, JavaScript, LangChain, notebooks, or an OpenAI-compatible client\n\nFollow this order:\n\n 1. Generate a new **fine-grained** token with **Make calls to Inference Providers**. (Hugging Face)\n 2. Pass the token explicitly. Do not trust the cached local token. (Hugging Face)\n 3. If you use an org, add `bill_to` or `X-HF-Bill-To`. (Hugging Face)\n 4. If you still need more usage, choose between HF billing and a custom provider key. (Hugging Face)\n\n\n\n* * *\n\n# What I think is most likely for your case\n\nThe most probable explanation is:\n\n * your **PRO subscription is active**\n * but your **included Inference Providers credits are exhausted**\n * and either **pay-as-you-go compute billing is not fully usable yet**, or your **token / billing target is wrong**. 
(Hugging Face)\n\n\n\nThat diagnosis fits the official Hugging Face documentation and also matches the most common public support patterns around this exact error family. (Hugging Face)\n\n* * *\n\n# When to contact support\n\nContact Hugging Face billing support if all of the following are true:\n\n * your PRO subscription is active\n * the billing page shows a valid payment method\n * you created a fresh token with the correct permission\n * you passed the token explicitly\n * you set org billing correctly if relevant\n * and the error still persists. (Hugging Face)\n\n\n\nHugging Face’s billing docs explicitly direct billing-related support requests to **billing@huggingface.co**. (Hugging Face)\n\n* * *\n\n# Final takeaway\n\nThis error usually does **not** mean “your PRO subscription stopped working.” It usually means **Inference Providers credits or billing are the problem**. The cleanest recovery path is:\n\n 1. check usage\n 2. check compute billing\n 3. replace the token\n 4. pass the token explicitly\n 5. set org billing if needed\n 6. switch to a custom provider key or local endpoint if hosted HF credits are not the right fit. (Hugging Face)\n\n",
"title": "I can't use the service anymore"
}