Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreid6vdxpjem3m4vnozi6qc55f3j5sit456hdp72x267fgyevlqpkwm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mhheg3hzcxb2"
  },
  "path": "/t/401-unauthorized-on-inference-providers-router-chat-completions-token-works-for-v1-models/174329#post_2",
  "publishedAt": "2026-03-20T00:11:34.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Hugging Face",
    "Hugging Face Forums",
    "GitHub"
  ],
  "textContent": "Account issues are not typically handled on the forum, so you’ll need to email support (website@huggingface.co) first.\n\nThat said, this symptom doesn’t seem to be an account issue. Maybe.\n\n* * *\n\nMy view is that this is **probably not** a hidden chat-completions restriction on your account.\n\nThe current Hugging Face docs do **not** describe a separate chat-only entitlement beyond the normal routed-inference requirements: a Hugging Face account, a fine-grained token with **“Make calls to Inference Providers”**, **remaining Inference Providers credits**, and a model that is actually available for chat on a provider-backed route. (Hugging Face)\n\n## What your current evidence means\n\nThe important distinction is this:\n\n  * `GET /v1/models` is a **discovery** call.\n  * `POST /v1/chat/completions` is an **execution** call.\n\n\n\nA successful models-list response shows that your token is present and acceptable enough for the router to return catalog information. It does **not** prove that the router can execute a paid, provider-backed chat request for your chosen model and provider path. Hugging Face’s own docs explicitly list **remaining credits** as a prerequisite for chat-style routed usage. (Hugging Face)\n\nSo this combination:\n\n  * `/v1/models` = 200\n  * `/v1/chat/completions` = 401\n\n\n\nis entirely possible without any special account ban.\n\n## Why your specific setup looks risky\n\nThis is the strongest clue in your case:\n\n  * you used `Qwen/Qwen3.5-9B:preferred`\n  * you set **HF Inference** as your preferred provider\n\n\n\nHugging Face’s current Inference Providers docs say that:\n\n  * the default router behavior is effectively `:fastest`\n  * `:preferred` follows your provider preference order\n  * you can also pin a provider explicitly with `model:provider` syntax. (Hugging Face)\n\n\n\nSeparately, the current Supported Models catalog shows **`Qwen/Qwen3.5-9B` with `together`** as the provider listing. 
(Hugging Face)\n\nThat means your test is not a neutral “does Qwen work on the router?” test. It is a more specific test:\n\n> “Does this model work when I force routing to follow my preferred-provider policy, where HF Inference is ranked first?”\n\nThat is a weaker test, because it mixes model choice with provider policy.\n\n## The most likely explanations, ranked\n\n### 1. Provider-selection mismatch\n\nThis is my top diagnosis.\n\nYour model appears in the current catalog with **Together**. Your request uses `:preferred`, and your settings prefer **HF Inference**. That makes it plausible that you are steering the router toward a provider/model path that is not the natural or supported chat backend for that model. (Hugging Face)\n\nThis is also consistent with how Inference Providers works now. It is not the old “single generic inference backend.” It is provider-aware and task-aware. (Hugging Face)\n\n### 2. Credits or billing problem with misleading error text\n\nThis is the second most likely.\n\nThe docs say chat-style Inference Providers usage requires **remaining credits**. The pricing docs also explain that Team and Enterprise orgs can centralize billing with `X-HF-Bill-To`, and org admins can set spending limits or disable providers. (Hugging Face)\n\nThere is also a public forum case where the surfaced message included `{\"error\":\"Invalid username or password.\"}` even though the deeper issue involved payment. That means this error string is not always a literal diagnosis. (Hugging Face Forums)\n\n### 3. Runtime token mismatch\n\nStill possible, but less convincing than the first two.\n\nThere are public issues where `Invalid username or password` came from token or environment problems, including `whoami` failures and bad runtime token state. (GitHub)\n\nBut in your case, the same token already works for `/v1/models`, and you already regenerated the token after changing settings. That weakens the “purely bad token” theory.\n\n### 4. 
Account-side backend bug\n\nPossible, but lower probability.\n\nThere are multiple public reports of the same 401 body on router chat paths, including Qwen-related examples and HF Inference-backed routes. That means account- or router-side bugs do happen. (Hugging Face Forums)\n\nBut that is not the first conclusion I would jump to, because your current provider-policy choice already creates a cleaner explanation.\n\n## What I do **not** think is happening\n\nI do **not** think the best explanation is:\n\n> “Your account is blocked from chat completions, but allowed to use `/v1/models`.”\n\nI did not find official documentation for a separate hidden permission like that. The documented requirements are token scope, credits, and provider-backed model availability. (Hugging Face)\n\n## How to isolate it cleanly\n\nUse a staged test plan. Each step changes one variable only.\n\n### Step 1: Stop using `:preferred`\n\nThis is the first thing to change.\n\nUse either:\n\n  * `:fastest`, which follows the router’s default policy, or\n  * an explicit provider suffix.\n\n\n\nThe docs explicitly document both patterns. (Hugging Face)\n\n### Step 2: Test a known-good router chat example\n\nUse a model that Hugging Face itself uses in current router examples.\n\n\n    curl -i https://router.huggingface.co/v1/chat/completions \\\n      -H \"Authorization: Bearer $HF_TOKEN\" \\\n      -H \"Content-Type: application/json\" \\\n      -d '{\n        \"model\": \"openai/gpt-oss-120b:fastest\",\n        \"messages\": [{\"role\":\"user\",\"content\":\"Reply with exactly OK\"}],\n        \"max_tokens\": 8\n      }'\n\n\nHugging Face’s current docs use `openai/gpt-oss-120b` as a normal chat-completions example on the router. 
(Hugging Face)\n\nInterpretation:\n\n  * if this works, your account and token are probably fine for chat\n  * if this fails with the same 401, look harder at credits, billing, or account-side issues\n\n\n\n### Step 3: Test your Qwen model with an explicit provider\n\nBecause the Supported Models catalog currently shows `Qwen/Qwen3.5-9B` under Together, test that exact provider path.\n\n\n    curl -i https://router.huggingface.co/v1/chat/completions \\\n      -H \"Authorization: Bearer $HF_TOKEN\" \\\n      -H \"Content-Type: application/json\" \\\n      -d '{\n        \"model\": \"Qwen/Qwen3.5-9B:together\",\n        \"messages\": [{\"role\":\"user\",\"content\":\"Reply with exactly OK\"}],\n        \"max_tokens\": 8\n      }'\n\n\nIf this works but `Qwen/Qwen3.5-9B:preferred` fails, then the problem is almost certainly your provider preference path, not your account. (Hugging Face)\n\n### Step 4: Verify the exact token used in the failing runtime\n\nDo this inside the same environment that fails.\n\n\n    import os\n    from huggingface_hub import HfApi\n\n    token = os.environ[\"HF_TOKEN\"].strip()\n    print(\"suffix:\", token[-8:])\n    print(HfApi().whoami(token=token))\n\n\nThe token docs confirm that user access tokens are the normal bearer-token mechanism for Inference Providers. 
(Hugging Face)\n\nThis catches:\n\n  * old token still loaded\n  * whitespace or newline contamination\n  * Streamlit secrets not matching shell env\n  * wrong variable name used by the app\n\n\n\n### Step 5: Check credits and org billing\n\nIf you are on Team or Enterprise, or if billing should go to an org, test with the org billing header:\n\n\n    curl -i https://router.huggingface.co/v1/chat/completions \\\n      -H \"Authorization: Bearer $HF_TOKEN\" \\\n      -H \"X-HF-Bill-To: your-org-name\" \\\n      -H \"Content-Type: application/json\" \\\n      -d '{\n        \"model\": \"openai/gpt-oss-120b:fastest\",\n        \"messages\": [{\"role\":\"user\",\"content\":\"Reply with exactly OK\"}],\n        \"max_tokens\": 8\n      }'\n\n\nHF documents this billing mode directly. (Hugging Face)\n\nThis step matters if:\n\n  * your personal credits are empty\n  * your org should be paying\n  * your org admins limited providers or spending\n\n\n\n### Step 6: Compare router base path patterns\n\nUse only the documented router path:\n\n  * `POST https://router.huggingface.co/v1/chat/completions`\n\n\n\nwith the model in the JSON body.\n\nDo not mix this with older or provider-path-specific URL styles unless you have a reason. The current chat-completions docs show the OpenAI-style `/v1/chat/completions` route with the model passed in the request body. 
(Hugging Face)\n\n## How to read the outcomes\n\nHere is the diagnosis matrix.\n\n### Case A: `gpt-oss-120b:fastest` works, `Qwen/Qwen3.5-9B:preferred` fails\n\nThat strongly suggests **provider-policy mismatch**.\n\n### Case B: `gpt-oss-120b:fastest` works, `Qwen/Qwen3.5-9B:together` also works\n\nThat confirms your account is fine and `:preferred` was the problem.\n\n### Case C: both fail with the same 401\n\nThat points more toward:\n\n  * credits/billing\n  * runtime token mismatch\n  * account-side router issue\n\n\n\n### Case D: adding `X-HF-Bill-To` fixes it\n\nThat means the problem was billing path, not chat permissions.\n\n## When to suspect a real account problem\n\nSuspect a real account/backend issue only if **all** of these are true:\n\n  * a known-good router model like `openai/gpt-oss-120b:fastest` still fails\n  * an explicit-provider Qwen test still fails\n  * the exact runtime token passes a `whoami` check\n  * credits and billing path are confirmed\n  * the same minimal `curl` fails outside your app too\n\n\n\nAt that point, you have narrowed it to something HF support can actually investigate.\n\nThere are public examples of unusual account-wide 401 problems on Hugging Face, but they usually affect far more than just one chat-completions path. (Hugging Face Forums)\n\n## What to send support if needed\n\nIf you escalate, send:\n\n  * exact timestamp\n\n  * full response headers\n\n  * full response body\n\n  * `x-request-id` from the failing response\n\n  * your HF username\n\n  * whether billing is personal or org\n\n  * whether `X-HF-Bill-To` changes anything\n\n  * results of:\n\n    * `openai/gpt-oss-120b:fastest`\n    * `Qwen/Qwen3.5-9B:together`\n\n\n\nThat gives support enough to separate auth, billing, provider routing, and account-state issues. Public HF discussions around 401 troubleshooting repeatedly rely on request IDs and minimal reproductions. (Hugging Face Forums)\n\n## My actual conclusion for your case\n\nMost likely:\n\n  1. 
**not a hidden chat restriction**\n  2. **not a missing extra permission beyond the one you already enabled**\n  3. **most likely a provider-selection problem caused by `:preferred` + HF Inference**\n  4. **second most likely a credits/billing issue surfaced with a misleading auth message**\n\n\n\nA broader caution also applies: on platforms like HF, some failures can come from server-side behavior changes rather than a local code change alone.\n\nThe fastest isolating test is this pair:\n\n  * `openai/gpt-oss-120b:fastest`\n  * `Qwen/Qwen3.5-9B:together`\n\n",
  "title": "401 Unauthorized on Inference Providers router chat completions (token works for /v1/models)"
}