External Publication

Llama 3.1 70B API access?

Hugging Face Forums [Unofficial] June 23, 2026

There’s been a lot of confusion around how Inference Providers are supposed to be used:

I don’t think “You have been granted access to this model” necessarily contradicts “Model not supported by provider featherless-ai”.

The short version is:

Check	What it means
“You have been granted access” on the model page	You have access to the gated model repo / weights / model page resources.
The browser widget works	Some provider/path available from the widget could run something for that page. It does not necessarily prove your third-party app is using the same provider, model id, task, token scope, or endpoint.
`Model not supported by provider featherless-ai`	The selected provider, here `featherless-ai`, may not currently expose the exact model id and task that your API call is asking for.

So I would first check the exact model id + provider + task combination before debugging the token or curl syntax too much.

The quickest first check is the model search page with the Inference Providers filter:

https://huggingface.co/models?inference_provider=all

Then search for the exact model id and, if needed, narrow the provider filter to Featherless. If the exact model/provider combination is not listed there, changing the curl call probably will not make that provider serve the model.

Also, one subtle point: meta-llama/Llama-3.1-70B and meta-llama/Llama-3.1-70B-Instruct are not interchangeable.

meta-llama/Llama-3.1-70B is the base/pretrained model.
meta-llama/Llama-3.1-70B-Instruct is the instruction-tuned/chat-oriented model.

If your third-party app is making chat-completion-style calls, I would first verify whether the Instruct variant is available through the provider you are trying to use, rather than assuming that access to the base repo means the provider can serve it through chat completions.

A practical order of checks would be:

Confirm the exact model id:
- meta-llama/Llama-3.1-70B
- or meta-llama/Llama-3.1-70B-Instruct
- or some provider-side alias such as meta-llama/Meta-Llama-3.1-70B-Instruct
Check whether that exact model is currently exposed through Inference Providers:
- UI: models with Inference Providers filter
- API: Hub API for Inference Providers
If you are explicitly forcing Featherless, try not forcing it:
- use provider="auto" in huggingface_hub, or
- remove the :featherless-ai suffix if you are using the OpenAI-compatible router model name.
If it works with auto but fails with featherless-ai, that suggests a provider-specific availability/mapping issue, not a general Llama access issue.

Check the local client version if you are using Python:

python -c "import huggingface_hub; print(huggingface_hub.__version__)"

Featherless’ HF integration post says to use huggingface_hub v0.33.0 or newer:

Featherless AI on Hugging Face Inference Providers

If you still get the error, the useful info to post back would be:
- exact model id
- exact endpoint URL
- whether you are using provider="featherless-ai", :featherless-ai, or auto
- full error message
- huggingface_hub version, if applicable
- whether the app is using chat completion, text generation, or an older Inference API endpoint

Why this error can happen even when model access is granted (click for more details) How to check the provider mapping more precisely (click for more details) Base model vs Instruct model (click for more details) Provider selection: auto vs forcing Featherless (click for more details) Client version can change the symptom (click for more details) Task mismatch examples (click for more details) Legacy endpoint check (click for more details) Minimal curl shape to compare (click for more details)

To summarize my guess: this is probably not “you do not have access to Llama” in the simple gated-repo sense. It is more likely one of these:

the exact base model id is not exposed by the selected provider,
the third-party app is forcing featherless-ai,
the app is calling the wrong task,
the local SDK is old,
or the provider mapping/catalog and the runtime availability are temporarily out of sync.

The first thing I would rule out is the non-fixable case: is the exact model id currently available through the provider you are forcing?

Discussion in the ATmosphere