Hugging Face token usage for routed requests to a custom provider
One practical pattern is to separate token validation from spend limits.
For routed/provider-style requests, I would not rely only on the upstream token being valid. I would also keep:
- a request id / inference id for every call
- prompt and completion token logs where available
- a short billing reconciliation window
- a hard quota or budget on the task/provider token
That way a valid token can still be rate-limited or quota-limited before it turns into a pile of unbilled backend work.
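As a rough illustration of the list above, here is a minimal Python sketch. Everything in it is hypothetical and not taken from any particular gateway: `BudgetedKey`, `routed_call`, and `unreconciled` are names I made up, `send` stands in for whatever function actually forwards the request upstream, and I assume the response carries an OpenAI-compatible `usage` object with `prompt_tokens` and `completion_tokens`.

```python
import time
import uuid
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a key has no token budget left (hypothetical name)."""


@dataclass
class UsageRecord:
    request_id: str
    key_id: str
    prompt_tokens: int
    completion_tokens: int
    timestamp: float


@dataclass
class BudgetedKey:
    key_id: str
    max_tokens: int                      # hard cap for this task/provider key
    used_tokens: int = 0
    log: list = field(default_factory=list)

    def check(self, estimated_tokens: int = 0) -> None:
        # Refuse before any backend work happens if the budget is spent
        # or the estimate would not fit into what is left.
        remaining = self.max_tokens - self.used_tokens
        if remaining <= 0 or estimated_tokens > remaining:
            raise BudgetExceeded(f"{self.key_id}: {remaining} tokens left")

    def record(self, request_id, prompt_tokens, completion_tokens):
        rec = UsageRecord(request_id, self.key_id, prompt_tokens,
                          completion_tokens, time.time())
        self.used_tokens += prompt_tokens + completion_tokens
        self.log.append(rec)
        return rec


def routed_call(key, send, payload, estimated_tokens=0):
    """Gate, forward, and log one request. `send` is an assumed callable
    that takes the payload and returns a dict-like response."""
    key.check(estimated_tokens)
    request_id = str(uuid.uuid4())       # one id per call
    response = send(payload)
    usage = response.get("usage") or {}  # token counts only where available
    rec = key.record(request_id,
                     usage.get("prompt_tokens", 0),
                     usage.get("completion_tokens", 0))
    return response, rec


def unreconciled(log, billed_ids, window_seconds, now=None):
    """Records older than the reconciliation window that never showed up
    in the provider's billing export (`billed_ids` is an assumed input)."""
    now = time.time() if now is None else now
    return [r for r in log
            if now - r.timestamp > window_seconds
            and r.request_id not in billed_ids]
```

The point of checking the budget before calling `send` is that a valid key gets rejected locally instead of generating backend work first. Note that a single call can still overshoot the cap if no estimate is passed, so a real setup would probably also set `max_tokens` on the request itself.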
I wrote down the setup pattern I use for coding-agent/provider routing here:
GitHub - alicekellings/cursor-cline-token-budget (github.com): Practical token budgeting and usage transparency setup for OpenAI-compatible coding agents.
Disclosure: I maintain Wappkit (https://api.wappkit.com), an OpenAI-compatible gateway with quotas and usage logs. The same idea is provider-agnostic: validate identity, log per request, and cap the token that can spend.