Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibg777xl75v5d45ig7nwmxpulm3mrqbru6xo4ozwfyemd6lv3naly",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mpqtmocvsxp2"
  },
  "path": "/t/inference-providers-3-cents-per-request/142396#post_7",
  "publishedAt": "2026-07-03T14:36:08.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I would debug this as a pricing-shape problem first, not only as a token-count problem. A one-word “hello” request is a bad sample because any fixed request overhead, provider minimum, image/VLM routing cost, or approximation heuristic will dominate the token price.\n\nThe checks I would run:\n\n  1. same model + same provider, text-only, no image input\n  2. very small max_tokens, then a larger realistic max_tokens\n  3. compare HF Inference Providers vs the provider directly for the same payload\n  4. log input tokens, output tokens, provider, model, and the charged amount for each call\n  5. repeat with a realistic prompt instead of “hello”\n\n\n\nFor production, I would avoid optimizing around cost per request alone. The more useful number is usually cost per completed workflow or accepted answer, because retries, longer context, images, and fallback calls can matter more than the visible user message size.",
  "title": "Inference Providers: 3 cents per request?"
}