Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicqgatpqoppbt2bztwgdd34crwiltrde4hy3jv7rf6ymly3dskhrm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mm4aat4jqad2"
  },
  "path": "/t/getting-quota-exceeded-even-though-requested-seconds-is-less-than-whats-left/176021#post_5",
  "publishedAt": "2026-05-18T04:24:01.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "@jc28735250",
    "Spaces ZeroGPU: Dynamic GPU Allocation for Spaces · Hugging Face",
    "[ZeroGPU] Blackwell update by cbensimon · Pull Request #2474 · huggingface/hub-docs · GitHub",
    "@spaces.GPU"
  ],
  "textContent": "@jc28735250\nThanks for the detailed reports, and sorry for the confusing behavior. Let me sort out what is going on, since a few different things are bundled together here.\n\n**1. The “2x consumption” you’re observing (this isn’t a bug)**\n\nWhat is most likely happening here is the new auto-fallback to `xlarge`. As part of the recent hardware migration, the GPUs backing ZeroGPU were changed (updated details are in the docs: Spaces ZeroGPU: Dynamic GPU Allocation for Spaces · Hugging Face). The per-GPU memory on the new hardware is smaller than before, so Spaces that no longer fit in `large` are automatically run on `xlarge`. Per the docs, `xlarge` has a 2x quota cost, but it also gives you 2x the GPU resources, so the higher quota cost generally corresponds to a faster wall-clock time per call. The exact speedup depends on the workload (compute-bound vs memory-bandwidth bound, whether the workload can fully utilize the larger GPU, etc.), so it is not always a clean 2x, but the extra quota is not pure overhead either.\n\nThis auto-fallback is not currently surfaced in the UI, which is the main reason it looks like everything is suddenly being counted twice. The progress bar going up to the reserved amount during a call and then settling to the actual usage afterward is the normal reserve-and-settle behavior; what changed is that both the reserved and settled values are now 2x compared to when the Space was running on `large`.\n\n**2. The “quota exceeded” popup (this part is a real bug, on the display side)**\n\nThe popup was showing the value of your `@spaces.GPU(duration=...)` argument as the “requested” number, instead of the duration the backend was actually reserving for the call. For an `xlarge`-promoted call, the actual reservation is ~2x the displayed `duration` value, which is why “90s requested vs. 129s left” still triggered the exceeded message.\n\nThe fix is already in internally and will go out with the next deploy. After the fix, the popup will display the actual backend reservation, which can be larger than the `duration=` value in your code.\n\n**3. On overall quota**\n\nAs part of this migration, the per-user ZeroGPU daily quota was also increased to help offset the hardware change. The updated values are in the docs linked above. For the before-and-after of the migration (hardware, VRAM, and quota numbers), the diff is in this PR: [ZeroGPU] Blackwell update by cbensimon · Pull Request #2474 · huggingface/hub-docs · GitHub.",
  "title": "Getting quota exceeded even though requested seconds is less than what's left"
}