{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigjqyglvawgy232nxcgcbfigmqjgiylulfp7afcqys2p5cfqj5kjy",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mj4kzvry4sg2"
  },
  "path": "/t/it-seems-use-60-sec-gpu-quota-instead-of-real-time-usage/175130#post_4",
  "publishedAt": "2026-04-10T02:09:08.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Hugging Face",
    "GitHub",
    "@spaces.GPU"
  ],
  "textContent": "> I only created 3 image, is the limit?\n\nYeah. If you are using a free account and the Space’s duration is set to **the default of 60 seconds**, that is the correct limit. While **the duration can be changed by the Space’s creator**, many creators leave it at the default because setting it too short can cause the entire generation process to fail.\n\nThe count is based on reserved time rather than actual usage time largely because of how ZeroGPU works (see below).\n\nIf you have accidentally incurred a financial loss and wish to request a refund, you must contact Hugging Face Support: billing@huggingface.co\n\n* * *\n\n**Three images can be enough** on HF ZeroGPU. The limit is **not “number of images.”** It is a **time quota** on shared GPU use, and the platform uses the function’s **declared maximum runtime** for scheduling, not only the exact wall-clock time you personally saw on screen. Hugging Face’s current docs say the default `@spaces.GPU` runtime is **60 seconds**, a custom `duration` sets the **maximum function runtime**, and **shorter durations improve queue priority**. The same docs say free accounts get **3.5 minutes** per day, unauthenticated users get **2 minutes**, and PRO gets **25 minutes**, with the quota resetting **24 hours after your first GPU usage**. (Hugging Face)\n\n## The simplest explanation\n\nZeroGPU is a **shared GPU pool**. Hugging Face has to decide **before** your job starts whether your request can enter the queue fairly. That is why the docs talk about a **maximum runtime** and queue priority based on shorter durations. So the system does not reason only in terms of “how long did this one image actually take after it finished?” It also reasons “this request may occupy scarce GPU capacity for up to 60 seconds, or up to whatever `duration` the Space author set.” That is the background reason you keep seeing 60 seconds appear. 
(Hugging Face)\n\n## Why it feels wrong\n\nIt feels wrong because, as a user, you expect this:\n\n  * the image took maybe 8 seconds\n  * so only 8 seconds should matter\n\nBut ZeroGPU behaves closer to this:\n\n  * the Space asks for a GPU job with a **budget**\n  * the default budget is **60 seconds** unless the Space author lowered it\n  * the scheduler checks whether that budget can be admitted\n  * if your remaining quota is below that budget, the request can fail even if the real image might have finished faster\n\nThat reading is strongly supported by the current docs, because Hugging Face explicitly says `duration` sets the **maximum function runtime** and that **shorter durations improve queue priority**. (Hugging Face)\n\n## What your bar values probably mean\n\nYour numbers like **1.2** and **0.4** most likely represent **minutes**, not image count. This part is an inference, but it fits the published quotas very well:\n\n  * **1.2 minutes** ≈ **72 seconds**\n  * **0.4 minutes** ≈ **24 seconds**\n\nIf a Space is still using the default **60-second** budget, then:\n\n  * at **1.2 minutes left**, a 60-second request could still fit\n  * at **0.4 minutes left**, a 60-second request would **not** fit\n\nThat exactly matches the kind of “I still have some bar left but it refuses to generate” behavior you described. The docs do not document that bar UI clearly in one place, but the quota numbers and the 60-second default make this the best-fitting explanation. (Hugging Face)\n\n## Why only 3 images may already exhaust it\n\nFor a **free account**, the included ZeroGPU quota is **3.5 minutes total**, which is **210 seconds**. If the Space uses the default 60-second budget, then just **3 requests** can already consume or reserve most of that budget:\n\n  * 3 × 60 s = **180 seconds**\n  * that leaves only **30 seconds**\n  * 30 seconds is only **0.5 minutes**\n\nSo after 3 images, a bar showing **0.4** or **0.5** left is completely plausible. 
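
The arithmetic in this section can be reproduced with a short Python sketch. It assumes the worst case described above, in which every request is charged its full declared `duration`, and takes its constants from the figures quoted from the docs; the helper names `can_admit` and `remaining_after` are hypothetical and not part of the `spaces` package or any Hugging Face API.

```python
# Worst-case ZeroGPU quota arithmetic, as described in this reply.
# Assumption: every request is charged its full declared duration.
# Constants are the figures quoted from the docs; the helper functions
# are hypothetical and not part of any Hugging Face API.

FREE_DAILY_QUOTA_S = 3.5 * 60  # free account: 3.5 minutes = 210 s
DEFAULT_DURATION_S = 60        # default @spaces.GPU budget
XLARGE_MULTIPLIER = 2          # size='xlarge' is charged 2x the quota


def can_admit(remaining_s, duration_s, multiplier=1):
    # A request fits only if the remaining quota covers its whole budget.
    return remaining_s >= duration_s * multiplier


def remaining_after(requests, duration_s=DEFAULT_DURATION_S,
                    multiplier=1, quota_s=FREE_DAILY_QUOTA_S):
    # Quota left after `requests` generations, each charged its full budget.
    return quota_s - requests * duration_s * multiplier


# Bar values read as minutes (an inference, not documented UI behavior).
for bar_minutes in (1.2, 0.4):
    remaining = bar_minutes * 60
    fits = can_admit(remaining, DEFAULT_DURATION_S)
    print(f'{bar_minutes} min = {remaining:.0f} s; 60 s request fits: {fits}')

left = remaining_after(3)
print(f'after 3 requests: {left:.0f} s = {left / 60} min left')
print('4th request fits:', can_admit(left, DEFAULT_DURATION_S))

# The docs example: a 45 s task on xlarge costs 90 s of quota.
print('45 s task on xlarge costs', 45 * XLARGE_MULTIPLIER, 's of quota')
```

Under those assumptions, three default-budget requests leave 30 seconds (0.5 minutes), which is below the 60 seconds a fourth request would reserve.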
This is especially true if the Space author did not lower `duration` for short jobs. (Hugging Face)\n\nAnd it can get worse. If the Space requests `size=\"xlarge\"`, Hugging Face says it consumes **2× as much daily quota** as the default `large`. Their own example says a **45-second effective task duration** on `xlarge` consumes **90 seconds of quota**. In that kind of Space, only a few generations can burn through a free-tier day. (Hugging Face)\n\n## So is it “real time usage” or not?\n\nThe clean answer is: **both matter, but in different ways**. Hugging Face’s docs say the GPU is requested when the function is called and released when the function completes, which means the GPU is not supposed to stay occupied for the full 60 seconds if the work ends early. But the docs also say `duration` sets the **maximum** runtime and affects queue priority, which means the platform clearly uses that budget **before execution** for scheduling. (Hugging Face)\n\nSo the best mental model is:\n\n  * **actual runtime** matters for the real work and GPU release\n  * **declared duration** matters for queue admission and quota handling\n\nThat is why ZeroGPU can look like it is using “60 seconds instead of real time,” even though the real issue is that the system is built as a **shared scheduler**, not just a stopwatch. (Hugging Face)\n\n## Is there a hidden change on the site?\n\nI did **not** find a current official Hugging Face doc announcing a recent change from “real runtime” to “duration-based quota.” The current docs still describe the same model: default **60 seconds**, custom **maximum runtime**, dynamic duration support, quota tiers, and 24-hour reset. (Hugging Face)\n\nSo for your specific question, the most likely answer is **not** “a secret new rule.” It is more likely that this has been the design all along, but the UI makes it hard to understand. 
A strong sign of that confusion is that developers asked Hugging Face for a way to retrieve the exact remaining ZeroGPU quota in seconds, and that request was closed as **not planned**. That suggests observability is still weak. (GitHub)\n\n## Could it still be a bug?\n\nYes, but only in some cases.\n\nThere are real recent reports of the **wrong quota identity** being applied. One Gradio issue from **April 7, 2026** says custom frontends on HF Spaces with ZeroGPU could treat logged-in PRO users as **unauthenticated**, giving them only the smaller free-tier quota because the `x-ip-token` handshake was missing. Gradio’s changelog also includes fixes for **ZeroGPU headers forwarding** and updates to the ZeroGPU guide about **manually passing an IP token**. HF’s API docs likewise say authenticated requests consume **your** account quota, while unauthenticated requests use a stricter shared pool. (GitHub)\n\nSo:\n\n  * if this happens in **one or two Spaces only**, it may be those Spaces’ setup\n  * if it happens **across many ZeroGPU Spaces**, especially if you are logged in or PRO and still see nonsense like `0s left`, then a **bug or auth/identity mismatch** is plausible (GitHub)\n\n## About the bar jumping from 0 to 1.2 to 0.4\n\nThat kind of jump is believable without any hidden site change. A few reasons:\n\n  * the quota is small on the free tier: **3.5 minutes**\n  * different Spaces can request different durations\n  * remaining quota affects queue behavior\n  * the UI does not expose quota state very transparently\n  * some auth/header issues can make a request count under the wrong quota pool (Hugging Face)\n\nSo the jumpy display does **not** by itself prove a new bug. It can result from the combination of a tiny quota, per-request budgets, and poor visibility.\n\n## Your “I only created 3 image” question\n\n**Yes, 3 images can be the limit.** On the free tier, very easily. 
If the Space uses the default **60-second** budget, three generations already reach **180 seconds** of the **210-second** free daily quota. If the Space is heavy, uses `xlarge`, or sets a longer custom duration, even fewer images can exhaust it. (Hugging Face)\n\n## The plain-language conclusion\n\nHF ZeroGPU uses **60 seconds by default** because it runs a **shared GPU queue** and needs a **maximum runtime budget** in advance to decide whether your job can be scheduled fairly. That is why it does not behave like a simple “count exact seconds after the image finishes” system. The confusing bar you see is most likely showing **remaining time quota**, probably in **minutes**, not remaining image count. So **three images can absolutely be enough** on the free tier, especially when the Space still uses the default 60-second duration. The current docs support this design, and I did not find an official announcement of a recent hidden rule change. Real bugs do exist, but they are more likely when quota looks wrong **across many Spaces** or when logged-in users are being treated as unauthenticated. (Hugging Face)",
  "title": "It seems use 60 sec GPU quota instead of real time usage?"
}