
It seems to use 60 sec of GPU quota instead of real-time usage?

Hugging Face Forums [Unofficial] April 10, 2026

I only created 3 images. Is that the limit?

Yeah. If you are using a free account and the Space's duration is set to the default of 60 seconds, that is the expected limit. The Space creator can change the duration, but many creators leave it at the default because setting it too short can make the whole generation process fail.

The reason why the count is based on reserved time rather than actual usage time is largely due to the mechanism of Zero GPU. (See below)

If you have accidentally incurred a financial loss and wish to request a refund, you must contact Hugging Face Support: billing@huggingface.co


Three images can be enough to use up your quota on HF ZeroGPU. The limit is not a “number of images.” It is a time quota on shared GPU use, and the platform uses the function’s declared maximum runtime for scheduling, not only the exact wall-clock time you personally saw on screen. Hugging Face’s current docs say the default @spaces.GPU runtime is 60 seconds, a custom duration sets the maximum function runtime, and shorter durations improve queue priority. The same docs say free accounts get 3.5 minutes per day, unauthenticated users get 2 minutes, and PRO accounts get 25 minutes, with the quota resetting 24 hours after your first GPU usage. (Hugging Face)
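To make the units concrete, here is a small Python sketch that encodes the tiers quoted above and converts them to seconds. The numbers are copied from this post rather than fetched from Hugging Face, so treat them as a snapshot that may go stale. DAILY_QUOTA_MINUTES, daily_quota_seconds and quota_reset_time are illustrative names, not part of any Hugging Face API.

```python
from datetime import datetime, timedelta

# Daily ZeroGPU quota per account tier, in minutes, as quoted in this post.
# Snapshot only: the real values live in the Hugging Face docs and may change.
DAILY_QUOTA_MINUTES = {
    "unauthenticated": 2.0,
    "free": 3.5,
    "pro": 25.0,
}

# Reset window as described in the docs: 24 hours after the first GPU usage.
RESET_WINDOW = timedelta(hours=24)


def daily_quota_seconds(tier: str) -> float:
    """Convert a tier's daily quota from minutes to seconds."""
    return DAILY_QUOTA_MINUTES[tier] * 60


def quota_reset_time(first_gpu_use: datetime) -> datetime:
    """The quota refills 24 hours after the first GPU call of the window."""
    return first_gpu_use + RESET_WINDOW


if __name__ == "__main__":
    for tier in DAILY_QUOTA_MINUTES:
        print(f"{tier}: {daily_quota_seconds(tier):.0f} s/day")
    print("reset at", quota_reset_time(datetime(2026, 4, 10, 9, 30)))
```

The free tier works out to 210 seconds, which is the figure the rest of this answer keeps coming back to.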

The simplest explanation

ZeroGPU is a shared GPU pool. Hugging Face has to decide before your job starts whether your request can enter the queue fairly. That is why the docs talk about a maximum runtime and queue priority based on shorter durations. So the system is not thinking only in terms of “how long did this one image actually take after it finished?” It is also thinking “this request may occupy scarce GPU capacity for up to 60 seconds, or up to whatever duration the Space author set.” That is the background reason you keep seeing 60 seconds appear. (Hugging Face)

Why it feels wrong

It feels wrong because as a user you expect this:

  • image took maybe 8 seconds
  • so only 8 seconds should matter

But ZeroGPU behaves closer to this:

  • the Space asks for a GPU job with a budget
  • the default budget is 60 seconds unless the Space author lowered it
  • the scheduler checks whether that budget can be admitted
  • if your remaining quota is below that budget, the request can fail even if the real image might have finished faster
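The four steps above can be written as a toy admission check. This is a hypothetical model of the behavior described here, not Hugging Face's scheduler code: the name can_admit is made up, and the exact comparison the real system uses (plus any extra rules such as queue priority) is an assumption.

```python
# Default @spaces.GPU budget per the docs quoted in this post.
DEFAULT_DURATION_S = 60


def can_admit(remaining_quota_s: float,
              declared_duration_s: float = DEFAULT_DURATION_S) -> bool:
    """Assumed admission rule: accept the request only if its declared budget
    fits in the remaining quota. The job has not run yet, so its real runtime
    cannot be part of this decision."""
    return remaining_quota_s >= declared_duration_s


# An image that would really take about 8 s is still refused with 24 s left,
# because the Space declared a 60 s budget.
print(can_admit(72))       # 72 s left, 60 s budget
print(can_admit(24))       # 24 s left, 60 s budget
print(can_admit(24, 20))   # same quota, but the author lowered duration to 20 s
```

The point of the model is that the decision is made with the declared number, because the actual one does not exist yet when the request arrives.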

That reading is strongly supported by the current docs because Hugging Face explicitly says duration sets the maximum function runtime and that shorter durations improve queue priority. (Hugging Face)

What your bar values probably mean

Your numbers like 1.2 and 0.4 most likely represent minutes, not image count. This part is an inference, but it fits the published quotas very well:

  • 1.2 minutes = 72 seconds
  • 0.4 minutes = 24 seconds

If a Space is still using the default 60-second budget, then:

  • at 1.2 minutes left, a 60-second request could still fit
  • at 0.4 minutes left, a 60-second request would not fit
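Under the same inference (the bar value is minutes of quota left), converting the bar and counting how many full 60-second requests still fit looks like this. The function names are hypothetical, and the minutes interpretation is the assumption stated above, not documented UI behavior.

```python
def bar_to_seconds(bar_value_minutes: float) -> int:
    """Assumption: the bar shows remaining quota in minutes with one decimal,
    so rounding to whole seconds loses nothing and avoids float noise."""
    return round(bar_value_minutes * 60)


def full_budget_requests_left(bar_value_minutes: float, duration_s: int = 60) -> int:
    """How many requests with the given declared duration still fit."""
    return bar_to_seconds(bar_value_minutes) // duration_s


for bar in (1.2, 0.4):
    seconds = bar_to_seconds(bar)
    fits = full_budget_requests_left(bar)
    print(f"{bar} min = {seconds} s -> {fits} request(s) of 60 s")
```

A nonzero bar with zero full-budget requests left is exactly the “bar left but it refuses to generate” case.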

That exactly matches the kind of “I still have some bar left but it refuses to generate” behavior you described. The docs do not clearly document that bar UI in one place, but the quota numbers and 60-second default make this the best-fitting explanation. (Hugging Face)

Why only 3 images may already exhaust it

For a free account, the included ZeroGPU quota is 3.5 minutes total, which is 210 seconds. If the Space uses the default 60-second budget, then just 3 requests can already consume or reserve most of that quota:

  • 3 × 60s = 180 seconds
  • that leaves only 30 seconds
  • 30 seconds is only 0.5 minutes

So after 3 images, a bar like 0.4 or 0.5 left is completely plausible. This is especially true if the Space author did not lower duration for short jobs. (Hugging Face)
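On the Space author's side, the docs mention both custom and dynamic durations. As we read them, @spaces.GPU accepts either a fixed number (for example duration=20) or a callable that takes the same arguments as the decorated function and returns a number of seconds, so short jobs can request a smaller budget. The sketch below is only such an estimator. SECONDS_PER_STEP, OVERHEAD_S, the 1.5x headroom and the 60-second cap are hypothetical values an author would have to measure, and the decorator appears only in a comment so the snippet runs without the spaces package.

```python
import math

# Hypothetical timings an author would measure on the target GPU.
SECONDS_PER_STEP = 0.5
OVERHEAD_S = 5
# Never ask for more than the default budget for a single call.
MAX_DURATION_S = 60


def estimate_duration(prompt: str, steps: int) -> int:
    """Return a per-request GPU budget in seconds.

    In a real Space this would be passed as
    @spaces.GPU(duration=estimate_duration) on a generate(prompt, steps)
    function, which is why it takes the same arguments (prompt is unused).
    """
    estimate = OVERHEAD_S + steps * SECONDS_PER_STEP
    # Headroom matters: a budget shorter than the real runtime makes the call fail.
    return min(MAX_DURATION_S, math.ceil(estimate * 1.5))


print(estimate_duration("a cat", 20))    # short job, small budget
print(estimate_duration("a cat", 100))   # long job, capped at 60 s
```

The trade-off is the one mentioned at the top of the thread: a tighter budget helps the queue and your quota, but an underestimate makes the generation fail outright.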

And it can get worse. If the Space requests size="xlarge", Hugging Face says it consumes twice as much daily quota as the default large size. Their own example says a 45-second effective task duration on xlarge consumes 90 seconds of quota. In that kind of Space, only a few generations can burn through a free-tier day. (Hugging Face)
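The size multiplier is simple arithmetic, sketched below with the docs' own 45-second example. SIZE_MULTIPLIER and quota_cost_s are illustrative names, and the sketch takes “effective task duration” as a given input without deciding whether it means declared or actual time.

```python
# Quota multiplier per GPU size, per the docs quoted in this post:
# "xlarge" consumes twice the daily quota of the default "large".
SIZE_MULTIPLIER = {"large": 1, "xlarge": 2}

FREE_DAILY_QUOTA_S = 210  # 3.5 minutes


def quota_cost_s(effective_duration_s: float, size: str = "large") -> float:
    """Quota consumed by one call of the given effective duration and size."""
    return effective_duration_s * SIZE_MULTIPLIER[size]


cost = quota_cost_s(45, "xlarge")
print("one 45 s xlarge call costs", cost, "s of quota")
print("full xlarge calls per free day:", FREE_DAILY_QUOTA_S // cost)
```

Two such calls use 180 of the 210 free seconds, which is why heavy Spaces run out so quickly.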

So is it “real time usage” or not?

The clean answer is: both matter, but in different ways. Hugging Face’s docs say the GPU is requested when the function is called and released when the function completes, which means the GPU is not supposed to stay occupied for the full 60 seconds if the work ends early. But the docs also say duration sets the maximum runtime and affects queue priority, which means the platform clearly uses that budget before execution for scheduling. (Hugging Face)

So the best mental model is:

  • actual runtime matters for the real work and GPU release
  • declared duration matters for queue admission and quota handling
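The first bullet, the execution side, can be modeled on its own. This sketch assumes a call that runs past its declared duration is cut off at that limit (the docs describe duration as the maximum function runtime) and that the GPU is released as soon as the function returns. It deliberately does not compute a quota charge, since which number gets charged is what the docs leave unclear. run_on_gpu and GpuCallOutcome are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GpuCallOutcome:
    gpu_held_s: float  # how long the GPU was actually occupied
    timed_out: bool    # True if the job hit the declared maximum runtime


def run_on_gpu(actual_runtime_s: float, declared_duration_s: float = 60) -> GpuCallOutcome:
    """Toy model: the GPU is freed when the function finishes, but the
    declared duration is a hard ceiling on how long it may run."""
    if actual_runtime_s <= declared_duration_s:
        return GpuCallOutcome(gpu_held_s=actual_runtime_s, timed_out=False)
    return GpuCallOutcome(gpu_held_s=declared_duration_s, timed_out=True)


print(run_on_gpu(8))       # finishes early; GPU released after 8 s
print(run_on_gpu(8, 5))    # budget set too short; the same job is cut off
print(run_on_gpu(90, 60))  # exceeds the default 60 s ceiling
```

The second call is the failure mode that keeps many creators on the 60-second default.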

That is why ZeroGPU can look like it is using “60 seconds instead of real time,” even though the real issue is that the system is built as a shared scheduler, not just a stopwatch. (Hugging Face)

Is there a hidden change on the site?

I did not find a current official Hugging Face doc announcing a recent change from “real runtime” to “duration-based quota.” The current docs still describe the same model: default 60 seconds, custom maximum runtime, dynamic duration support, quota tiers, and 24-hour reset. (Hugging Face)

So for your specific question, the most likely answer is not “a secret new rule.” The most likely answer is that this has been the design, but the UI makes it hard to understand. A strong sign of that confusion is that developers asked Hugging Face for a way to retrieve exact remaining ZeroGPU quota seconds, and that request was closed as not planned. That means observability is still weak. (GitHub)

Could it still be a bug?

Yes. But only in some cases.

There are real recent reports of the wrong quota identity being applied. One Gradio issue from April 7, 2026 says custom frontends on HF Spaces with ZeroGPU could treat logged-in PRO users as unauthenticated, giving them only the smaller free-tier quota because the x-ip-token handshake was missing. Gradio’s changelog also includes fixes for ZeroGPU header forwarding and updates to the ZeroGPU guide about manually passing an IP token. HF’s API docs likewise say authenticated requests consume your account quota, while unauthenticated requests use a stricter shared pool. (GitHub)

So:

  • if this happens in one or two Spaces only, it may be those Spaces’ setup
  • if it happens across many ZeroGPU Spaces, especially if you are logged in or PRO and still see nonsense like 0s left, then a bug or auth/identity mismatch is plausible (GitHub)

About the bar jumping from 0 to 1.2 to 0.4

That kind of jump is believable without any hidden site change. A few reasons:

  • quota is small on free tier: 3.5 minutes
  • different Spaces can request different durations
  • remaining quota affects queue behavior
  • the UI does not expose quota state very transparently
  • some auth/header issues can make a request count under the wrong quota pool (Hugging Face)

So the jumpy display does not by itself prove a new bug. It can happen from the combination of tiny quota, request budgets, and poor visibility.

Your “I only created 3 image” question

Yes, 3 images can be the limit. On the free tier, very easily. If the Space uses the default 60-second budget, three generations already reach 180 seconds of the 210-second free daily quota. If the Space is heavy, uses xlarge, or has a longer custom duration, three images can be enough even faster. (Hugging Face)

The plain-language conclusion

HF ZeroGPU uses 60 seconds by default because it is running a shared GPU queue and needs a maximum runtime budget in advance to decide whether your job can be scheduled fairly. That is why it does not behave like a simple “count exact seconds after the image finishes” system. The confusing bar you see is most likely showing remaining time quota, probably in minutes, not remaining image count. So three images can absolutely be enough on the free tier, especially when the Space still uses the default 60-second duration. The current docs support this design, and I did not find an official announcement of a recent hidden rule change. Real bugs do exist, but they are more likely when quota looks wrong across many Spaces or when logged-in users are being treated as unauthenticated. (Hugging Face)
