{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihs573xfun25bdyhjkdu2oy4k7ogxjtn7fhndx7gozjjpui4jm54u",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjakl4i5ugx2"
  },
  "path": "/t/it-seems-use-60-sec-gpu-quota-instead-of-real-time-usage/175130#post_10",
  "publishedAt": "2026-04-11T09:20:43.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "click here for the detailed version",
    "Hugging Face",
    "Gradio",
    "Hugging Face Forums",
    "@spaces.GPU"
  ],
  "textContent": "I created a guide (click here for the detailed version):\n\n* * *\n\nThe reliable way is to treat this as **two separate jobs**:\n\n  1. **Duplicate the Space correctly**\n  2. **Recreate the runtime conditions that made the original work**\n\nThat matters because a duplicated Hugging Face Space is **private by default**, falls back to **free CPU hardware unless you choose otherwise**, copies **public Variables** but **not Secrets**, and ZeroGPU itself is a separate runtime with its own rules. (Hugging Face)\n\n## What ZeroGPU is, before you duplicate\n\nZeroGPU is not “normal GPU Spaces, but free.” It is a **shared ZeroGPU runtime** for **Gradio Spaces only**, backed by **NVIDIA H200** capacity, with a default **60-second GPU duration** per `@spaces.GPU` call unless the app sets a different duration. Hugging Face also notes that ZeroGPU can have **limited compatibility compared with standard GPU Spaces**, even though it supports Gradio 4+ and a wide range of PyTorch versions. (Hugging Face)\n\nThat is why a duplicated repo can be correct as code but still fail as an app. The source Space may rely on a specific SDK, hardware class, set of secrets, startup behavior, and request-path assumptions, and the duplicate works only if it reproduces all of them. (Hugging Face)\n\n## The safest way to duplicate a ZeroGPU Space to a private ZeroGPU Space\n\n### 1. Inspect the source Space first\n\nBefore clicking **Duplicate this Space**, check:\n\n  * whether it is really a **Gradio** Space\n  * whether the README/YAML pins `sdk_version` or `python_version`\n  * whether it pulls gated or private models\n  * whether it downloads large files at startup\n  * whether it clearly expects ZeroGPU behavior rather than ordinary GPU behavior\n\nThose checks matter because the README YAML controls important runtime settings such as `python_version`, `sdk_version`, `startup_duration_timeout`, and `preload_from_hub`. (Hugging Face)\n\n### 2. Duplicate it as faithfully as possible\n\nFor the **first** duplicate, keep it conservative:\n\n  * set **Visibility = Private**\n  * keep the same **SDK** as the source\n  * if the source is ZeroGPU, choose **ZeroGPU**\n  * do **not** switch to CPU or paid GPU yet unless you are intentionally migrating\n\nThis matters because Hugging Face says duplicated Spaces default to **free CPU hardware** unless you choose otherwise, which is one of the easiest ways to create a “works there, broken here” duplicate. (Hugging Face)\n\n### 3. Recreate secrets immediately\n\nThis is the most common duplication miss.\n\nHugging Face documents that **Variables** can be auto-copied into duplicates, but **Secrets are not copied**. So if the original uses `HF_TOKEN`, third-party API keys, OAuth credentials, or anything else private, you need to add them again in the duplicate’s Settings. (Hugging Face)\n\nIf the app needs access to a private or gated model, dataset, or other repo, use a Hugging Face access token with the needed permissions. Hugging Face documents **User Access Tokens** as the standard authentication method, and a **read** token is enough to read private or gated repos that your account already has access to. (Hugging Face)\n\n### 4. First test from the standard HF Space page\n\nFor the first validation, open the duplicate from the **normal Hugging Face Space page while logged in**. Do not start with:\n\n  * direct `*.hf.space` URLs\n  * embeds\n  * custom frontends\n  * API clients\n\nGradio documents that ZeroGPU request accounting uses the `X-IP-Token` header. If that identity path is missing, the request may be treated as unauthenticated, which can make a perfectly fine duplicate look broken or quota-limited. (Gradio)\n\n### 5. Run the smallest realistic test first\n\nDo not start with the heaviest prompt, the biggest image, or the longest job. Use the smallest input that should still succeed.\n\nThat matters because ZeroGPU defaults to a **60-second duration budget**, and Hugging Face explicitly says shorter durations improve queue priority. It also explains why quota can feel “chunky” instead of matching only the wall-clock time you noticed. (Hugging Face)\n\n## The fast troubleshooting rule\n\nAfter duplication, do not debug randomly. First classify the failure:\n\n  * **stuck on `Building`**\n  * **`Running`, but behaves differently**\n  * **quota/auth looks wrong**\n  * **browser works, API fails**\n  * **ZeroGPU/CUDA error**\n\nThat classification-first approach is the fastest path because each bucket has a different likely cause and a different next move. (Hugging Face Forums)\n\n## How to fix errors after duplication\n\n### A. Stuck on `Building`\n\n`Building` is not one single failure. Current Hugging Face forum guidance breaks it into multiple layers: repo/YAML read, build, scheduling, provisioning, and then app health. (Hugging Face Forums)\n\nUse this order:\n\n  1. Check the Hugging Face status page and recent reports.\n  2. If the platform looks healthy, try **Restart** once.\n  3. Then try **Factory rebuild** once.\n  4. Only then inspect dependencies and startup config.\n\nThat order matches the forum guidance for recent `Building` failures. (Hugging Face Forums)\n\nIf logs are empty or show only queue-like behavior, suspect **platform or scheduler state first**, not your app code. If build logs show dependency failures, suspect **dependency drift** first. A fresh duplicate rebuilds now, under current conditions, which may differ from the environment the source Space was originally built in. (Hugging Face Forums)\n\nIf the build finishes but the Space never becomes healthy, check README/YAML settings such as:\n\n  * `startup_duration_timeout`\n  * `preload_from_hub`\n\nHugging Face says `startup_duration_timeout` defaults to **30 minutes**, and `preload_from_hub` shifts large Hub downloads into build time so startup is faster and less fragile. (Hugging Face)\n\n### B. `Running`, but not like the original\n\nWhen the duplicate reaches `Running` but behaves differently, the usual cause is **environment drift**, not a broken duplicate button.\n\nA recent Hugging Face forum case showed a duplicate on the same ZeroGPU class behaving differently from the original until dependencies were pinned more tightly. (Hugging Face Forums)\n\nCheck in this order:\n\n  * smallest possible input\n  * one **Factory rebuild**\n  * `requirements.txt`\n  * `sdk_version`\n  * `python_version`\n  * whether hardware really matches\n  * whether a secret or access token is missing\n\nThis is the right place to be suspicious of version drift. “Same repo” does not guarantee “same resolved environment.” (Hugging Face Forums)\n\n### C. Quota exceeded, PRO ignored, or quota looks wrong\n\nDo not assume this is real quota exhaustion.\n\nGradio documents that ZeroGPU uses `X-IP-Token` for request identity, and a recent GitHub issue shows that **custom `gr.Server` frontends** can miss the handshake and cause logged-in PRO users to be treated as unauthenticated. (Gradio)\n\nUse this order:\n\n  1. test from the normal HF Space page while logged in\n  2. avoid direct `*.hf.space` links at first\n  3. avoid custom frontends at first\n  4. check whether the Space is on an old Gradio version\n\nThat last point matters because a Hugging Face forum reply specifically says a broader quota-related bug was resolved in **Gradio 5.12.0 or newer**. (Hugging Face Forums)\n\nFor background, current Hugging Face docs say the ZeroGPU daily quota is **2 minutes** for unauthenticated users, **3.5 minutes** for free accounts, and **25 minutes** for PRO, and that it resets **24 hours after first GPU usage**. (Hugging Face)\n\n### D. Browser works, API fails\n\nIf the private duplicate works in the browser but API calls fail, suspect **auth first**.\n\nHugging Face documents that every Gradio Space can be used as an API endpoint, and the standard programmatic path is the Gradio client with a token. (Hugging Face)\n\nFor a **private** duplicate:\n\n  * confirm the Space works in the browser first\n  * then test with an authenticated token\n  * use a Hugging Face access token with the needed read access for private resources\n\nThat separates “the app is broken” from “the app is fine, but your API request is not authorized.” (Hugging Face)\n\nFor ZeroGPU, there is a second layer: API access and ZeroGPU request identity are related but not identical. You can be authorized to access the private Space and still have a broken `X-IP-Token` path for ZeroGPU accounting. (Hugging Face)\n\n### E. `CUDA has been initialized before importing the spaces package`\n\nThis is a classic ZeroGPU-specific error.\n\nIt usually means something touched CUDA too early, before ZeroGPU could take over GPU allocation the way it expects. Hugging Face’s ZeroGPU docs say the intended pattern is:\n\n  * select ZeroGPU hardware\n  * `import spaces`\n  * put GPU work behind `@spaces.GPU`\n\nA forum thread with that exact error confirms this pattern in practice. (Hugging Face)\n\nWhat to check:\n\n  * `torch.cuda.is_available()` at import time\n  * `model.to(\"cuda\")` too early\n  * any CUDA-touching library side effects before `import spaces`\n\nThe fix is to move GPU work into the ZeroGPU-managed path instead of letting CUDA initialize too early. (Hugging Face)\n\n### F. `No CUDA GPUs are available`\n\nThis error can be either:\n\n  * a transient ZeroGPU/platform problem\n  * an app/runtime mismatch\n\nA Hugging Face forum thread shows this exact error on ZeroGPU, and a follow-up reply reported that a later retry or replication worked again, which suggests at least some cases are transient. (Hugging Face Forums)\n\nUse this order:\n\n  1. retry once\n  2. restart once\n  3. if it clears, do not rewrite code yet\n  4. if it persists only in your duplicate, inspect CUDA timing and dependency pins\n\nThat keeps you from wasting time on a transient platform issue. (Hugging Face Forums)\n\n### G. `ZeroGPU worker error RuntimeError`\n\nTreat this as a **symptom bucket**, not a diagnosis.\n\nForum reports show that this class of error can be caused by broader platform issues, by temporary ZeroGPU instability, or by app-specific dependency problems. (Hugging Face Forums)\n\nUse this order:\n\n  1. retry once\n  2. restart once\n  3. see whether many unrelated ZeroGPU Spaces are failing too\n  4. if only your duplicate fails, inspect versions and rebuild state\n\nIf many Spaces fail at the same time, suspect platform conditions. If only your duplicate fails, suspect runtime drift first. (Hugging Face Forums)\n\n### H. `ZeroGPU illegal duration` or “requested GPU duration is larger than the maximum allowed”\n\nThis usually means the app requested a GPU duration above the allowed maximum, not that duplication failed.\n\nHugging Face documents the default duration as **60 seconds** and shows custom duration examples like `@spaces.GPU(duration=120)`. A forum thread shows a 300-second request triggering the illegal-duration error. (Hugging Face)\n\nWhat to do:\n\n  * find `@spaces.GPU(duration=...)`\n  * lower it\n  * retest with a smaller workload\n  * keep GPU sections narrow and only as long as needed\n\nAlso note that `xlarge` consumes **2×** the daily quota of `large`, so using a bigger ZeroGPU size can make quota pressure worse, not better. (Hugging Face)\n\n## When to move to paid GPU\n\nMove to paid GPU **after** you get one clean minimal success on a faithful private ZeroGPU duplicate.\n\nThat is the safest point to migrate because by then you know the code, secrets, and startup path are basically correct, so the migration is no longer mixed up with duplication mistakes. The recent forum thread about “ZeroGPU to paid hardware” is really a migration problem, not a plain duplication problem. (Hugging Face Forums)\n\n## The short version\n\nUse this order:\n\n  1. confirm the source is a **Gradio ZeroGPU** Space\n  2. duplicate it as **Private + same SDK + same ZeroGPU class**\n  3. recreate **Secrets** and any needed `HF_TOKEN`\n  4. test from the **standard HF Space page while logged in**\n  5. run the **smallest** input first\n  6. classify the first failure instead of changing many things at once\n  7. only after one success, optimize startup or migrate to paid GPU\n\nThat is the cleanest beginner-safe workflow because it separates **repo duplication** from **runtime reconstruction**. (Hugging Face)\n\nThe most useful references to keep open while doing this are the official **Spaces Overview**, **ZeroGPU**, **Spaces Configuration Reference**, and **Spaces as API endpoints** docs, the Gradio **Using ZeroGPU Spaces with the Clients** guide, and the recent Hugging Face forum threads on **Building**, **quota**, and **duplicate behaves differently**. (Hugging Face)",
  "title": "It seems use 60 sec GPU quota instead of real time usage?"
}