{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidzjhtqhqzo6una6pnirq44isui6g2zdn32pepageayhglgkinlni",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mndy4jndia62"
  },
  "path": "/t/time-charged-before-gpu-assignment-in-zerogpu-spaces/176472#post_2",
  "publishedAt": "2026-06-02T23:45:45.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "@hysts",
    "Free Account ZeroGPU Quota Issue",
    "Getting quota exceeded even though requested seconds is less than what’s left",
    "ZeroGPU docs",
    "gradio-app/gradio#13209",
    "gradio-app/gradio#13210",
    "Using ZeroGPU Spaces with the Clients",
    "Spaces as API endpoints",
    "Time charged before GPU assignment in zerogpu spaces",
    "huggingface_hub#3452",
    "Docker spaces stuck in building with empty logs — all Docker builds affected",
    "Stuck at space problem",
    "Space stuck in Paused state — 503 on restart and factory rebuild",
    "503 shown for website for some request, while server is running and requests are being served",
    "Docker Spaces returning {\"data\":[]} / proxy not forwarding requests",
    "Google Cloud Load Balancing: troubleshooting 5xx errors",
    "AWS Application Load Balancer troubleshooting",
    "Kubernetes Service 503 troubleshooting",
    "Hugging Face Status",
    "huggingface_hub#4085",
    "huggingface_hub#3266",
    "huggingface_hub#3868",
    "hf-xet on PyPI",
    "NVIDIA RTX PRO 6000 instead of H200 for ZeroGPU",
    "hub-docs PR #2474",
    "AWS: Navigating GPU challenges — cost optimizing AI workloads",
    "JLL Global Data Center Outlook",
    "Reuters: US AI boom faces electric shock",
    "OpenAI status: reduced ChatGPT availability for Free users due to limited capacity",
    "Anthropic status history",
    "Cloudflare outage on February 20, 2026",
    "@spaces.GPU"
  ],
  "textContent": "Hmm. Recently, I feel like we have been seeing a number of errors that look somewhat network-related, though not necessarily on the public Internet side. The tricky part is that the trigger and root cause are hard to pin down. @hysts\n\n* * *\n\nI do not think this is best explained as one single root cause.\n\nThe pattern looks more like several different **managed-service boundaries** producing similar user-visible symptoms:\n\n  * a build job does not start or does not produce logs;\n  * a Space is stuck in Paused / Building / Starting / 503;\n  * a server is running, but some requests do not reach it;\n  * a ZeroGPU job waits for a GPU, fails to obtain one, but still appears to consume quota;\n  * API/custom/frontend paths behave differently from the normal Hub page;\n  * large uploads/downloads stall in a Hub/Xet/CDN path;\n  * external API calls from a Space fail in a DNS/egress/policy-looking way.\n\n\n\nThose are all “network-ish” from the outside, but they are not necessarily the same layer.\n\n## Short version\n\nI would group the recent clues like this:\n\nBoundary | User-visible symptom | My current read\n---|---|---\nSpaces builder / scheduler / control-plane | Empty build logs, build never triggers, restart/factory rebuild 503, paused state | Stronger signal than I initially weighted.\nSpaces proxy / routing | App/server is running, but some external requests 503 or never reach the app | Distinct from an app exception.\nZeroGPU reservation / refund / quota settlement | GPU is never assigned, but requested time appears consumed | Still suspicious; separate from known `xlarge` cost behavior.\nZeroGPU request identity | Browser/API/custom frontend/Space-to-Space quota behavior differs | Partly known; `X-IP-Token` and auth path matter.\nHub transfer backend | Large upload/download stalls, Xet/HTTP fallback/cache/CDN weirdness | Adjacent, probably separate from Spaces.\nExternal egress / abuse-control | DNS/external API failures, 
Cloudflare/VPN/shared-IP/keepalive effects | Separate path, but also network-looking.\nBlackwell / ZeroGPU runtime churn | `sm_120`, CUDA wheel, FlashAttention/xFormers/Triton failures | Relevant background, not the main theory for most reports here.\nIndustry capacity pressure | More queueing, quotas, routing, transfer backends, abuse controls | Background context only; not proof of direct causality.\n\nSo I would not call this simply “ZeroGPU is broken” or “the public Internet is broken.”\n\nA more useful description might be:\n\n> Several user-visible failures appear network-related, but the evidence points to multiple service boundaries: Spaces build/control-plane, proxy/routing, ZeroGPU scheduling/accounting, request identity, Hub transfer backend, and external egress/security policy.\n\n## Known or partly explained pieces\n\nSome parts already seem publicly understood or at least partially handled.\n\nArea | Public signal | Why I would not treat it as unexplained\n---|---|---\nFree-user ZeroGPU run-count / quota-message confusion | In Free Account ZeroGPU Quota Issue, `hysts` explained that Free users had a request-count limit in addition to time quota, and that the error message was initially misleading. | This explains some “quota exceeded” reports, but not all quota/accounting symptoms.\n`xlarge` 2× quota behavior | In Getting quota exceeded even though requested seconds is less than what’s left, `hysts` explained the `xlarge` fallback / 2× quota cost and a display-side issue. | This is important, but it is different from “no GPU was assigned but time was consumed.”\nZeroGPU Blackwell-backed sizing | Current ZeroGPU docs describe RTX PRO 6000 Blackwell-backed `large` / `xlarge` behavior and quota implications. 
| Runtime context changed, but CUDA/kernel mismatch has a different symptom shape.\nCustom frontend / `gr.Server` quota identity | gradio-app/gradio#13209 describes missing ZeroGPU `x-ip-token` behavior in custom `gr.Server` frontends; gradio-app/gradio#13210 fixed it. | This is a known request-identity class, not necessarily a GPU allocator/accounting bug.\nSpace-to-Space / API identity | Gradio documents ZeroGPU client behavior and `X-IP-Token` forwarding in Using ZeroGPU Spaces with the Clients. HF also documents Spaces as API endpoints. | Browser path, API path, custom frontend path, and Space-to-Space path can behave differently.\n\nThis matters because it narrows the remaining question.\n\nThe remaining suspicious reports are not simply “any quota error” or “any ZeroGPU delay.” They are more specific.\n\n## Still suspicious: ZeroGPU time charged before assignment\n\nThe clearest current ZeroGPU-specific unresolved-looking report is:\n\n  * Time charged before GPU assignment in zerogpu spaces\n\n\n\nThe reported flow is roughly:\n\n\n    Waiting for a GPU to become available\n    → user GPU time is already being charged\n    → no GPU is obtained\n    → process ends with \"No GPU was available for you\"\n    → 60 seconds appear to be consumed anyway\n\n\nThat does **not** look like a normal CUDA/Blackwell/kernel issue.\n\nA Blackwell compatibility failure usually looks like:\n\n\n    sm_120\n    no kernel image is available\n    invalid device function\n    old CUDA wheel\n    old PyTorch wheel\n    FlashAttention / xFormers / Triton failure\n\n\nThe “time charged before assignment” symptom looks more like:\n\n\n    ZeroGPU scheduler\n    → duration reservation\n    → GPU allocator\n    → no worker assigned\n    → failure\n    → settlement/refund questionable\n\n\nI would keep this separate from already-explained `xlarge` quota cost.\n\nKnown/expected:\n\n  * `duration` matters for scheduling.\n  * `xlarge` costs more quota than `large`.\n  * remaining quota 
can affect queue behavior.\n  * authenticated vs unauthenticated paths can affect quota pool.\n\n\n\nStill suspicious:\n\n  * no GPU worker is assigned;\n  * user code may not even enter the `@spaces.GPU` function;\n  * the job ends with “No GPU was available”;\n  * the requested duration still appears consumed.\n\n\n\nIf that is accurate, the interesting question is not “why did the model fail?” It is:\n\n> Is a reserved duration fully refunded when the ZeroGPU allocator fails to assign a GPU worker?\n\n## Stronger than expected: Spaces builder / scheduler / control-plane symptoms\n\nI would give this category more weight than ZeroGPU-only explanations.\n\nThere are several reports where the issue seems to happen before the app can do anything useful.\n\nExamples:\n\n  * huggingface_hub#3452: Docker Space not triggering a build despite a valid Dockerfile and correct SDK; build logs empty.\n  * Docker spaces stuck in building with empty logs — all Docker builds affected: Docker SDK builds stuck with empty logs, while another SDK path works.\n  * Stuck at space problem: long build/container wait and little/no logging; a reply said a similar issue had already been reported internally and infra was working on it.\n  * Space stuck in Paused state — 503 on restart and factory rebuild: restart and factory rebuild both returning 503.\n\n\n\nThose do not sound like ordinary Python exceptions.\n\nThe rough shape is:\n\n\n    Repo / Space settings\n    → build scheduler\n    → builder\n    → runtime state machine\n    → logs\n    → app start\n\n\nIf logs are empty or the build never really starts, the failure is likely before the app’s Python code.\n\nThis is also why “just change requirements.txt” or “restart again” often feels random in these cases. 
The failure may be in the orchestration path rather than the application path.\n\n## Also suspicious: running server, request not reaching it\n\nAnother important class is:\n\n  * 503 shown for website for some request, while server is running and requests are being served\n  * Docker Spaces returning {\"data\":[]} / proxy not forwarding requests\n\n\n\nThe key phrase in the 503 report is that the request is **not reaching the server**.\n\nThat is a different failure class from:\n\n\n    request reaches app\n    → app raises exception\n    → app returns 500\n\n\nIt is closer to:\n\n\n    browser/API\n    → HF edge/proxy\n    → route/backend selection/health state\n    → container\n    → app\n\n\nIf the app is alive but failed requests never appear in the app’s access logs, the interesting layer is before user code.\n\nThis is not an exotic failure pattern in managed cloud systems. Load balancers and proxies can return 5xx before the backend application sees the request. General references:\n\n  * Google Cloud Load Balancing: troubleshooting 5xx errors\n  * AWS Application Load Balancer troubleshooting\n  * Kubernetes Service 503 troubleshooting\n\n\n\nThat does not prove the HF issue is the same mechanism. 
It only shows that “server process is alive” and “external proxy can route to it reliably” are separate facts.\n\n## Hub upload/download instability: probably separate, but relevant context\n\nI would not merge Hub upload/download instability with Spaces runtime issues.\n\nStill, it is relevant as an adjacent pattern: HF-facing managed transfer layers can also produce “network-looking” failures.\n\nExamples:\n\n  * HF status currently shows a resolved large-file download incident where downloads stalled/hung via US Central CDN endpoints, especially with XET protocol, due to a Google Cloud us-central1 infrastructure issue: Hugging Face Status.\n  * huggingface_hub#4085: large downloads via HF Hub stuck in Colab.\n  * huggingface_hub#3266: `HF_HUB_DISABLE_XET` reportedly not disabling Xet in one setup.\n  * huggingface_hub#3868: HTTP download path can be too large, requiring `hf_xet`.\n  * hf-xet on PyPI describes `hf-xet` as the transfer layer used by `huggingface_hub` for Xet storage, with chunk-based deduplication and local disk caching.\n\n\n\nDifferent path:\n\n\n    Spaces request path:\n    browser/API\n    → Spaces proxy\n    → runtime/container\n    → app\n\n    ZeroGPU path:\n    Gradio request\n    → ZeroGPU queue\n    → quota reservation\n    → GPU allocator\n    → worker\n    → settlement/refund\n\n    Hub transfer path:\n    huggingface_hub / hf CLI / datasets / transformers\n    → auth/account tier\n    → Xet or HTTP fallback\n    → CAS/range/chunking\n    → cache/filesystem\n    → CDN/cloud route\n\n\nSo I would mention Hub UL/DL only as a separate track.\n\nIt does **not** explain a Space restart 503 by itself. It does **not** explain ZeroGPU quota settlement by itself. 
But it supports a broader observation: several HF-facing systems now involve more managed transfer/routing/cache/quota layers than a simple “HTTP request to one server” mental model.\n\n## External egress and abuse-control: another separate network-looking class\n\nAnother separate class is outbound traffic from Spaces:\n\n\n    Space container\n    → DNS / egress policy / external API\n\n\nThis is different from inbound Spaces proxy routing.\n\nThere have been scattered reports around external APIs, DNS, Cloudflare-like traffic, keepalive behavior, and abuse-control decisions. The exact causes may differ case by case. But this is another example where the user sees “network failure” while the actual boundary may be:\n\n  * outbound DNS;\n  * egress policy;\n  * blocked or classified target domain;\n  * shared IP reputation;\n  * VPN / Cloudflare / Worker / bot-like traffic classification;\n  * abuse-handler / pause-state logic.\n\n\n\nSo I would keep a separate mental bucket for:\n\n\n    inbound route fails\n    outbound egress fails\n    build/control-plane fails\n    quota/scheduler fails\n    transfer backend fails\n    abuse/security state changes\n\n\nThey are not the same problem.\n\n## Blackwell / ZeroGPU runtime churn: useful background, not the center\n\nI would not make Blackwell the main theory here.\n\nThe reason is not only that the specific symptoms look different. 
It is also that the broader report pattern seems to involve more Spaces/control-plane/proxy/routing reports than unresolved ZeroGPU-only reports.\n\nStill, Blackwell is useful background.\n\nPublicly visible facts:\n\n  * Current ZeroGPU docs describe RTX PRO 6000 Blackwell-backed `large` and `xlarge` ZeroGPU sizes.\n  * NVIDIA RTX PRO 6000 instead of H200 for ZeroGPU discusses the runtime change and compatibility symptoms.\n  * hub-docs PR #2474 updated the ZeroGPU docs for Blackwell.\n\n\n\nThis is a large visible runtime-contract change.\n\nI cannot verify from the outside whether there was a coordinated internal backend rollout. But a runtime change of that size likely touches more than the displayed GPU label: supported PyTorch versions, GPU sizing, quota behavior, validation, allocation pools, and compatibility assumptions can all move.\n\nSo I would phrase it carefully:\n\n> Blackwell / ZeroGPU runtime churn is relevant context, not a proven root cause.\n\nIt may explain CUDA/kernel-shaped failures. 
It should not be used as a blanket explanation for builder logs, proxy routing, or quota refund behavior.\n\n## Broader industry context: capacity pressure may increase boundary-state failures\n\nThis is also background, not proof.\n\nThe AI infrastructure industry is under real pressure: GPU supply, data-center capacity, power, cooling, network/storage infrastructure, and cost.\n\nUseful public context:\n\n  * AWS: Navigating GPU challenges — cost optimizing AI workloads\n  * JLL Global Data Center Outlook\n  * Reuters: US AI boom faces electric shock\n  * OpenAI status: reduced ChatGPT availability for Free users due to limited capacity\n  * Anthropic status history\n  * Cloudflare outage on February 20, 2026\n\n\n\nThis does not mean “HF is failing because GPUs are scarce.”\n\nA safer interpretation is:\n\n\n    When compute, power, bandwidth, and cost pressure rise,\n    platforms tend to add or tighten:\n\n    - queueing\n    - quotas\n    - duration budgets\n    - request-count limits\n    - regional routing\n    - fallback pools\n    - transfer backends\n    - caching\n    - abuse controls\n    - plan-based capacity rules\n\n\nThose layers are often necessary. They also create more boundary states.\n\nA user might see:\n\n\n    No GPU available\n    Quota exceeded\n    503\n    Request did not reach server\n    Download stuck at 99%\n    Upload stalls\n    Space paused\n    Factory rebuild fails\n\n\nBut the underlying reason may be very different in each case.\n\n## What I would infer, cautiously\n\nMy current working hypothesis is:\n\n> There may be no single outage. 
There may be a cluster of boundary-state issues appearing around the same time.\n\nThe most likely buckets are:\n\nBucket | Confidence | Why\n---|---|---\nSpaces builder / scheduler / control-plane | High | Empty logs, stuck builds, restart/factory rebuild 503, paused-state reports.\nSpaces proxy / routing / backend selection | High | Reports that a server is running but some requests do not reach it.\nZeroGPU reservation / refund / settlement | Medium-high | “No GPU assigned but time consumed” is specific and not explained by normal duration behavior.\nZeroGPU request identity | Medium | Known `X-IP-Token` / auth path issues exist; some are fixed or documented.\nHub transfer backend | Medium | Public XET/CDN/GCP incident and several large-transfer reports exist, but this is separate from Spaces runtime.\nBlackwell runtime churn | Background | Important timing/context, but most suspicious reports are not CUDA/kernel-shaped.\nIndustry resource pressure | Background | Makes quota/queue/routing/cache layers more plausible, but does not prove causality.\n\n## A cleaner way to talk about this\n\nInstead of saying:\n\n> HF network is broken.\n\nor:\n\n> ZeroGPU is broken.\n\nI would say something like:\n\n> Several recent reports look network-related from the outside, but the evidence points to different managed-service boundaries: Spaces build/control-plane, Spaces proxy/routing, ZeroGPU reservation/accounting, request identity, Hub transfer backend, and external egress/security policy. Some ZeroGPU pieces are already explained or fixed, while the remaining suspicious cases seem to involve reservation/refund and request-not-reaching-container behavior.\n\nThat keeps the claim narrow.\n\nIt also helps separate facts from speculation:\n\nFact-like observation | Interpretation\n---|---\nFree-user ZeroGPU request-count limit was explained by `hysts`. | Do not treat all quota errors as unexplained.\n`xlarge` 2× cost and misleading requested-time display were explained. 
| Do not confuse this with failed-GPU refund behavior.\n`gr.Server` custom frontend ZeroGPU identity issue had a Gradio fix. | Some API/custom frontend quota behavior is known.\nA report says no GPU was assigned but 60s consumed. | Possible reservation/refund/settlement issue.\nA report says server is running but request does not reach it. | Possible proxy/routing/backend-state issue.\nDocker build logs can be empty despite valid files. | Possible builder/scheduler/control-plane issue.\nHF status shows an XET/CDN/GCP large-file download incident. | Transfer path can fail below the user’s code.\nBlackwell migration changed the ZeroGPU runtime contract. | Background churn, not universal cause.\n\n## Bottom line\n\nI would focus less on reporting templates and more on this factual split:\n\n  1. **Already explained / partly fixed ZeroGPU issues**\n\n     * Free run-count vs quota message.\n     * `xlarge` quota cost / display mismatch.\n     * `gr.Server` / custom frontend `x-ip-token` path.\n  2. **Still suspicious ZeroGPU issue**\n\n     * GPU not assigned, but requested time appears consumed.\n     * This looks like reservation/refund/settlement, not CUDA.\n  3. **Broader Spaces issues**\n\n     * Empty build logs.\n     * Build not triggered.\n     * Paused state.\n     * Restart/factory rebuild 503.\n     * Server running but requests not reaching it.\n  4. **Separate transfer/egress context**\n\n     * Xet/CDN large-file transfer incidents.\n     * Upload/download stalls.\n     * External API/DNS/egress problems.\n  5. 
**Background pressure**\n\n     * Blackwell runtime churn.\n     * Industry-wide compute/network/power/cost pressure.\n     * More queueing, quota, routing, cache, transfer, and abuse-control layers.\n\n\n\nMy current guess is that the interesting part is not “one bug” but **where evidence disappears**:\n\n\n    build logs disappear\n    → builder / scheduler / control-plane\n\n    request logs disappear\n    → proxy / routing / backend selection\n\n    GPU function entry log disappears but quota changes\n    → ZeroGPU scheduler / reservation / settlement\n\n    download progress disappears\n    → Hub transfer / Xet / CDN / cache\n\n    external API DNS disappears\n    → egress / DNS / policy / abuse-control\n\n\nThat is probably the cleanest way to make the discussion useful without overclaiming.",
  "title": "Time charged before GPU assignment in zerogpu spaces"
}