HF ZeroGPU Space Hangs with No Output in the Logs
Hi, thank you very much for your detailed troubleshooting guide! I was not even sure where to start with this.
1. SDK and Python versions
I have made sure that the environment is identical across the local and remote setups, including the Python and Gradio versions.
2. Following the execution with prints
I added print statements following the pattern you suggested, and execution does hang before it prints “C: entered decorated function”. That print is the first line inside the @spaces.GPU-decorated function and it never runs. This suggests that the Gradio app registers the “Run Inference” click, but the hand-off to the decorated function on the real GPU never completes.
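For anyone following along, the marker pattern looks roughly like the sketch below. It is illustrative only: `gpu_stub` is a hypothetical stand-in for `@spaces.GPU` so the snippet runs without a ZeroGPU runtime, and every label except “C: entered decorated function” is my own. In my Space, the last line that appears is the one printed just before the decorated call.

```python
import functools
import time


def mark(label: str) -> None:
    # flush=True pushes the line out immediately, so it reaches the
    # Space logs even if the process hangs right after this call.
    print(f"{time.strftime('%H:%M:%S')} {label}", flush=True)


def gpu_stub(fn):
    """Hypothetical stand-in for @spaces.GPU so the sketch runs anywhere."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


@gpu_stub  # in the Space this is @spaces.GPU
def run_model(text: str) -> str:
    mark("C: entered decorated function")  # first statement in the function
    result = text.upper()  # placeholder for the real model call
    mark("D: inference finished")
    return result


def on_click(text: str) -> str:
    mark("A: click handler entered")
    mark("B: calling decorated function")
    output = run_model(text)
    mark("E: decorated function returned")
    return output


if __name__ == "__main__":
    print(on_click("hello"))
```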
3. Other package versions
I have ensured all other package versions are identical on both setups:
python 3.10.13 (main, Mar 12 2024, 12:16:25) [GCC 12.2.0]
gradio 5.29.0
torch imported, cuda.is_available()=True
torch 2.7.0+cu128
transformers 4.48.1
huggingface_hub 0.36.0
torch build info: torch.__version__=2.7.0+cu128, torch.version.cuda=12.8, torch.backends.cudnn.version()=90701
I am actually pulling the built container from the HF Space using the “Run locally” option, so the local environment uses the exact Docker image from HF Spaces.
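The version lines above can be collected with a small script like the one below, run both locally and in the Space so the two outputs can be diffed. This is a sketch: `version_report`, `diff_reports`, and the package list are my own illustrative choices. It reads installed distribution versions through `importlib.metadata` instead of importing torch, so the CUDA/cuDNN build details shown above are not included.

```python
import sys
from importlib import metadata

# Distributions to compare; adjust to match the Space's requirements.
PACKAGES = ("gradio", "torch", "transformers", "huggingface_hub", "spaces")


def version_report(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version (None if missing)."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


def diff_reports(local: dict, remote: dict) -> dict:
    """Return {name: (local_version, remote_version)} for every mismatch."""
    names = sorted(local.keys() | remote.keys())
    return {
        n: (local.get(n), remote.get(n))
        for n in names
        if local.get(n) != remote.get(n)
    }


if __name__ == "__main__":
    for name, version in version_report().items():
        print(name, version, flush=True)
```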
4. ZeroGPU patterns
Thank you, I have verified that I am not using torch.compile and that I import the spaces package before any CUDA-related packages.
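One way to double-check the import order at runtime, rather than by reading the code, is to inspect `sys.modules` just before `import spaces`. This also catches CUDA-related packages pulled in indirectly by other imports. A minimal sketch, where `imported_too_early` and the module list are hypothetical names of my own:

```python
import sys

# Modules that should not be imported before `spaces` (illustrative list).
CUDA_MODULES = ("torch", "transformers")


def imported_too_early(modules=CUDA_MODULES) -> list:
    """Return the modules from `modules` that are already in sys.modules."""
    return [name for name in modules if name in sys.modules]


# Intended use: place this at the very top of app.py, before `import spaces`.
early = imported_too_early()
if early:
    print(f"WARNING: imported before spaces: {early}", flush=True)
# import spaces   # must come first
# import torch    # CUDA-related imports follow
```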
Media components and custom frontends: I have removed the UI elements and reduced my app to a very simple one: a single button that triggers inference and a text box that displays the output. The issue still occurs with this simplified setup, which suggests it is not related to the UI elements.
However, I am using a TorchScript model (loaded with torch.jit.load) that can run some optimization passes the first time it is called. I have disabled these by wrapping that model’s inference in the torch.jit.optimized_execution(False) context. Do you think this could be the problem?
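For context, the TorchScript setup described above looks roughly like this. It is a simplified sketch, not my actual model: a tiny scripted module is saved to an in-memory buffer so the snippet is self-contained, and the `@spaces.GPU` decorator and CUDA placement are left out so it runs on CPU. `TinyModel` and `infer` are illustrative names.

```python
import io

import torch


class TinyModel(torch.nn.Module):
    """Illustrative stand-in for the real TorchScript model."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2 + 1


# Save a scripted module to memory; the real app uses torch.jit.load on a file.
buffer = io.BytesIO()
torch.jit.save(torch.jit.script(TinyModel()), buffer)
buffer.seek(0)

# Loaded once at module level, not inside the request handler.
model = torch.jit.load(buffer, map_location="cpu")
model.eval()


def infer(x: torch.Tensor) -> torch.Tensor:
    # optimized_execution(False) turns off TorchScript graph optimizations
    # for calls made inside this block; no_grad skips autograd bookkeeping.
    with torch.no_grad(), torch.jit.optimized_execution(False):
        return model(x)


if __name__ == "__main__":
    print(infer(torch.tensor([1.0, 2.0])))
```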
5. Observability
I have checked that I am seeing the full logs: every print statement uses flush=True, and I am also writing to persistent storage (a mounted bucket), which shows the same behavior.
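The dual logging can be sketched as a helper that prints each marker and also appends it to a file on the mounted storage. This is an illustrative sketch: `log_line`, the `TRACE_LOG_PATH` environment variable, and the `/tmp` default are placeholders, not the real mount path. `os.fsync` asks the OS to push the write through to the underlying storage, and the PID in each line shows which process wrote it.

```python
import os
import time

# Placeholder path; point TRACE_LOG_PATH at the mounted bucket in the Space.
LOG_PATH = os.environ.get("TRACE_LOG_PATH", "/tmp/trace.log")


def log_line(message: str, path: str = LOG_PATH) -> None:
    """Write one timestamped marker to stdout and append it to `path`."""
    line = f"{time.time():.3f} pid={os.getpid()} {message}"
    print(line, flush=True)  # unbuffered stdout for the Space logs
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
        fh.flush()  # Python buffer -> OS
        os.fsync(fh.fileno())  # OS -> underlying storage


if __name__ == "__main__":
    log_line("C: entered decorated function")
```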
6. Code structure
I can confirm that I am following the suggested code structure: all models are loaded at module level, with no lazy loading.
Based on your list of possible causes:
hot-path setup or hidden initialization inside the request: I think this is ruled out, since the first print line inside the decorated function never executes.
runtime-version drift between local and Space: I have made sure the environments are identical.
ZeroGPU-incompatible import/runtime pattern: Most patterns are respected, except for the TorchScript model. Could that be an issue?
logs hiding the true boundary: I think this is ruled out, since the file writes to the mounted bucket show the same behavior as the logs.
frontend/header path issues: I think this is ruled out, since even the simplified single-button app hangs in the same way.
Do you have any suggestions on where I could be going wrong?
Thank you very much once again.