
Dedicated CPU Inference Endpoint returns empty HTTP 500 after ~80s: is there a configurable request timeout?

Hugging Face Forums [Unofficial] April 16, 2026

Thanks for the detailed investigation! Based on your findings, here are a few things to check that might resolve the 80s timeout:

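  1. Adjust Request Timeout Settings:

    • In the huggingface_hub library, set the client's timeout parameter higher than 80s. InferenceApi is the legacy client and has been deprecated in favor of InferenceClient, which accepts a timeout (in seconds) when it is constructed.
    • Example: InferenceClient(model=..., timeout=120)
    • Keep in mind that a client-side timeout normally surfaces as a timeout exception on the caller's side, not as an HTTP 500. If the empty 500 still arrives at ~80s with a higher client timeout, the limit is more likely on the server or proxy side.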
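  2. Verify Container Resource Limits:

    • Although memory usage is low, confirm whether the allocated CPU cores are sufficient for your workload. On a CPU instance, a saturated or throttled core can stretch a single request long enough to run into whatever time limit sits in front of the container.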
  3. Check Server-side Logs:

    • If possible, enable DEBUG level logs on the endpoint side to see whether a silent exception is being caught that isn't visible in the standard 500 error message. Note the timestamp of the last log line before the failure and compare it with the ~80s mark.
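
To tell a client-side timeout apart from a server-side 500, it can help to time one raw request and record exactly what comes back. Below is a minimal sketch using only the Python standard library. The helper names (timed_post, classify), the label strings, and the Bearer-token JSON request shape are my own assumptions for illustration, not part of huggingface_hub or anything confirmed in the original post.

```python
import time
import urllib.error
import urllib.request


def classify(status, body, elapsed_s, client_timeout_s):
    """Return a coarse label for one request outcome.

    status is None when no HTTP response arrived at all.
    The labels are hypothetical; rename them to suit your notes.
    """
    if status is None:
        # No response: heuristic split between our own timeout and other failures.
        if elapsed_s >= client_timeout_s:
            return "client_timeout"
        return "connection_error"
    if status >= 500:
        return "server_error_empty_body" if not body else "server_error_with_body"
    return "ok" if status < 400 else "client_error"


def timed_post(url, payload, token, client_timeout_s=120.0):
    """POST payload (bytes) to url and return (status, body, elapsed_s).

    Assumes a JSON body and Bearer-token auth; adjust headers as needed.
    """
    request = urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=client_timeout_s) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here; the body may legitimately be empty.
        status, body = err.code, err.read()
    except OSError:
        # Covers socket timeouts and connection failures (URLError is an OSError).
        status, body = None, b""
    elapsed_s = time.monotonic() - start
    return status, body, elapsed_s


# Example (not executed here; fill in your own endpoint URL and token):
# status, body, elapsed = timed_post(url, b'{"inputs": "hello"}', token)
# print(status, len(body), round(elapsed, 1), classify(status, body, elapsed, 120.0))
```

If this keeps reporting server_error_empty_body at roughly 80s even with a 120s client timeout, the client setting is probably not the cause, and items 2 and 3 are the better places to look.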

Hope this helps fix the empty 500 response issue!
