External Publication

Inference Endpoint Hanging in "Initializing"

Hugging Face Forums [Unofficial] March 31, 2026

Hello! I’ve been running into a problem with the inference endpoints and I was wondering if anyone has any suggestions.

Problem Description

I’ve been had three models deployed on Inference Endpoints for about a year now. They are all using A100 GPUs hosted on AWS and scale to zero when not in use. In the past week I’ve had the endpoints not get past the initialization step without manual intervention. This problem is intermittent.

I know that the scaling up means that the endpoints need to wait until AWS hardware is available, but the issue is that the endpoints sometimes never seem to get past initialization on their own. I need to manually go in, force scale to zero, then start them back up. After the manual intervention they always get past initialization which makes me think this issue is not just waiting for hardware to be available.

At first we saw this just once in a 6 month period, then suddenly it happens multiple times a day.

Additional Information

Logs

No logs are generated, the endpoint hangs without seemingly even entering the inference engine container.

System Configuration

Models: LLama 3.1 8B custom varient GPU: A100 Vendor: AWS Inference Engine Container: hommayushi3/vllm-huggingface Scaling: Scale to zero when not in use

Problem Description

Additional Information

Logs

System Configuration

Discussion in the ATmosphere