CUDA Error 802 on every H200 multi-GPU HF Job, across three vLLM images

Hugging Face Forums [Unofficial] April 22, 2026


Seems like a platform-side issue? An LLM suggested:

This looks less like a pure vLLM bug and more like an H200 multi-GPU / NVSwitch / Fabric Manager issue on the HF side.

If I were debugging it, I’d probably try three things first:

check whether a single-GPU run on H200 works (CUDA_VISIBLE_DEVICES=0, tensor_parallel_size=1);
try VLLM_WORKER_MULTIPROC_METHOD=spawn, or launch via vllm serve or a regular script file instead of python -c, since the vLLM multiprocessing docs explain that the startup path differs there;
check nvidia-smi -q | grep -A 2 Fabric, since both NVIDIA’s CUDA 802 guidance and DigitalOcean’s note on this exact error point at fabric / Fabric Manager on NVSwitch systems.
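A minimal sketch of the first two checks as one script, assuming the vllm Python package is installed in the job image. The model name is only a placeholder (any small model should do), and configure_env / run_smoke_test are hypothetical helper names, not vLLM APIs. The environment variables are set before vllm is imported, and the entry point sits behind a __main__ guard, which the spawn start method needs because worker processes re-import the main module.

```python
import importlib.util
import os

# Placeholder model for the smoke test; swap in the model that fails for you.
MODEL = "facebook/opt-125m"


def configure_env(single_gpu: bool = True) -> dict[str, str]:
    """Apply the env vars from the checklist; must run before vllm is imported."""
    overrides = {"VLLM_WORKER_MULTIPROC_METHOD": "spawn"}
    if single_gpu:
        # Pin the process to one GPU so NVLink/NVSwitch is not involved.
        overrides["CUDA_VISIBLE_DEVICES"] = "0"
    os.environ.update(overrides)
    return overrides


def run_smoke_test(tensor_parallel_size: int = 1) -> str:
    # Imported lazily so the environment set above is already in place.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL, tensor_parallel_size=tensor_parallel_size)
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=8))
    return outputs[0].outputs[0].text


if __name__ == "__main__":
    # With spawn, child processes re-import this file, so the real work
    # has to stay behind this guard.
    if importlib.util.find_spec("vllm") is None:
        print("vllm is not installed in this environment; skipping smoke test")
    else:
        configure_env(single_gpu=True)
        print(run_smoke_test(tensor_parallel_size=1))
```

Run it as a file (e.g. python smoke_test.py, a hypothetical name) rather than through python -c. If this works, but the same script with single_gpu=False and tensor_parallel_size=2 dies with error 802, that is the single-GPU-works / multi-GPU-fails pattern.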

If single-GPU works but multi-GPU fails, or Fabric state looks wrong, this probably isn’t something a user can really fix from inside the job. The NVIDIA AI Enterprise docs say Fabric Manager is required for HGX 1/2/4/8-GPU VMs, and on H100/H200 shared NVSwitch setups that management lives on the host / service-VM side. That makes it sound much more like an HF infra issue than an application issue.
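The decision in that paragraph, written out as a rough sketch. It assumes the Fabric block prints "Key : Value" lines and that a healthy NVSwitch host with Fabric Manager running reports State "Completed" and Status "Success"; treat those strings, the sample text, and the parse_fabric / triage names as assumptions to check against real output from the job. It only makes sense on NVSwitch systems, where the Fabric fields are actually populated.

```python
from typing import Optional

# Made-up sample shaped like `nvidia-smi -q | grep -A 2 Fabric` output;
# exact field names and values vary by driver version.
SAMPLE = """\
    Fabric
        State                             : In Progress
        Status                            : N/A
"""


def parse_fabric(text: str) -> list[dict[str, str]]:
    """Return one dict of 'Key : Value' pairs per 'Fabric' block (one per GPU)."""
    blocks: list[dict[str, str]] = []
    current: Optional[dict[str, str]] = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Fabric":
            current = {}
            blocks.append(current)
            continue
        if current is None:
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            # A '--' separator from grep, or a sub-header, ends the block.
            current = None
            continue
        current[key.strip()] = value.strip()
    return blocks


def triage(single_gpu_ok: bool, multi_gpu_ok: bool,
           fabric_blocks: list[dict[str, str]]) -> str:
    # Assumed healthy values on an NVSwitch host with Fabric Manager running.
    fabric_bad = any(
        block.get("State") != "Completed" or block.get("Status") != "Success"
        for block in fabric_blocks
    )
    if (single_gpu_ok and not multi_gpu_ok) or fabric_bad:
        return "likely platform-side (NVSwitch / Fabric Manager): report to HF"
    if not single_gpu_ok:
        return "inconclusive: even single-GPU fails, so fabric alone doesn't explain it"
    return "no fabric problem visible from these checks"


if __name__ == "__main__":
    print(triage(True, False, parse_fabric(SAMPLE)))
```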

So I’d probably keep both the forum thread and the GitHub issue updated with:

job ID
image + flavor
what works vs what fails
result of the single-GPU test
Fabric output
whether spawn changes anything
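To keep the forum thread and the GitHub issue consistent, the checklist can be filled in as a small template. This is only a hypothetical helper (JobReport is not part of any HF tooling); the values still have to be copied in by hand from the job page and the test runs.

```python
from dataclasses import dataclass, fields


@dataclass
class JobReport:
    job_id: str
    image_and_flavor: str
    works_vs_fails: str
    single_gpu_result: str
    fabric_output: str
    spawn_changes_anything: str

    def to_text(self) -> str:
        """Render one 'label: value' line per field, in checklist order."""
        lines = []
        for f in fields(self):
            label = f.name.replace("_", " ")
            value = getattr(self, f.name)
            if "\n" in value:
                # Multi-line values (e.g. the Fabric output) go on indented lines.
                lines.append(f"{label}:")
                lines.extend("    " + v for v in value.splitlines())
            else:
                lines.append(f"{label}: {value}")
        return "\n".join(lines)
```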

There’s also a somewhat similar AWS EKS issue where vLLM hit the same CUDA 802 error and the cause ended up looking node/AMI-side rather than model-side.
