
CPU offloading error scenario

Hugging Face Forums [Unofficial] April 24, 2026

I am trying to offload the vision and audio towers to the CPU via a device map, but I get the error below. If I change to device_map={"": 0}, everything works correctly. I would like to know whether somebody can reproduce this and whether I need to change anything, or whether I should open an issue with PEFT, Transformers, Accelerate, or bitsandbytes.

If I understand the problem correctly, I am offloading model.vision_tower, model.multi_modal_projector and model.audio_tower, but PEFT expects those to be on the GPU. Can't I use offloading with PEFT as well?

Thanks in advance to all

Model: Gemma 4 E4B IT

Device Map:

    device_map = {
        "model.vision_tower": "cpu",
        "model.multi_modal_projector": "cpu",
        "model.audio_tower": "cpu",
        "": 0,  # This sets the rest of the model to GPU 0
    }
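To make the intent of this map explicit, here is a minimal sketch of how such a map can be read: each module takes the device of its longest matching key prefix, with the empty key as the catch-all. This is an illustration under that assumption, not the library's actual lookup code. `resolve_device` is a hypothetical helper, and the module names in the usage note are illustrative rather than taken from the model.

```python
def resolve_device(module_name, device_map):
    """Return the device for module_name using longest-prefix matching.

    Hypothetical helper for illustration only; it assumes keys are dotted
    module-name prefixes and that "" acts as the catch-all entry.
    """
    name = module_name
    while name and name not in device_map:
        # Drop the last dotted component and try the parent module.
        name = name.rpartition(".")[0]
    if name not in device_map:
        raise KeyError(f"{module_name!r} is not covered by the device map")
    return device_map[name]


# The device map from the post.
device_map = {
    "model.vision_tower": "cpu",
    "model.multi_modal_projector": "cpu",
    "model.audio_tower": "cpu",
    "": 0,  # everything else goes to GPU 0
}
```

With this reading, `resolve_device("model.vision_tower.encoder", device_map)` gives `"cpu"`, while a name with no specific entry, such as `"lm_head"`, falls through to the `""` key and gives `0`.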

Bits and Bytes Config:

    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_enable_fp32_cpu_offload=True,

Model Kwargs:

    "dtype": torch.bfloat16,
    "attn_implementation": "sdpa",
    "trust_remote_code": False,
    "low_cpu_mem_usage": False,

Loading Code:

    base_model = Gemma4ForConditionalGeneration.from_pretrained(
        MODEL_REGISTRY[model_id_to_load],
        quantization_config=quant_config,
        device_map=device_map,
        max_memory=max_memory,
        offload_folder="e:\Folder\offload_temp",
        **MODEL_KWARGS[model_id_to_load]
    )

PEFT loading:

    if isinstance(base_model, PeftModel):
        base_model = base_model.merge_and_unload()

    model = PeftModel.from_pretrained(
        base_model,
        lora_path,
        adapter_name=lora_source_client_name,
        is_trainable=False
    )

Versions:

    Transformers: v5.6.2
    Accelerate: v1.14.0.dev0
    bitsandbytes: v0.49.2
    Torch: 2.8.0+cu129
    PEFT: v0.19.1

The error below is raised during PEFT loading:

    2026-04-24 11:35:10,080 | Worker (6904) | INFO | Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
      - 0: 5668601858 bytes required
    These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
    2026-04-24 11:35:12,470 | Worker (6904) | ERROR | Worker error: Tensor.item() cannot be called on meta tensors
    Traceback (most recent call last):
      File "E:\Folder\inference_worker.py", line 414, in inference_worker_loop
        model = _worker_load_model(model_id_to_load, lora_source_client_name, supports_image, supports_audio)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "E:\Folder\inference_worker.py", line 344, in _worker_load_model
        model = PeftModel.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "E:\Folder\gemma_env\Lib\site-packages\peft\peft_model.py", line 582, in from_pretrained
        load_result = model.load_adapter(
                      ^^^^^^^^^^^^^^^^^^^
      File "E:\Folder\gemma_env\Lib\site-packages\peft\peft_model.py", line 1475, in load_adapter
        dispatch_model(
      File "E:\Folder\gemma_env\Lib\site-packages\accelerate\big_modeling.py", line 432, in dispatch_model
        attach_align_device_hook_on_blocks(
      File "E:\Folder\gemma_env\Lib\site-packages\accelerate\hooks.py", line 695, in attach_align_device_hook_on_blocks
        attach_align_device_hook_on_blocks(
      File "E:\Folder\gemma_env\Lib\site-packages\accelerate\hooks.py", line 695, in attach_align_device_hook_on_blocks
        attach_align_device_hook_on_blocks(
      File "E:\Folder\gemma_env\Lib\site-packages\accelerate\hooks.py", line 695, in attach_align_device_hook_on_blocks
        attach_align_device_hook_on_blocks(
      [Previous line repeated 3 more times]
      File "E:\Folder\gemma_env\Lib\site-packages\accelerate\hooks.py", line 677, in attach_align_device_hook_on_blocks
        attach_execution_device_hook(
      File "E:\Folder\gemma_env\Lib\site-packages\accelerate\hooks.py", line 470, in attach_execution_device_hook
        attach_execution_device_hook(
      File "E:\Folder\gemma_env\Lib\site-packages\accelerate\hooks.py", line 459, in attach_execution_device_hook
        if not hasattr(module, "_hf_hook") and len(module.state_dict()) > 0:
                                                   ^^^^^^^^^^^^^^^^^^^
      File "E:\Folder\gemma_env\Lib\site-packages\torch\nn\modules\module.py", line 2260, in state_dict
        module.state_dict(
      File "E:\Folder\gemma_env\Lib\site-packages\torch\nn\modules\module.py", line 2257, in state_dict
        self._save_to_state_dict(destination, prefix, keep_vars)
      File "E:\Folder\gemma_env\Lib\site-packages\bitsandbytes\nn\modules.py", line 525, in _save_to_state_dict
        for k, v in self.weight.quant_state.as_dict(packed=True).items():
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "E:\Folder\gemma_env\Lib\site-packages\bitsandbytes\functional.py", line 581, in as_dict
        "nested_offset": self.offset.item(),
                         ^^^^^^^^^^^^^^^^^^
      File "E:\Folder\gemma_env\Lib\site-packages\torch\_meta_registrations.py", line 7457, in meta_local_scalar_dense
        raise RuntimeError("Tensor.item() cannot be called on meta tensors")
    RuntimeError: Tensor.item() cannot be called on meta tensors
    2026-04-24 11:35:12,517 | Worker (15192) | ERROR | Worker returned error: Worker error: Tensor.item() cannot be called on meta tensors
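The last frames of the traceback show the bitsandbytes `as_dict` call running `self.offset.item()` on a tensor that is still on the meta device. One way to narrow this down might be to scan the base model for 4-bit weights whose quant state carries a meta `offset` before PEFT loading is attempted. The sketch below is a diagnostic only, not a fix. `find_meta_quant_offsets` is a hypothetical helper name, and it assumes each quantized layer exposes `weight.quant_state.offset`, as the traceback suggests. It is duck-typed and relies only on `named_modules()` and the tensor attribute `is_meta`.

```python
def find_meta_quant_offsets(model):
    """Return names of modules whose 4-bit quant_state.offset is a meta tensor.

    Hypothetical diagnostic helper; assumes the attribute layout
    module.weight.quant_state.offset seen in the traceback.
    """
    suspects = []
    for name, module in model.named_modules():
        weight = getattr(module, "weight", None)
        quant_state = getattr(weight, "quant_state", None)
        offset = getattr(quant_state, "offset", None)
        # Layers without a quant state or nested offset are skipped.
        if offset is not None and getattr(offset, "is_meta", False):
            suspects.append(name)
    return suspects
```

Calling `find_meta_quant_offsets(base_model)` right after `from_pretrained` and printing the result would show whether the affected layers sit under the offloaded towers or elsewhere, which could help decide where to file the issue.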
