How to convert a single safetensors file to PEFT format

Hugging Face Forums [Unofficial] May 27, 2026

Oh. The size drop may be because the conversion above does not include the MLP LoRA tensors:


LLM-generated notes / rough analysis:

I think the 850 MB -> 378 MB drop is probably explainable from the converter itself. The most likely cause is not that the TextEncoder is skipped, but that the MLP LoRA tensors are skipped by default.

The relevant converter is this one:

OpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT

The script says:

# Attention-only by default (recommended). You can optionally include MLP keys with --include-mlp.
ALLOWED_QWEN_PREFIXES_ATTN = (
    "attn.to_q",
    "attn.to_k",
    "attn.to_v",
    "attn.to_out",
    "attn.add_q_proj",
    "attn.add_k_proj",
    "attn.add_v_proj",
    "attn.to_add_out",
)

# Optional MLP keys observed in Qwen-Image-Lightning (ComfyUI-style)
ALLOWED_QWEN_PREFIXES_MLP = (
    "img_mlp.net.0.proj",
    "img_mlp.net.2",
    "txt_mlp.net.0.proj",
    "txt_mlp.net.2",
)

And the actual filter is:

allowed_prefixes = ALLOWED_QWEN_PREFIXES_ATTN + (
    ALLOWED_QWEN_PREFIXES_MLP if include_mlp else ()
)

So, unless --include-mlp is passed, the converter keeps only the attention/projection LoRA tensors and drops:

img_mlp.net.0.proj
img_mlp.net.2
txt_mlp.net.0.proj
txt_mlp.net.2
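To make the keep/drop behavior concrete, here is a minimal sketch of how such a prefix filter could classify individual keys. The two tuples are copied from the script; the helper `is_kept`, the regex, and the assumption that the module path follows `transformer_blocks.<N>.` are mine and may not match the converter's real matching logic.

```python
import re

ALLOWED_QWEN_PREFIXES_ATTN = (
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
)
ALLOWED_QWEN_PREFIXES_MLP = (
    "img_mlp.net.0.proj", "img_mlp.net.2",
    "txt_mlp.net.0.proj", "txt_mlp.net.2",
)

# Assumption: keys look like "...transformer_blocks.<N>.<module>.lora_down.weight".
_BLOCK_KEY = re.compile(r"transformer_blocks\.\d+\.(.+)$")


def is_kept(key, include_mlp=False):
    """Hypothetical helper: True if the filter would keep this tensor key."""
    allowed_prefixes = ALLOWED_QWEN_PREFIXES_ATTN + (
        ALLOWED_QWEN_PREFIXES_MLP if include_mlp else ()
    )
    match = _BLOCK_KEY.search(key)
    if match is None:
        return False
    # str.startswith accepts a tuple and checks every prefix in it.
    return match.group(1).startswith(allowed_prefixes)


for key in (
    "transformer_blocks.0.attn.to_q.lora_down.weight",
    "transformer_blocks.0.img_mlp.net.2.lora_up.weight",
):
    print(key, is_kept(key), is_kept(key, include_mlp=True))
```

With the default `include_mlp=False`, the attention key is kept and the MLP key is dropped, which is the behavior described above.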

This also seems consistent with the adapter_config.json of the uploaded PEFT adapter: the default target modules are attention projection modules, not MLP modules.

Relevant links:

  • PEFT upload / script: OpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT
  • Original 850 MB file: lightx2v/Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors
  • vLLM-Omni LoRA docs: LoRA - vLLM-Omni
  • Qwen-Image transformer implementation: qwen_image_transformer.py

Why the file size matches attention-only almost exactly

From the vLLM-Omni Qwen-Image transformer implementation, the default model shape is roughly:

num_layers = 60
num_attention_heads = 24
attention_head_dim = 128
inner_dim = 24 * 128 = 3072

The uploaded LoRA seems to be rank 64 / bf16. bf16 is 2 bytes per element.

For one LoRA linear projection with shape 3072 -> 3072 and rank 64:

lora_A: 64 x 3072
lora_B: 3072 x 64

elements = 64*3072 + 3072*64
         = 393,216

bytes = 393,216 * 2
      = 786,432 bytes
      = 0.75 MiB

The default converter keeps 8 attention projections per block:

attn.to_q
attn.to_k
attn.to_v
attn.to_out
attn.add_q_proj
attn.add_k_proj
attn.add_v_proj
attn.to_add_out

So the size estimate is:

0.75 MiB * 8 projections * 60 blocks = 360 MiB

In decimal MB:

360 MiB = 377.5 MB

That is almost exactly the reported converted size, 378 MB.

So I think the converted adapter size is not mysterious: it is basically the theoretical size of:

60 blocks * 8 attention LoRA projections * rank 64 * bf16
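As a cross-check on the "rank 64" guess, the same arithmetic can be run backwards: given the reported file size, which rank would an attention-only adapter imply? This is only a sketch under the shape assumptions above (60 blocks, 8 projections of 3072 -> 3072, bf16). The function names are made up for illustration, and the safetensors header overhead is ignored.

```python
NUM_BLOCKS = 60
ATTN_PROJECTIONS_PER_BLOCK = 8
INNER_DIM = 24 * 128          # 3072
BYTES_PER_BF16 = 2


def attention_only_bytes(rank):
    """Tensor bytes of an attention-only LoRA at the given rank."""
    # lora_A is (rank x in), lora_B is (out x rank); in == out == INNER_DIM here.
    per_projection = rank * (INNER_DIM + INNER_DIM) * BYTES_PER_BF16
    return per_projection * ATTN_PROJECTIONS_PER_BLOCK * NUM_BLOCKS


def implied_rank(file_bytes):
    """Rank that would produce file_bytes, since size is linear in rank."""
    return file_bytes / attention_only_bytes(1)


for rank in (32, 64, 128):
    print(f"rank {rank}: {attention_only_bytes(rank) / 1e6:.1f} MB")

print(f"implied rank for 378 MB: {implied_rank(378e6):.2f}")
```

Rank 64 gives 377,487,360 bytes, and the implied rank for a 378 MB file rounds to 64, so neighboring common ranks such as 32 or 128 would not fit the observed size.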

Why the original 850 MB also matches attention + MLP

The original file is listed as 850 MB here:

Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors

The missing difference is:

850 MB - 378 MB ~= 472 MB

That also matches the expected MLP LoRA size.

Qwen-Image blocks contain both image-stream and text-stream MLPs:

img_mlp
txt_mlp

The converter explicitly recognizes these MLP keys:

img_mlp.net.0.proj
img_mlp.net.2
txt_mlp.net.0.proj
txt_mlp.net.2

Assuming a usual MLP expansion of 4x, the MLP hidden size is approximately:

inner_dim * 4 = 3072 * 4 = 12288

For one MLP LoRA linear 3072 -> 12288 or 12288 -> 3072, rank 64:

elements = 64*3072 + 12288*64
         = 983,040

bytes = 983,040 * 2
      = 1,966,080 bytes
      = 1.875 MiB

There are 4 such MLP linears per block:

img_mlp.net.0.proj
img_mlp.net.2
txt_mlp.net.0.proj
txt_mlp.net.2

So:

1.875 MiB * 4 * 60 = 450 MiB

In decimal MB:

450 MiB = 471.9 MB

That is basically the whole missing part.

So the size arithmetic is:

attention LoRA only ~= 377.5 MB
MLP LoRA          ~= 471.9 MB
--------------------------------
total             ~= 849.4 MB

This is almost exactly the original 850 MB.

Therefore my rough conclusion is:

original 850 MB ~= attention LoRA + MLP LoRA
converted 378 MB ~= attention LoRA only
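The whole estimate fits in a few lines of Python, which makes it easy to re-run with a different rank or MLP width. This is a sketch under the same assumptions as the text: rank 64, bf16, 60 blocks, and a 4x MLP expansion that I have not verified against the checkpoint. `lora_pair_bytes` is a hypothetical helper name.

```python
MB = 1000 ** 2
MIB = 1024 ** 2

NUM_BLOCKS = 60
INNER_DIM = 24 * 128            # 3072
MLP_HIDDEN = INNER_DIM * 4      # assumption: usual 4x expansion
RANK = 64
BYTES_PER_BF16 = 2


def lora_pair_bytes(in_features, out_features, rank=RANK):
    """Bytes for one lora_A (rank x in) plus one lora_B (out x rank)."""
    return (rank * in_features + out_features * rank) * BYTES_PER_BF16


# 8 attention projections per block, all 3072 -> 3072.
attn_bytes = NUM_BLOCKS * 8 * lora_pair_bytes(INNER_DIM, INNER_DIM)

# Per block: img_mlp and txt_mlp, each with an up (net.0.proj) and down (net.2) linear.
mlp_bytes = NUM_BLOCKS * 2 * (
    lora_pair_bytes(INNER_DIM, MLP_HIDDEN) + lora_pair_bytes(MLP_HIDDEN, INNER_DIM)
)

for label, size in (
    ("attention", attn_bytes),
    ("mlp", mlp_bytes),
    ("total", attn_bytes + mlp_bytes),
):
    print(f"{label:>9}: {size / MIB:7.1f} MiB = {size / MB:7.1f} MB")
```

The totals are 360 MiB for attention, 450 MiB for MLP, and 810 MiB overall, which lands within about 1 MB of the reported 850 MB.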

So is there information loss?

Probably yes, if the goal is to preserve the original LoRA exactly.

But it is a specific kind of information loss:

  • attention/projection LoRA is preserved
  • MLP LoRA is probably dropped
  • .alpha keys are skipped, but those are tiny and not the source of the size drop
  • TextEncoder is not needed to explain the size drop

I would not assume that this means the converted LoRA is useless. Attention-only LoRA can still have a strong effect, especially on rough prompt binding / layout / style direction. But for a Lightning/distillation LoRA, dropping the MLP part may reduce the low-step quality, details, texture, text rendering, and stability.

My guess:

simple prompts:      maybe fairly close
normal prompts:      likely usable, but weaker than full LoRA
complex text/layout: likely more visible degradation
4-step / 8-step edge cases: degradation likely more visible

Why TextEncoder is probably not the main explanation

TextEncoder skipping is possible in other LoRA conversion contexts, but here it is not necessary to explain the numbers.

The converter targets keys like:

transformer_blocks.N.<module>.lora_down.weight
transformer_blocks.N.<module>.lora_up.weight

It is not really written as a generic text_encoder / lora_te converter.

Also, the sizes line up too cleanly with:

attention-only = 378 MB
attention + MLP = 850 MB

So I would explain the size drop as MLP exclusion first, not TextEncoder exclusion.
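If someone wants to rule out TextEncoder tensors directly, one option is to bucket the key names of the original file by family. A rough sketch: the `lora_te` and `text_encoder` markers are just the naming conventions mentioned above, not something I have confirmed in this file, and `key_family` / `summarize_keys` are hypothetical helpers.

```python
from collections import Counter

# Assumption: text-encoder LoRA keys, if any, contain one of these markers.
TEXT_ENCODER_MARKERS = ("lora_te", "text_encoder")


def key_family(key):
    """Bucket a tensor key into a coarse family by its name."""
    if any(marker in key for marker in TEXT_ENCODER_MARKERS):
        return "text_encoder"
    if "transformer_blocks." in key:
        return "transformer_blocks"
    return "other"


def summarize_keys(keys):
    """Count how many keys fall into each family."""
    return Counter(key_family(k) for k in keys)


sample = [
    "transformer_blocks.0.attn.to_q.lora_down.weight",
    "transformer_blocks.0.img_mlp.net.2.lora_up.weight",
    "transformer_blocks.0.attn.to_q.alpha",
]
print(summarize_keys(sample))
```

Passing the keys of the original 850 MB file (for example the keys of the dict returned by `load_file`) should show a zero `text_encoder` count if the explanation above is right.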

Can we keep MLP?

Maybe. The script already has an option:

python comfyui-to-vllm-omni-qwenimage.py \
  --input Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors \
  --output ./out_adapter_with_mlp \
  --dtype bf16 \
  --base-model Qwen/Qwen-Image-2512 \
  --include-mlp

If this works as intended, I would expect adapter_model.safetensors to become close to 850 MB.

However, the converter itself warns that MLP can be tricky:

ap.add_argument(
    "--include-mlp",
    action="store_true",
    help="Also convert img_mlp/txt_mlp LoRA keys (may fail if vLLM expects different suffixes)",
)

The likely issue is not writing the tensors, which is easy. It is whether vLLM-Omni accepts and correctly applies the MLP module suffixes.

For example, the MLP targets include:

img_mlp.net.0.proj
img_mlp.net.2
txt_mlp.net.0.proj
txt_mlp.net.2

Their suffixes are roughly:

proj
2

proj is probably okay. The numeric suffix 2 may be the fragile part, because vLLM/vLLM-Omni LoRA validation can be strict about module suffixes. There is already a related vLLM issue for numeric-index module names such as to_out.0:

vLLM issue #35734: LoRA loading fails for modules with numeric indices

The current converter already works around the attention-side version of this by normalizing:

attn.to_out.0     -> attn.to_out
attn.to_add_out.0 -> attn.to_add_out

But net.2 is a different case. It may require the vLLM-Omni build to include "2" in expected LoRA modules, or it may need a more model-specific mapping.
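The difference between the two cases can be sketched like this. The normalization below only mirrors the two attention-side mappings listed above; `normalize_module`, `suffix`, and `has_numeric_suffix` are hypothetical names, and the real vLLM-Omni validation logic may look at module names differently.

```python
import re

# Only the trailing ".0" on the two attention output projections is stripped.
_OUT_INDEX = re.compile(r"^(attn\.to_(?:add_)?out)\.0$")


def normalize_module(module):
    """Map attn.to_out.0 -> attn.to_out and attn.to_add_out.0 -> attn.to_add_out."""
    return _OUT_INDEX.sub(r"\1", module)


def suffix(module):
    """Last dotted component of a module path."""
    return module.rsplit(".", 1)[-1]


def has_numeric_suffix(module):
    """True if the last component is a bare index such as '2'."""
    return suffix(module).isdigit()


for module in (
    "attn.to_out.0",
    "attn.to_add_out.0",
    "img_mlp.net.0.proj",
    "img_mlp.net.2",
    "txt_mlp.net.2",
):
    normalized = normalize_module(module)
    print(f"{module} -> {normalized} (numeric suffix: {has_numeric_suffix(normalized)})")
```

After normalization, the attention modules end in a named component, while `img_mlp.net.2` and `txt_mlp.net.2` still end in a bare index. Dropping the `.2` is not an option there, because `net.0.proj` and `net.2` would no longer be distinguishable as separate layers of the same `net`.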

Suggested sanity check

If anyone tries --include-mlp, I would check three things:

1. Size

ls -lh ./out_adapter_with_mlp/adapter_model.safetensors

Expected:

~850 MB

If it is still around 378 MB, MLP tensors were not included.

2. Key counts

from safetensors.torch import load_file

# Load the converted adapter as a {tensor_key: tensor} dict.
sd = load_file("./out_adapter_with_mlp/adapter_model.safetensors")

# Count how many tensor keys mention each MLP target module.
for needle in [
    "img_mlp.net.0.proj",
    "img_mlp.net.2",
    "txt_mlp.net.0.proj",
    "txt_mlp.net.2",
]:
    print(needle, sum(1 for k in sd if needle in k))

Expected rough count:

each MLP target: 60 blocks * 2 tensors = 120 keys
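Checks 1 and 2 can also be done without loading any tensors, by reading only the safetensors header. The file starts with an 8-byte little-endian length, followed by a JSON header that maps each tensor name to its dtype, shape, and data_offsets. The sketch below uses only the standard library; the function names and the `img_mlp.` / `txt_mlp.` grouping are my own choices.

```python
import json
import struct

MLP_MARKERS = ("img_mlp.", "txt_mlp.")


def parse_safetensors_header(raw):
    """Parse the JSON header from the leading bytes of a safetensors file."""
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def read_safetensors_header(path):
    """Read just the header of a safetensors file from disk."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_safetensors_header(prefix + f.read(header_len))


def summarize(header):
    """Count keys and tensor bytes for MLP tensors versus everything else."""
    summary = {
        "mlp": {"keys": 0, "bytes": 0},
        "other": {"keys": 0, "bytes": 0},
    }
    for name, info in header.items():
        begin, end = info["data_offsets"]
        group = "mlp" if any(m in name for m in MLP_MARKERS) else "other"
        summary[group]["keys"] += 1
        summary[group]["bytes"] += end - begin
    return summary
```

Calling `summarize(read_safetensors_header("./out_adapter_with_mlp/adapter_model.safetensors"))` should then report roughly 480 MLP keys (4 targets * 120) and about 472 MB of MLP tensor bytes, if the estimates above hold.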

3. vLLM-Omni load log

The important question is whether vLLM-Omni reports that the MLP modules were actually loaded, rather than silently ignored.

The vLLM-Omni LoRA docs require a PEFT-style adapter folder:

lora_adapter/
├── adapter_config.json
└── adapter_model.safetensors

Docs:

vLLM-Omni LoRA guide

If loading fails on net.2 / "2" / target module validation, then I think the clean solution would be either:

  1. patch the converter / adapter_config.json target modules, or
  2. patch vLLM-Omni’s diffusion LoRA mapper / supported modules for Qwen-Image MLP, or
  3. avoid runtime adapter loading and fuse the LoRA into the base model.

Practical recommendation

For runtime PEFT LoRA:

  1. Try the existing converter with --include-mlp.
  2. Confirm the output is around 850 MB.
  3. Confirm img_mlp / txt_mlp keys exist.
  4. Try loading in vLLM-Omni.
  5. If it fails, the likely blocker is target module suffix validation around net.2.

For maximum quality / minimum loader trouble:

  • fuse/merge the original LoRA into the Qwen-Image-2512 base weights using Diffusers or the reference loader
  • serve the fused model as a normal model in vLLM-Omni

That avoids the whole PEFT key validation problem, although it is no longer a runtime LoRA adapter.
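For reference, "fusing" here means folding each low-rank update into the matching base weight: W' = W + scale * (lora_B @ lora_A). The toy sketch below shows only that arithmetic with plain Python lists. In practice the merge would be done by Diffusers or the reference loader, not by hand. `matmul` and `merge_lora` are hypothetical helpers. The scale is commonly alpha / rank in LoRA implementations; which value applies to this checkpoint is an assumption to verify, especially since the .alpha keys are skipped by the converter.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def merge_lora(weight, lora_A, lora_B, scale=1.0):
    """Return weight + scale * (lora_B @ lora_A).

    weight: out x in, lora_A: rank x in, lora_B: out x rank.
    """
    delta = matmul(lora_B, lora_A)  # (out x rank) @ (rank x in) -> out x in
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Tiny rank-1 example on a 2x2 weight.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # rank x in
B = [[3.0], [4.0]]      # out x rank
print(merge_lora(W, A, B))
```

After merging, the adapter tensors are no longer needed at load time, which is why this path sidesteps module-name validation entirely.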

TL;DR

I think the 378 MB file is probably an attention-only converted adapter.

The original 850 MB size is almost exactly:

attention LoRA ~= 378 MB
MLP LoRA       ~= 472 MB
total          ~= 850 MB

So the size drop is probably explained by the converter’s default behavior:

attention-only by default
MLP only if --include-mlp is passed

--include-mlp may preserve the missing tensors, but whether vLLM-Omni can load/apply img_mlp.net.2 and txt_mlp.net.2 correctly is the part that needs testing.
