{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreihpdirq4glec3urpc36vj5v6qwke55xhh23runq4nbbl6iihsk3uu",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmupn3xxtd32"
},
"path": "/t/how-to-convert-a-single-safetensors-file-to-peft-format/173103#post_12",
"publishedAt": "2026-05-27T22:28:49.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"OpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT",
"lightx2v/Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors",
"LoRA - vLLM-Omni",
"qwen_image_transformer.py",
"Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors",
"vLLM issue #35734: LoRA loading fails for modules with numeric indices",
"vLLM-Omni LoRA guide"
],
"textContent": "Oh. The size drop may be because the conversion above does not include the MLP LoRA tensors:\n\n* * *\n\nLLM-generated notes / rough analysis:\n\nI think the `850 MB -> 378 MB` drop can probably be explained by the converter itself, and the most likely cause is **not TextEncoder being skipped**, but rather **MLP LoRA tensors being skipped by default**.\n\nThe relevant converter is this one:\n\nOpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT\n\nThe script says:\n\n\n # Attention-only by default (recommended). You can optionally include MLP keys with --include-mlp.\n ALLOWED_QWEN_PREFIXES_ATTN = (\n     \"attn.to_q\",\n     \"attn.to_k\",\n     \"attn.to_v\",\n     \"attn.to_out\",\n     \"attn.add_q_proj\",\n     \"attn.add_k_proj\",\n     \"attn.add_v_proj\",\n     \"attn.to_add_out\",\n )\n\n # Optional MLP keys observed in Qwen-Image-Lightning (ComfyUI-style)\n ALLOWED_QWEN_PREFIXES_MLP = (\n     \"img_mlp.net.0.proj\",\n     \"img_mlp.net.2\",\n     \"txt_mlp.net.0.proj\",\n     \"txt_mlp.net.2\",\n )\n\n\nAnd the actual filter is:\n\n\n allowed_prefixes = ALLOWED_QWEN_PREFIXES_ATTN + (\n     ALLOWED_QWEN_PREFIXES_MLP if include_mlp else ()\n )\n\n\nSo, unless `--include-mlp` is passed, the converter keeps only the attention/projection LoRA tensors and drops:\n\n\n img_mlp.net.0.proj\n img_mlp.net.2\n txt_mlp.net.0.proj\n txt_mlp.net.2\n\n\nThis is also consistent with the uploaded PEFT adapter’s `adapter_config.json`, whose default target modules appear to be attention/projection modules, not MLP modules.\n\nRelevant links:\n\n * PEFT upload / script: OpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT\n * Original 850 MB file: lightx2v/Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors\n * vLLM-Omni LoRA docs: LoRA - vLLM-Omni\n * Qwen-Image transformer implementation: qwen_image_transformer.py\n\n\n\n## Why the file size matches attention-only almost exactly\n\nFrom the vLLM-Omni Qwen-Image transformer implementation, the default model shape is roughly:\n\n\n num_layers = 60\n 
num_attention_heads = 24\n attention_head_dim = 128\n inner_dim = 24 * 128 = 3072\n\n\nThe uploaded LoRA seems to be rank 64 in bf16. bf16 is 2 bytes per element.\n\nFor one LoRA linear projection with shape `3072 -> 3072` and rank 64:\n\n\n lora_A: 64 x 3072\n lora_B: 3072 x 64\n\n elements = 64*3072 + 3072*64\n = 393,216\n\n bytes = 393,216 * 2\n = 786,432 bytes\n = 0.75 MiB\n\n\nThe default converter keeps 8 attention projections per block:\n\n\n attn.to_q\n attn.to_k\n attn.to_v\n attn.to_out\n attn.add_q_proj\n attn.add_k_proj\n attn.add_v_proj\n attn.to_add_out\n\n\nSo the size estimate is:\n\n\n 0.75 MiB * 8 projections * 60 blocks = 360 MiB\n\n\nIn decimal MB:\n\n\n 360 MiB = 377.5 MB\n\n\nThat is almost exactly the reported converted size, `378 MB`.\n\nSo I think the converted adapter size is not mysterious: it is basically the theoretical size of:\n\n\n 60 blocks * 8 attention LoRA projections * rank 64 * bf16\n\n\n## Why the original 850 MB also matches attention + MLP\n\nThe original file is listed as `850 MB` here:\n\nQwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors\n\nThe difference is:\n\n\n 850 MB - 378 MB ~= 472 MB\n\n\nThat also matches the expected MLP LoRA size.\n\nQwen-Image blocks contain both image-stream and text-stream MLPs:\n\n\n img_mlp\n txt_mlp\n\n\nThe converter explicitly recognizes these MLP keys:\n\n\n img_mlp.net.0.proj\n img_mlp.net.2\n txt_mlp.net.0.proj\n txt_mlp.net.2\n\n\nAssuming the usual 4x MLP expansion, the MLP hidden size is approximately:\n\n\n inner_dim * 4 = 3072 * 4 = 12288\n\n\nFor one rank-64 MLP LoRA linear (`3072 -> 12288` or `12288 -> 3072`):\n\n\n elements = 64*3072 + 12288*64\n = 983,040\n\n bytes = 983,040 * 2\n = 1,966,080 bytes\n = 1.875 MiB\n\n\nThere are 4 such MLP linears per block:\n\n\n img_mlp.net.0.proj\n img_mlp.net.2\n txt_mlp.net.0.proj\n txt_mlp.net.2\n\n\nSo:\n\n\n 1.875 MiB * 4 * 60 = 450 MiB\n\n\nIn decimal MB:\n\n\n 450 MiB = 471.9 MB\n\n\nThat is basically the whole missing 
part.\n\nSo the size arithmetic is:\n\n\n attention LoRA only ~= 377.5 MB\n MLP LoRA ~= 471.9 MB\n --------------------------------\n total ~= 849.4 MB\n\n\nThis is almost exactly the original `850 MB`.\n\nTherefore my rough conclusion is:\n\n\n original 850 MB ~= attention LoRA + MLP LoRA\n converted 378 MB ~= attention LoRA only\n\n\n## So is there information loss?\n\nProbably yes, if the goal is to preserve the original LoRA exactly.\n\nBut it is a specific kind of information loss:\n\n * attention/projection LoRA is preserved\n * MLP LoRA is probably dropped\n * `.alpha` keys are skipped, but those are tiny and not the source of the size drop\n * TextEncoder is not needed to explain the size drop\n\n\n\nI would not assume that this means the converted LoRA is useless. Attention-only LoRA can still have a strong effect, especially on rough prompt binding / layout / style direction. But for a Lightning/distillation LoRA, dropping the MLP part may reduce the low-step quality, details, texture, text rendering, and stability.\n\nMy guess:\n\n\n simple prompts: maybe fairly close\n normal prompts: likely usable, but weaker than full LoRA\n complex text/layout: likely more visible degradation\n 4-step / 8-step edge cases: degradation likely more visible\n\n\n## Why TextEncoder is probably not the main explanation\n\nTextEncoder skipping is possible in other LoRA conversion contexts, but here it is not necessary to explain the numbers.\n\nThe converter targets keys like:\n\n\n transformer_blocks.N.<module>.lora_down.weight\n transformer_blocks.N.<module>.lora_up.weight\n\n\nIt is not really written as a generic `text_encoder` / `lora_te` converter.\n\nAlso, the sizes line up too cleanly with:\n\n\n attention-only = 378 MB\n attention + MLP = 850 MB\n\n\nSo I would explain the size drop as MLP exclusion first, not TextEncoder exclusion.\n\n## Can we keep MLP?\n\nMaybe. 
The script already has an option:\n\n\n python comfyui-to-vllm-omni-qwenimage.py \\\n --input Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors \\\n --output ./out_adapter_with_mlp \\\n --dtype bf16 \\\n --base-model Qwen/Qwen-Image-2512 \\\n --include-mlp\n\n\nIf this works as intended, I would expect `adapter_model.safetensors` to become close to `850 MB`.\n\nHowever, the converter itself warns that MLP can be tricky:\n\n\n ap.add_argument(\n \"--include-mlp\",\n action=\"store_true\",\n help=\"Also convert img_mlp/txt_mlp LoRA keys (may fail if vLLM expects different suffixes)\",\n )\n\n\nThe likely issue is not writing the tensors. Writing the tensors is easy. The issue is whether vLLM-Omni accepts and correctly applies the MLP module suffixes.\n\nFor example, the MLP targets include:\n\n\n img_mlp.net.0.proj\n img_mlp.net.2\n txt_mlp.net.0.proj\n txt_mlp.net.2\n\n\nTheir suffixes are roughly:\n\n\n proj\n 2\n\n\n`proj` is probably okay. The numeric suffix `2` may be the fragile part, because vLLM/vLLM-Omni LoRA validation can be strict about module suffixes. There is already a related vLLM issue for numeric-index module names such as `to_out.0`:\n\nvLLM issue #35734: LoRA loading fails for modules with numeric indices\n\nThe current converter already works around the attention-side version of this by normalizing:\n\n\n attn.to_out.0 -> attn.to_out\n attn.to_add_out.0 -> attn.to_add_out\n\n\nBut `net.2` is a different case. It may require the vLLM-Omni build to include `\"2\"` in expected LoRA modules, or it may need a more model-specific mapping.\n\n## Suggested sanity check\n\nIf anyone tries `--include-mlp`, I would check three things:\n\n### 1. Size\n\n\n ls -lh ./out_adapter_with_mlp/adapter_model.safetensors\n\n\nExpected:\n\n\n ~850 MB\n\n\nIf it is still around `378 MB`, MLP tensors were not included.\n\n### 2. 
Key counts\n\n\n from safetensors.torch import load_file\n\n sd = load_file(\"./out_adapter_with_mlp/adapter_model.safetensors\")\n\n # Count how many keys mention each MLP target module\n for needle in [\n     \"img_mlp.net.0.proj\",\n     \"img_mlp.net.2\",\n     \"txt_mlp.net.0.proj\",\n     \"txt_mlp.net.2\",\n ]:\n     print(needle, sum(1 for k in sd if needle in k))\n\n\nExpected rough count:\n\n\n each MLP target: 60 blocks * 2 tensors = 120 keys\n\n\n### 3. vLLM-Omni load log\n\nThe important question is whether vLLM-Omni reports that MLP modules were loaded rather than silently ignored.\n\nThe vLLM-Omni LoRA docs require a PEFT-style adapter folder:\n\n\n lora_adapter/\n ├── adapter_config.json\n └── adapter_model.safetensors\n\n\nDocs:\n\nvLLM-Omni LoRA guide\n\nIf loading fails on `net.2` / `\"2\"` / target module validation, then I think the clean solution would be one of:\n\n 1. patch the converter / `adapter_config.json` target modules, or\n 2. patch vLLM-Omni’s diffusion LoRA mapper / supported modules for Qwen-Image MLP, or\n 3. avoid runtime adapter loading and fuse the LoRA into the base model.\n\n\n\n## Practical recommendation\n\nFor runtime PEFT LoRA:\n\n 1. Try the existing converter with `--include-mlp`.\n 2. Confirm the output is around `850 MB`.\n 3. Confirm `img_mlp` / `txt_mlp` keys exist.\n 4. Try loading in vLLM-Omni.\n 5. 
If it fails, the likely blocker is target module suffix validation around `net.2`.\n\n\n\nFor maximum quality / minimum loader trouble:\n\n * fuse/merge the original LoRA into the Qwen-Image-2512 base weights using Diffusers or the reference loader\n * serve the fused model as a normal model in vLLM-Omni\n\n\n\nThat avoids the whole PEFT key validation problem, although it is no longer a runtime LoRA adapter.\n\n## TL;DR\n\nI think the 378 MB file is probably an attention-only converted adapter.\n\nThe original 850 MB size is almost exactly:\n\n\n attention LoRA ~= 378 MB\n MLP LoRA ~= 472 MB\n total ~= 850 MB\n\n\nSo the size drop is probably explained by the converter’s default behavior:\n\n\n attention-only by default\n MLP only if --include-mlp is passed\n\n\n`--include-mlp` may preserve the missing tensors, but whether vLLM-Omni can load/apply `img_mlp.net.2` and `txt_mlp.net.2` correctly is the part that needs testing.",
"title": "How to convert a single safetensors file to PEFT format"
}