Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihpdirq4glec3urpc36vj5v6qwke55xhh23runq4nbbl6iihsk3uu",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmupn3xxtd32"
  },
  "path": "/t/how-to-convert-a-single-safetensors-file-to-peft-format/173103#post_12",
  "publishedAt": "2026-05-27T22:28:49.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "OpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT",
    "lightx2v/Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors",
    "LoRA - vLLM-Omni",
    "qwen_image_transformer.py",
    "Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors",
    "vLLM issue #35734: LoRA loading fails for modules with numeric indices",
    "vLLM-Omni LoRA guide"
  ],
  "textContent": "Oh. The size drop may be because the conversion above does not include the MLP LoRA tensors:\n\n* * *\n\nLLM-generated notes / rough analysis:\n\nI think the `850 MB -> 378 MB` drop is probably explainable from the converter itself, and the most likely cause is **not TextEncoder being skipped**, but rather **MLP LoRA tensors being skipped by default**.\n\nThe relevant converter is this one:\n\nOpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT\n\nThe script says:\n\n\n    # Attention-only by default (recommended). You can optionally include MLP keys with --include-mlp.\n    ALLOWED_QWEN_PREFIXES_ATTN = (\n        \"attn.to_q\",\n        \"attn.to_k\",\n        \"attn.to_v\",\n        \"attn.to_out\",\n        \"attn.add_q_proj\",\n        \"attn.add_k_proj\",\n        \"attn.add_v_proj\",\n        \"attn.to_add_out\",\n    )\n\n    # Optional MLP keys observed in Qwen-Image-Lightning (ComfyUI-style)\n    ALLOWED_QWEN_PREFIXES_MLP = (\n        \"img_mlp.net.0.proj\",\n        \"img_mlp.net.2\",\n        \"txt_mlp.net.0.proj\",\n        \"txt_mlp.net.2\",\n    )\n\n\nAnd the actual filter is:\n\n\n    allowed_prefixes = ALLOWED_QWEN_PREFIXES_ATTN + (\n        ALLOWED_QWEN_PREFIXES_MLP if include_mlp else ()\n    )\n\n\nSo, unless `--include-mlp` is passed, the converter keeps only the attention/projection LoRA tensors and drops:\n\n\n    img_mlp.net.0.proj\n    img_mlp.net.2\n    txt_mlp.net.0.proj\n    txt_mlp.net.2\n\n\nThis is also consistent with the uploaded PEFT adapter’s `adapter_config.json`: its default target modules appear to be attention/projection modules, not MLP modules.\n\nRelevant links:\n\n  * PEFT upload / script: OpenxAILabs/Qwen-Image-2512-Lightning-8steps-V1.0-bf16-PEFT\n  * Original 850 MB file: lightx2v/Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors\n  * vLLM-Omni LoRA docs: LoRA - vLLM-Omni\n  * Qwen-Image transformer implementation: qwen_image_transformer.py\n\n\n\n## Why the file size matches attention-only almost 
exactly\n\nFrom the vLLM-Omni Qwen-Image transformer implementation, the default model shape is roughly:\n\n\n    num_layers = 60\n    num_attention_heads = 24\n    attention_head_dim = 128\n    inner_dim = 24 * 128 = 3072\n\n\nThe uploaded LoRA seems to be rank 64 / bf16. bf16 is 2 bytes per element.\n\nFor one LoRA linear projection with shape `3072 -> 3072` and rank 64:\n\n\n    lora_A: 64 x 3072\n    lora_B: 3072 x 64\n\n    elements = 64*3072 + 3072*64\n             = 393,216\n\n    bytes = 393,216 * 2\n          = 786,432 bytes\n          = 0.75 MiB\n\n\nThe default converter keeps 8 attention projections per block:\n\n\n    attn.to_q\n    attn.to_k\n    attn.to_v\n    attn.to_out\n    attn.add_q_proj\n    attn.add_k_proj\n    attn.add_v_proj\n    attn.to_add_out\n\n\nSo the size estimate is:\n\n\n    0.75 MiB * 8 projections * 60 blocks = 360 MiB\n\n\nIn decimal MB:\n\n\n    360 MiB = 377.5 MB\n\n\nThat is almost exactly the reported converted size, `378 MB`.\n\nSo I think the converted adapter size is not mysterious: it is basically the theoretical size of:\n\n\n    60 blocks * 8 attention LoRA projections * rank 64 * bf16\n\n\n## Why the original 850 MB also matches attention + MLP\n\nThe original file is listed as `850 MB` here:\n\nQwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors\n\nThe missing difference is:\n\n\n    850 MB - 378 MB ~= 472 MB\n\n\nThat also matches the expected MLP LoRA size.\n\nQwen-Image blocks contain both image-stream and text-stream MLPs:\n\n\n    img_mlp\n    txt_mlp\n\n\nThe converter explicitly recognizes these MLP keys:\n\n\n    img_mlp.net.0.proj\n    img_mlp.net.2\n    txt_mlp.net.0.proj\n    txt_mlp.net.2\n\n\nAssuming a usual MLP expansion of 4x, the MLP hidden size is approximately:\n\n\n    inner_dim * 4 = 3072 * 4 = 12288\n\n\nFor one MLP LoRA linear `3072 -> 12288` or `12288 -> 3072`, rank 64:\n\n\n    elements = 64*3072 + 12288*64\n             = 983,040\n\n    bytes = 983,040 * 2\n          = 1,966,080 bytes\n    
      = 1.875 MiB\n\n\nThere are 4 such MLP linears per block:\n\n\n    img_mlp.net.0.proj\n    img_mlp.net.2\n    txt_mlp.net.0.proj\n    txt_mlp.net.2\n\n\nSo:\n\n\n    1.875 MiB * 4 * 60 = 450 MiB\n\n\nIn decimal MB:\n\n\n    450 MiB = 471.9 MB\n\n\nThat is basically the whole missing part.\n\nSo the size arithmetic is:\n\n\n    attention LoRA only ~= 377.5 MB\n    MLP LoRA          ~= 471.9 MB\n    --------------------------------\n    total             ~= 849.4 MB\n\n\nThis is almost exactly the original `850 MB`.\n\nTherefore my rough conclusion is:\n\n\n    original 850 MB ~= attention LoRA + MLP LoRA\n    converted 378 MB ~= attention LoRA only\n\n\n## So is there information loss?\n\nProbably yes, if the goal is to preserve the original LoRA exactly.\n\nBut it is a specific kind of information loss:\n\n  * attention/projection LoRA is preserved\n  * MLP LoRA is probably dropped\n  * `.alpha` keys are skipped, but those are tiny and not the source of the size drop\n  * TextEncoder is not needed to explain the size drop\n\n\n\nI would not assume that this means the converted LoRA is useless. Attention-only LoRA can still have a strong effect, especially on rough prompt binding / layout / style direction. 
But for a Lightning/distillation LoRA, dropping the MLP part may reduce the low-step quality, details, texture, text rendering, and stability.\n\nMy guess:\n\n\n    simple prompts:      maybe fairly close\n    normal prompts:      likely usable, but weaker than full LoRA\n    complex text/layout: likely more visible degradation\n    4-step / 8-step edge cases: degradation likely more visible\n\n\n## Why TextEncoder is probably not the main explanation\n\nTextEncoder skipping is possible in other LoRA conversion contexts, but here it is not necessary to explain the numbers.\n\nThe converter targets keys like:\n\n\n    transformer_blocks.N.<module>.lora_down.weight\n    transformer_blocks.N.<module>.lora_up.weight\n\n\nIt is not really written as a generic `text_encoder` / `lora_te` converter.\n\nAlso, the sizes line up too cleanly with:\n\n\n    attention-only = 378 MB\n    attention + MLP = 850 MB\n\n\nSo I would explain the size drop as MLP exclusion first, not TextEncoder exclusion.\n\n## Can we keep MLP?\n\nMaybe. The script already has an option:\n\n\n    python comfyui-to-vllm-omni-qwenimage.py \\\n      --input Qwen-Image-2512-Lightning-8steps-V1.0-bf16.safetensors \\\n      --output ./out_adapter_with_mlp \\\n      --dtype bf16 \\\n      --base-model Qwen/Qwen-Image-2512 \\\n      --include-mlp\n\n\nIf this works as intended, I would expect `adapter_model.safetensors` to become close to `850 MB`.\n\nHowever, the converter itself warns that MLP can be tricky:\n\n\n    ap.add_argument(\n        \"--include-mlp\",\n        action=\"store_true\",\n        help=\"Also convert img_mlp/txt_mlp LoRA keys (may fail if vLLM expects different suffixes)\",\n    )\n\n\nThe likely issue is not writing the tensors. Writing the tensors is easy. 
The issue is whether vLLM-Omni accepts and correctly applies the MLP module suffixes.\n\nFor example, the MLP targets include:\n\n\n    img_mlp.net.0.proj\n    img_mlp.net.2\n    txt_mlp.net.0.proj\n    txt_mlp.net.2\n\n\nTheir suffixes are roughly:\n\n\n    proj\n    2\n\n\n`proj` is probably okay. The numeric suffix `2` may be the fragile part, because vLLM/vLLM-Omni LoRA validation can be strict about module suffixes. There is already a related vLLM issue for numeric-index module names such as `to_out.0`:\n\nvLLM issue #35734: LoRA loading fails for modules with numeric indices\n\nThe current converter already works around the attention-side version of this by normalizing:\n\n\n    attn.to_out.0     -> attn.to_out\n    attn.to_add_out.0 -> attn.to_add_out\n\n\nBut `net.2` is a different case. It may require the vLLM-Omni build to include `\"2\"` in expected LoRA modules, or it may need a more model-specific mapping.\n\n## Suggested sanity check\n\nIf anyone tries `--include-mlp`, I would check three things:\n\n### 1. Size\n\n\n    ls -lh ./out_adapter_with_mlp/adapter_model.safetensors\n\n\nExpected:\n\n\n    ~850 MB\n\n\nIf it is still around `378 MB`, MLP tensors were not included.\n\n### 2. Key counts\n\n\n    from safetensors.torch import load_file\n\n    sd = load_file(\"./out_adapter_with_mlp/adapter_model.safetensors\")\n\n    for needle in [\n        \"img_mlp.net.0.proj\",\n        \"img_mlp.net.2\",\n        \"txt_mlp.net.0.proj\",\n        \"txt_mlp.net.2\",\n    ]:\n        print(needle, sum(1 for k in sd if needle in k))\n\n\nExpected rough count:\n\n\n    each MLP target: 60 blocks * 2 tensors = 120 keys\n\n\n### 3. 
vLLM-Omni load log\n\nThe important question is whether vLLM-Omni reports that MLP modules were loaded and not silently ignored.\n\nThe vLLM-Omni LoRA docs require a PEFT-style adapter folder:\n\n\n    lora_adapter/\n    ├── adapter_config.json\n    └── adapter_model.safetensors\n\n\nDocs:\n\nvLLM-Omni LoRA guide\n\nIf loading fails on `net.2` / `\"2\"` / target module validation, then I think the clean solution would be either:\n\n  1. patch the converter / `adapter_config.json` target modules, or\n  2. patch vLLM-Omni’s diffusion LoRA mapper / supported modules for Qwen-Image MLP, or\n  3. avoid runtime adapter loading and fuse the LoRA into the base model.\n\n\n\n## Practical recommendation\n\nFor runtime PEFT LoRA:\n\n  1. Try the existing converter with `--include-mlp`.\n  2. Confirm the output is around `850 MB`.\n  3. Confirm `img_mlp` / `txt_mlp` keys exist.\n  4. Try loading in vLLM-Omni.\n  5. If it fails, the likely blocker is target module suffix validation around `net.2`.\n\n\n\nFor maximum quality / minimum loader trouble:\n\n  * fuse/merge the original LoRA into the Qwen-Image-2512 base weights using Diffusers or the reference loader\n  * serve the fused model as a normal model in vLLM-Omni\n\n\n\nThat avoids the whole PEFT key validation problem, although it is no longer a runtime LoRA adapter.\n\n## TL;DR\n\nI think the 378 MB file is probably an attention-only converted adapter.\n\nThe original 850 MB size is almost exactly:\n\n\n    attention LoRA ~= 378 MB\n    MLP LoRA       ~= 472 MB\n    total          ~= 850 MB\n\n\nSo the size drop is probably explained by the converter’s default behavior:\n\n\n    attention-only by default\n    MLP only if --include-mlp is passed\n\n\n`--include-mlp` may preserve the missing tensors, but whether vLLM-Omni can load/apply `img_mlp.net.2` and `txt_mlp.net.2` correctly is the part that needs testing.",
  "title": "How to convert a single safetensors file to PEFT format"
}