Multi-image edit (3 refs): artifacts at true CFG, fine on Lightning — reference-content dependent
Setup
Qwen-Image-Edit-2511 BF16 via ComfyUI TextEncodeQwenImageEditPlus with 3 reference images (face close-up + body front + body back)
Output 1024×1536 (non-square 2:3)
Sampler: res_3m + bong_tangent (RES4LYF)
Behavior
With full CFG (2.7, 33 steps): generation reliably breaks on some reference sets, with mixed artifacts (identity drift, color/texture corruption, anatomy distortion). The same parameters and seeds produce clean output on other reference sets.
With Lightning 4-step (true_cfg=1): every reference set is clean.
Pattern
1 reference (face only) → always clean, both modes
3 references → clean on some characters, broken on others (content-dependent)
All references are Z-Image Turbo outputs, same prompt structure, identical dimensions
Failing sets tend to contain high-frequency content (curly hair, darker skin texture); working sets tend to be lower-frequency (straight hair, lighter skin). To be clear: this is about the rendered references, not the character identity itself.
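One way to put a number on the "high-frequency" observation above is to compare spectral energy across the reference images. The following is a minimal NumPy sketch and not part of the workflow: the function name `high_freq_ratio` and the 0.25 cycles/pixel cutoff are hypothetical choices, and it assumes each reference has already been loaded as a 2-D grayscale array.

```python
import numpy as np

def high_freq_ratio(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above `cutoff` cycles/pixel (Nyquist is 0.5).

    `cutoff` is an illustrative threshold, not a value taken from any model.
    """
    gray = gray.astype(np.float64)
    gray = gray - gray.mean()  # drop the DC term so overall brightness does not dominate
    power = np.abs(np.fft.fft2(gray)) ** 2
    # Per-axis frequencies in cycles/pixel, combined into a radial frequency map.
    fy = np.fft.fftfreq(gray.shape[0])[:, None]
    fx = np.fft.fftfreq(gray.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    if total == 0.0:
        return 0.0  # flat image: no energy at any non-zero frequency
    return float(power[radius > cutoff].sum() / total)
```

Running this on each reference and comparing failing sets against working sets would show whether the split really tracks high-frequency content or something else.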
What I’ve tried (no fix, or partial only)
I no longer remember everything I tried, but I went through a lot of plausible options and none of them worked. The workflow is below.
Question
Is this a known interaction between multi-ref token packing and the true-CFG noise_pred * (cond_norm / noise_norm) rescale path? Specifically:
Does Qwen2.5-VL’s 384² downscale produce per-token norm outliers on high-frequency reference content that get amplified across denoising steps once true CFG is active?
Is multi-image reference (3+ refs) currently only stable at distilled-CFG / Lightning, or is there a recommended setup for full CFG?
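For reference, a minimal NumPy sketch of the rescale the question refers to. It assumes the step follows the pattern "combine conditional and unconditional predictions, then rescale each token to the conditional prediction's norm"; whether the ComfyUI sampler path applies exactly this step is not established here. The function name `true_cfg_rescale`, the `(tokens, channels)` layout, and the epsilon guard are hypothetical.

```python
import numpy as np

def true_cfg_rescale(cond: np.ndarray, uncond: np.ndarray, scale: float):
    """Apply CFG, then rescale each token to the conditional norm.

    cond, uncond: (tokens, channels) predictions for one sample (assumed layout).
    Returns (rescaled prediction, per-token ratio of combined norm to conditional norm).
    """
    eps = 1e-12  # guard against division by zero; an addition for this sketch
    comb = uncond + scale * (cond - uncond)  # plain CFG combination
    cond_norm = np.linalg.norm(cond, axis=-1, keepdims=True)
    noise_norm = np.linalg.norm(comb, axis=-1, keepdims=True)
    rescaled = comb * (cond_norm / np.maximum(noise_norm, eps))
    ratio = noise_norm / np.maximum(cond_norm, eps)
    return rescaled, ratio
```

Under these assumptions the rescale pins every token's norm to the conditional norm, so it limits magnitude but not direction. Logging `ratio` per token at each step, for a failing and a working reference set, would be one way to check whether outlier tokens appear only on the failing sets.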