External Publication

Wan2.2 i2v (clarifications needed regarding settings on low vram system)

Hugging Face Forums [Unofficial] May 10, 2026

The 4-step variant might not be suitable for your purpose:

Wan2.2 I2V on 8GB VRAM: practical baseline for source-faithful animation

For your exact goal — make the picture move, keep the same face, keep the same identity, keep the same lighting/background/clothing, and avoid AI embellishment — I would not tune this like a normal high-VRAM Wan2.2 setup.

Your current results are not strange:

CFG from 1 to ~3 doing almost nothing is plausible in a 4-step Rapid/Lightning-style workflow.
CFG above ~3 turning the output into overcooked chaos is also plausible.
Denoise around 0.6 helping sharpness/color/source fidelity is not ridiculous.
Different source images needing different settings usually means the workflow has too many interacting variables: GGUF quantization, Rapid/distilled weights, sampler, scheduler, shift, text encoder quality, VAE, offloading, source-image difficulty, and the Wan2.2 High/Low-noise expert split.

The core point:

Do not treat CFG as the main “obedience knob” in your setup. For 8GB VRAM + GGUF + 4-step Rapid/Lightning-style I2V, CFG is a small final adjustment, not the steering wheel.

The knobs I would tune first are:

source image quality / crop
denoise
motion size
shift
Low-noise step count / Low-noise quantization
sampler branch
text encoder quantization
CFG last

Useful references:

ComfyUI official Wan2.2 workflow guide
Wan2.2 official GitHub
Wan2.2 I2V A14B model card
ComfyUI-GGUF
QuantStack Wan2.2 I2V A14B GGUF
city96 UMT5 XXL encoder GGUF
WanMoeKSampler
Wan2.2-Lightning
LightX2V Wan2.2 I2V working guide discussion
ComfyUI-CacheDiT
Kijai ComfyUI-WanVideoWrapper

1. Why your current setup is hard to tune

You are not simply running “Wan2.2.” You are running a stacked compromise:

Wan2.2-style I2V
+ Rapid/AIO or distilled behavior
+ GGUF quantization
+ Q4-class compression
+ 4-step sampling
+ SageAttention
+ BlockSwap/offload
+ 8GB laptop VRAM
+ denoise below 1.0
+ SD3 shift
+ image conditioning

That matters because one setting can appear useless when another part of the stack is dominating.

For example, CFG may appear to do nothing because:

the model was distilled/merged for CFG 1
4 steps are too few for CFG to gradually steer the output
image conditioning dominates the text
the negative prompt is weak or mostly inactive at CFG 1
quantization reduces sensitivity to small guidance changes
the sampler/scheduler/shift combination matters more than CFG
the High/Low-noise split is doing more than the text guidance

Some Rapid/AIO model cards explicitly say their models are intended for CFG 1 and 4 steps. See the WAN2.2 Rapid All-in-One model card. Wan2.2-Lightning similarly describes a 4-step distilled path, so it should not be tuned like a normal 20–30 step diffusion workflow. See Wan2.2-Lightning.

So your observation — “CFG 1 to 3 did nothing, then above 3 broke everything” — is consistent with this kind of workflow.

2. The most important Wan2.2 idea: High-noise vs Low-noise experts

Wan2.2 A14B uses a Mixture-of-Experts style denoising structure. The official Wan2.2 repo describes MoE as separating the denoising process across timesteps with specialized expert models. See Wan2.2 official GitHub.

In practical I2V terms:

Part	Mostly affects	If weak/wrong, you may see
High-noise expert	broad motion, layout, pose, composition, camera direction	scene drift, pose weirdness, motion chaos, composition changes
Low-noise expert	face detail, eyes, mouth, skin, clothing texture, color, final sharpness	face melting, blur, color shift, unstable eyes/mouth, loss of likeness

For your goal, Low-noise behavior is extremely important.

If the face changes, the first fix is usually not “raise CFG.” More likely fixes are:

lower denoise
reduce the requested motion
add more Low-noise steps
use a better Low-noise quant if possible
check the VAE
crop/use a clearer source face
avoid cinematic/camera-heavy prompts
avoid LoRAs until the baseline is stable

WanMoeKSampler is relevant if you are using separate High/Low Wan2.2 A14B models. Its README says it is designed for Wan2.2 A14B-style MoE workflows and avoids manually guessing the High-to-Low switch point. See WanMoeKSampler.
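To make the idea of a switch point concrete, here is a minimal Python sketch of a noise-level boundary switch. It is not WanMoeKSampler's code: `switch_index`, the example sigma values, and the 0.9 boundary are all illustrative assumptions, so check the sampler's README for what it really uses.

```python
def switch_index(sigmas: list[float], boundary: float) -> int:
    """Return the first step whose starting noise level is below `boundary`.

    Steps before the returned index would go to the High-noise expert,
    steps from that index onward to the Low-noise expert.
    """
    for i, sigma in enumerate(sigmas[:-1]):  # the last entry is the final 0.0
        if sigma < boundary:
            return i
    return len(sigmas) - 1  # never dropped below: High-noise handles every step


# Illustrative 4-step schedule (5 noise levels), not taken from a real run.
example_sigmas = [1.0, 0.96, 0.889, 0.727, 0.0]
HYPOTHETICAL_BOUNDARY = 0.9  # assumption for illustration only

idx = switch_index(example_sigmas, HYPOTHETICAL_BOUNDARY)
print(f"High-noise steps: {idx}, Low-noise steps: {len(example_sigmas) - 1 - idx}")
```

The point of the sketch is that the split depends on the noise schedule itself. If you change shift or the scheduler, a boundary-based sampler may move the High/Low split, which a hand-picked step number would not do.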

3. Best starting point for your actual goal

Your goal is not “maximum cinematic transformation.” Your goal is:

same person
same face
same identity
same clothing
same lighting
same background
small natural movement
static camera
no embellishment

So I would start conservative.

Recommended baseline for your current Rapid/AIO-style setup

Sampler / scheduler: sa_solver / beta, if that is your current most reliable branch
Steps: 4
CFG: 1.0
Denoise: 0.55–0.60
SD3 shift: 8 as current control, then test 5 and 6
Resolution: 512–640px long side while testing
Frames: 33–49 while testing
FPS: 12–16
Motion: subtle
Camera: static
LoRAs: none during baseline
Upscaling/interpolation: none during baseline
Face restore: none during baseline

This is not meant to be the final “best possible” setup. It is the control setup. You need a repeatable control before changing settings.
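If it helps to keep the control honest, you can write it down as data. This is only a bookkeeping sketch in Python, not something ComfyUI reads: `ControlRun`, its field names, the seed, and the specific picks inside each range (576 px, 33 frames, 16 fps, denoise 0.60) are my own illustrative assumptions.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ControlRun:
    """One fixed, repeatable test configuration (field names are illustrative)."""
    sampler: str = "sa_solver"
    scheduler: str = "beta"
    steps: int = 4
    cfg: float = 1.0
    denoise: float = 0.60
    shift: float = 8.0
    long_side_px: int = 576
    frames: int = 33
    fps: int = 16
    seed: int = 12345  # placeholder: any fixed seed works

    def clip_seconds(self) -> float:
        """Length of the test clip in seconds."""
        return self.frames / self.fps


def changed_fields(a: ControlRun, b: ControlRun) -> list[str]:
    """Names of the fields that differ between two runs."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if da[k] != db[k])


control = ControlRun()
variant = replace(control, denoise=0.55)  # change exactly one field per test

print(f"control clip length: {control.clip_seconds():.2f} s")
print("changed in variant:", changed_fields(control, variant))
```

If `changed_fields` ever reports more than one name, the comparison is no longer a clean test of a single setting.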

4. Do not micro-tweak CFG

On your hardware, micro-tweaking CFG by 0.1 is a bad use of time.

Instead of:

1.0
1.1
1.2
1.3
1.4
...

Use coarse tests:

CFG 1.0
CFG 1.5
CFG 2.0
CFG 2.5
CFG 3.0 only as a limit test

For your setup, I would treat CFG like this:

CFG	Practical meaning
1.0	safest Rapid/Lightning-style baseline
1.5	mild text pressure
2.0	moderate text pressure
2.5	upper useful range to test
3.0	stress-test boundary
> 3.0	likely to overcook identity, color, texture, or motion

If CFG 1.5–2.5 gives no meaningful obedience improvement, stop chasing CFG. The bottleneck is probably elsewhere.

5. Denoise is probably more important than CFG for you

For source-faithful I2V, denoise is one of the strongest identity controls.

Denoise	Expected behavior
0.40–0.50	most faithful, least motion, may look stiff
0.50–0.60	best starting zone for “make the image move”
0.60–0.70	more motion, more identity risk
0.70+	more transformation, more AI invention

Since you already found 0.6 useful, I would not abandon it. I would test:

Denoise 0.50
Denoise 0.55
Denoise 0.60
Denoise 0.65

Pick the best identity/motion balance.
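One reason coarse denoise jumps make sense at 4 steps: as far as I know, a stock KSampler-style sampler handles denoise below 1.0 by building a longer schedule and keeping only its tail. The Python sketch below mirrors that idea on a plain linear schedule. The function names are mine, the schedule is unshifted, and your actual nodes may truncate differently, so treat it as an assumption rather than a description of your workflow.

```python
def linear_sigmas(n_steps: int) -> list[float]:
    """Evenly spaced noise levels from 1.0 down to 0.0 (n_steps + 1 values)."""
    return [1.0 - i / n_steps for i in range(n_steps + 1)]


def truncated_sigmas(steps: int, denoise: float) -> list[float]:
    """Build a longer schedule and keep only its last `steps` intervals."""
    if denoise >= 0.9999:
        return linear_sigmas(steps)
    total = int(steps / denoise)  # e.g. 4 steps at denoise 0.5 -> 8 total
    return linear_sigmas(total)[-(steps + 1):]


for d in (0.50, 0.55, 0.60, 0.65):
    sigmas = truncated_sigmas(4, d)
    print(f"denoise {d:.2f}: starts at noise level {sigmas[0]:.3f}")
```

Under this assumption, denoise 0.60 and 0.65 land on the same 4-step schedule, because the total step count is rounded down to a whole number. That would explain why some small denoise changes appear to do nothing at very low step counts.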

If the face changes:

lower denoise first
reduce motion second
add Low-noise steps third
only then try CFG changes

If there is no movement:

raise denoise slightly
make the action simpler and more literal
avoid cinematic wording

6. Shift: test coarse values only

Do not test tiny shift increments. Test meaningful jumps.

For your current setup:

Shift 5
Shift 6
Shift 8

The LightX2V Wan2.2 I2V working-guide discussion recommends:

Euler sampler
Simple scheduler
Shift 5
2 High steps
2 Low steps

Source: LightX2V Wan2.2 I2V working guide discussion

That does not automatically mean shift 5 is best for your current Rapid/AIO branch, but it is a strong branch to test.
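To see why 5, 6, and 8 are meaningful jumps, here is a small Python sketch of the flow-matching time shift that, as far as I know, SD3-style shift settings apply to the noise schedule. The formula is the commonly used one; applying it to a plain linear 4-step schedule is my simplification, and `shift_sigma` is an illustrative name.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Flow-matching time shift: pushes noise levels toward 1.0 as shift grows."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


base = [1.0, 0.75, 0.5, 0.25, 0.0]  # unshifted 4-step linear schedule

for shift in (5.0, 6.0, 8.0):
    shifted = [round(shift_sigma(s, shift), 3) for s in base]
    print(f"shift {shift:g}: {shifted}")
```

A higher shift keeps more of the 4 steps at high noise levels and leaves a bigger final jump down to zero. With so few steps, that changes how much work lands on broad structure versus fine detail, which is why shift is worth testing before CFG.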

7. Sampler advice

For your current Rapid/AIO branch

If sa_solver / beta / 4 steps / CFG 1 / denoise 0.6 / shift 8 is the only thing giving you usable results, keep it as the control.

Do not throw it away just because it sounds weird.

Rapid/distilled/merged models can have very specific intended recipes. The model card for the Rapid AIO family says the models are intended for CFG 1 and 4 steps, and different versions list different sampler recommendations. See WAN2.2 Rapid All-in-One.

For a Lightning-style branch

Test this separately:

Sampler: Euler
Scheduler: Simple
Steps: 4
CFG: 1.0
Shift: 5
Denoise: 0.55–0.60

That lines up with public LightX2V/Wan2.2-Lightning guidance. See Wan2.2-Lightning and the LightX2V working-guide discussion.

Compare this branch against your current sa_solver / beta control. Do not mix the two while testing.

8. Low-noise steps may help face consistency more than CFG

If your workflow exposes the High/Low split, test this before pushing CFG:

Test	High steps	Low steps	Purpose
A	2	2	fastest 4-step baseline
B	2	4	more face/detail finishing
C	4	4	balanced reference
D	4	6	stronger finishing if time allows
E	6	4	more broad structure/motion

For your goal, I would test:

2 High / 2 Low
2 High / 4 Low
4 High / 4 Low

If 2/2 is blurry but 2/4 improves face/detail, that tells you the Low-noise stage was underpowered.
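If your workflow chains two advanced samplers (one per expert), each High/Low pair from the table becomes a shared total step count plus a step range per pass. The Python sketch below only does that arithmetic. `Pass` and `split_passes` are hypothetical names, and I am assuming the usual arrangement where the High-noise pass runs first and hands its leftover noise to the Low-noise pass.

```python
from typing import NamedTuple


class Pass(NamedTuple):
    expert: str
    start_step: int  # first step this pass runs
    end_step: int    # step at which this pass stops (exclusive)


def split_passes(high_steps: int, low_steps: int) -> tuple[int, list[Pass]]:
    """Turn a High/Low step pair into a total and two consecutive step ranges."""
    total = high_steps + low_steps
    passes = [
        Pass("high_noise", 0, high_steps),
        Pass("low_noise", high_steps, total),
    ]
    return total, passes


for name, (high, low) in {"A": (2, 2), "B": (2, 4), "C": (4, 4)}.items():
    total, passes = split_passes(high, low)
    print(f"Test {name}: total steps {total}, {passes}")
```

Both samplers must be given the same total, otherwise the two passes are working from different noise schedules and the comparison between tests stops being meaningful.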

9. Quantization: Q4_K_M is not automatically best on 8GB

On paper, a higher-precision quant gives better quality. In practice, on an 8GB laptop GPU, a heavier quant can cause more offload pressure, swapping, instability, or unusable render times.

The QuantStack Wan2.2 I2V A14B GGUF repo lists approximate model sizes such as:

Q3_K_S: 6.52 GB
Q3_K_M: 7.18 GB
Q4_K_S: 8.75 GB
Q4_K_M: 9.65 GB
Q5_K_S: 10.1 GB
Q5_K_M: 10.8 GB
Q6_K: 12 GB
Q8_0: 15.4 GB

Source: QuantStack Wan2.2 I2V A14B GGUF

For an 8GB 4060 laptop, I would test:

Test	High-noise	Low-noise	Why
A	Q3_K_M	Q3_K_M	safest low-VRAM baseline
B	Q4_K_S	Q4_K_S	better quality if stable
C	Q3_K_M	Q4_K_S	prioritize face/detail
D	Q4_K_S	Q3_K_M	prioritize structure/motion
E	Q4_K_M	Q4_K_M	only if the above are stable

For your priority, I would try:

High-noise: Q3_K_M
Low-noise: Q4_K_S

before assuming:

High-noise: Q4_K_M
Low-noise: Q4_K_M

Why: Low-noise has more influence on final face detail, skin, eyes, mouth, color, and sharpness. If you can only “spend” quality somewhere, spend it on Low-noise first.
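A rough way to see the offload cost is to compare each file size against what is left of 8 GB after everything else. The Python sketch below uses the sizes listed above. `RESERVED_GB` is a made-up placeholder for latents, activations, VAE, and system overhead, so measure your own number before trusting the output.

```python
# Approximate per-expert file sizes in GB, as listed above.
QUANT_SIZES_GB = {
    "Q3_K_S": 6.52,
    "Q3_K_M": 7.18,
    "Q4_K_S": 8.75,
    "Q4_K_M": 9.65,
    "Q5_K_S": 10.1,
    "Q5_K_M": 10.8,
    "Q6_K": 12.0,
    "Q8_0": 15.4,
}

VRAM_GB = 8.0
RESERVED_GB = 1.5  # assumption: placeholder for everything that is not weights


def offload_gb(quant: str, vram_gb: float = VRAM_GB,
               reserved_gb: float = RESERVED_GB) -> float:
    """GB of one expert's weights that would not fit in the remaining VRAM."""
    budget = vram_gb - reserved_gb
    return max(0.0, QUANT_SIZES_GB[quant] - budget)


for quant in ("Q3_K_M", "Q4_K_S", "Q4_K_M"):
    print(f"{quant}: about {offload_gb(quant):.2f} GB would need offloading")
```

This ignores how BlockSwap actually schedules transfers, but it shows the trend: each step up in quant quality adds to the amount that has to live outside VRAM, and that is paid on every sampling step.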

10. Text encoder quantization matters for prompt obedience

If prompt obedience feels weak, do not only blame CFG. The text encoder can matter too.

The city96 UMT5 XXL encoder GGUF card recommends Q5_K_M or larger for best results, while noting that smaller models may still be acceptable in resource-constrained situations. It lists Q3_K_M at around 3.06 GB, Q4_K_M at around 3.66 GB, and Q5_K_M at around 4.15 GB. See city96 UMT5 XXL encoder GGUF.

For your system:

UMT5 Q3_K_M: safest
UMT5 Q4_K_M: reasonable baseline
UMT5 Q5_K_M: better prompt understanding if RAM/offload behavior is tolerable

If CFG does not improve obedience, a better text encoder may help more than CFG micro-tweaks.

11. VAE check: important for color and softness

If Wan2.2 looks redder, softer, or less vivid than expected, check the VAE.

The official ComfyUI Wan2.2 guide distinguishes the model components for different workflows. The 14B I2V workflow uses separate High/Low I2V models and a Wan VAE component; the 5B TI2V workflow uses its own 5B model/VAE setup. See ComfyUI official Wan2.2 guide.

A VAE mismatch can show up as:

red/yellow color cast
soft decode
loss of vividness
skin tone shift
general haze
reconstruction blur

If color is your issue, test VAE/workflow correctness before trying to fix it with prompt words like “neutral color” or “no red tint.”

12. Source image quality matters more than people admit

For face consistency, the source image should have:

clear face
visible eyes
visible mouth
not too small in frame
not heavily compressed
not extreme side profile
not harsh shadow over one eye
not heavy motion blur
not strong fisheye distortion
not sunglasses covering identity
not hands blocking the face

A simple rule:

If the source face is small or unclear, the model has to invent face detail during motion. When it invents face detail, identity changes.

For baseline testing, use a clean portrait or half-body image. You can do fancy shots later.

13. Prompt style for source-faithful animation

Use a boring prompt. Do not make it cinematic. Do not add style words. Do not describe a new scene.

Positive prompt baseline

A realistic image-to-video animation of the person in the source image. Preserve the exact same face, identity, hairstyle, clothing, colors, lighting, and background. The person makes only very subtle natural movement: slight breathing, a small blink, and minimal head movement. Static camera. No zoom. No scene change. Natural colors. Sharp facial details.

Negative prompt baseline

different person, face change, identity change, distorted face, warped eyes, asymmetrical eyes, deformed mouth, changing hairstyle, changing clothes, changing background, camera movement, zoom, scene change, fantasy, sci-fi, anime, painting, overexposed, oversaturated, red tint, blurry, low detail, melted face, extra teeth

Important: at CFG 1, the negative prompt may do very little. Judge negative prompting mostly at CFG 1.5–2.5.
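The reason is visible in the standard classifier-free guidance formula, sketched here in plain Python on toy numbers. `apply_cfg` and the sample values are illustrative; real samplers apply the same combination to the model's predictions at every step.

```python
def apply_cfg(cond: list[float], uncond: list[float], cfg: float) -> list[float]:
    """Standard CFG mix: start from the negative-prompt prediction and move
    toward the positive-prompt prediction by a factor of `cfg`."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


cond = [0.5, -0.25, 1.0]    # toy prediction with the positive prompt
uncond = [0.25, 0.25, 0.5]  # toy prediction with the negative prompt

print("cfg 1.0:", apply_cfg(cond, uncond, 1.0))
print("cfg 2.0:", apply_cfg(cond, uncond, 2.0))
```

At CFG 1.0 the negative-prompt term cancels out and the result is just the positive-prompt prediction. Some implementations skip the negative pass entirely in that case, so editing the negative prompt at CFG 1 may change nothing at all.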

14. Prompt obedience testing

Do not test obedience with complex motion first.

Bad obedience tests:

turns around
walks forward
raises both hands
laughs widely
talks
dances
camera orbits around the subject
wind blows hair dramatically

Good obedience tests:

one subtle blink
gentle breathing only
slight smile
very small head tilt
tiny eye movement

A model that cannot obey “one subtle blink” is not ready for “turns head, smiles, and raises hand.”

Better prompt wording

Instead of:

The woman turns her head and smiles at the camera while wind blows through her hair.

Use:

The person makes a very small natural smile while keeping the same face, same pose, same hairstyle, same clothing, same lighting, and same background. Static camera.

The second prompt gives the model less room to invent.

15. What to do when the model does not obey

First classify the failure.

Failure	Likely cause	First fix
prompt action ignored	too few steps, weak text encoder, action too subtle, distilled limitation	slightly raise denoise or simplify action
face changes	denoise too high, Low-noise weak, source face unclear, motion too large	lower denoise / add Low steps
red tint	VAE/model/sampler/shift issue	check VAE, test shift/sampler
blurry face	Low-noise too weak, too few steps, low quant, low resolution	add Low steps / better Low quant
background changes	denoise too high, prompt invites scene change	lower denoise / static camera prompt
too much motion	denoise/CFG/shift too high, Rapid merge exaggeration	lower denoise or reduce action
no motion	denoise too low, prompt too static	denoise +0.05

The order I would use:

1. Keep CFG at 1.0.
2. Make the action simpler and more literal.
3. Tune denoise: 0.50 / 0.55 / 0.60 / 0.65.
4. Test shift: 5 / 6 / 8.
5. Add Low-noise steps if available.
6. Improve Low-noise quantization if possible.
7. Test CFG 1.5 / 2.0 / 2.5.
8. Stop before CFG 3 if identity starts changing.
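If you log your test runs, the failure table can be turned into a small lookup so every bad clip gets classified before any setting is touched. This Python sketch just encodes the "first fix" column. `FIRST_FIXES` and `first_fixes` are hypothetical helper names, not part of any tool.

```python
# First fixes per failure type, in the order given in the table above.
FIRST_FIXES: dict[str, list[str]] = {
    "prompt action ignored": ["raise denoise slightly", "simplify the action"],
    "face changes": ["lower denoise", "add Low-noise steps"],
    "red tint": ["check the VAE", "test shift and sampler"],
    "blurry face": ["add Low-noise steps", "use a better Low-noise quant"],
    "background changes": ["lower denoise", "use a static-camera prompt"],
    "too much motion": ["lower denoise", "reduce the requested action"],
    "no motion": ["raise denoise by 0.05"],
}


def first_fixes(failure: str) -> list[str]:
    """Look up the suggested first fixes for a classified failure."""
    key = failure.strip().lower()
    if key not in FIRST_FIXES:
        raise KeyError(f"unclassified failure: {failure!r}")
    return FIRST_FIXES[key]


print(first_fixes("Face changes"))
```

Note that none of the first fixes is "raise CFG". That is deliberate and matches the order above, where CFG only comes in at step 7.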

16. Recommended experiment matrix

Do not run huge matrices at full resolution. Use short clips first.

Keep these fixed:

same image
same seed
same prompt
same resolution
same frame count
same workflow branch

Matrix A — denoise

CFG: 1.0
Steps: 4
Shift: current value
Sampler: current best

Test:

0.50
0.55
0.60
0.65

Pick the best identity/motion balance.

Matrix B — shift

Use the best denoise from Matrix A.

Shift 5
Shift 6
Shift 8

Pick the best.

Matrix C — CFG

Use best denoise + best shift.

CFG 1.0
CFG 1.5
CFG 2.0
CFG 2.5
CFG 3.0 only as a limit test

Pick the highest CFG that does not alter identity.

Matrix D — High/Low steps

If available:

2 High / 2 Low
2 High / 4 Low
4 High / 4 Low

If face detail improves with more Low steps, you found a better lever than CFG.

Matrix E — quantization

If using separate GGUF High/Low models:

Q3_K_M High / Q3_K_M Low
Q3_K_M High / Q4_K_S Low
Q4_K_S High / Q4_K_S Low

Avoid assuming Q4_K_M is worth the offload cost on 8GB.
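The matrices above are one-factor-at-a-time sweeps, not a full grid. A short Python sketch shows the difference in render count. The dictionary keys, the seed, and the "best" picks are placeholders I made up for illustration; you would substitute whatever you actually judged best.

```python
def one_factor_sweep(control: dict, factor: str, values: list) -> list[dict]:
    """Copy the control once per value, changing only `factor`."""
    return [{**control, factor: value} for value in values]


control = {"cfg": 1.0, "steps": 4, "shift": 8, "denoise": 0.60, "seed": 12345}

# Matrix A: denoise only.
matrix_a = one_factor_sweep(control, "denoise", [0.50, 0.55, 0.60, 0.65])

# Placeholder pick: pretend 0.55 looked best, then carry it forward.
best_a = {**control, "denoise": 0.55}
matrix_b = one_factor_sweep(best_a, "shift", [5, 6, 8])

# Placeholder pick: pretend shift 5 looked best.
best_b = {**best_a, "shift": 5}
matrix_c = one_factor_sweep(best_b, "cfg", [1.0, 1.5, 2.0, 2.5, 3.0])

sequential_runs = len(matrix_a) + len(matrix_b) + len(matrix_c)
full_grid_runs = len(matrix_a) * len(matrix_b) * len(matrix_c)
print(f"sequential: {sequential_runs} renders, full grid: {full_grid_runs} renders")
```

On an 8GB laptop the gap between a dozen renders and sixty is the difference between an evening and a week, which is the practical argument for fixing one variable at a time.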

17. Additional nodes: what I would and would not add

Worth testing later: WanMoeKSampler

Use it if you are working with separate Wan2.2 A14B High/Low models.

Good for:

clean A14B High/Low workflows
reducing manual High/Low split guessing
debugging MoE transition behavior

Not a fix for:

bad source image
bad VAE
too much denoise
bad prompt
4-step model limitations

Source: WanMoeKSampler

Required for GGUF: ComfyUI-GGUF

Use the proper GGUF loader rather than treating GGUF like a normal checkpoint. The ComfyUI-GGUF README says to replace the stock “Load Diffusion Model” with the “Unet Loader (GGUF)” node. See ComfyUI-GGUF.

Probably skip at 4 steps: CacheDiT

CacheDiT is more useful when you have enough steps to amortize the cache/warmup overhead. For Wan2.2 14B, its README says to use the dedicated Wan Cache Optimizer for best results with the MoE High/Low structure. See ComfyUI-CacheDiT.

My practical rule:

4 steps: skip CacheDiT
6–8 steps: probably skip unless testing
12–20 steps: consider CacheDiT

Useful but separate branch: Kijai WanVideoWrapper

Kijai’s wrapper is useful and often gets Wan-specific optimizations quickly. The official Wan2.2 repo lists it as an alternative implementation. See Wan2.2 official GitHub and Kijai ComfyUI-WanVideoWrapper.

But treat it as a separate branch. Do not change wrapper + sampler + LoRAs + resolution all at once.

18. Things I would avoid during baseline testing

Avoid:

720p
81+ frames
large camera movement
large head turns
talking/lip motion
multiple LoRAs
face restore nodes
interpolation while judging motion
upscaling while judging source fidelity
CFG above 3
high denoise
changing sampler + CFG + denoise together
testing tiny CFG increments

Especially avoid this kind of starting point:

8GB VRAM
A14B Q4_K_M
720p
81 frames
4-step Rapid/AIO
SageAttention
BlockSwap
multiple LoRAs
CFG above 3

That can produce occasional good clips, but it is a terrible learning baseline because too many variables are interacting.

19. Practical final recommendation

For your current setup, I would start here:

Sampler / scheduler: sa_solver / beta, if this is your current reliable branch
Steps: 4
CFG: 1.0
Denoise: 0.55–0.60
Shift: 8 as current control; test 5 and 6 separately
Resolution: 512–640px long side
Frames: 33–49
Prompt: subtle realistic motion only
Camera: static
LoRAs: none
Upscale/interpolation/face restore: off during testing

Then tune in this order:

1. Source image/crop
2. Denoise
3. Shift
4. Low-noise steps
5. Low-noise quantization
6. Text encoder quantization
7. CFG 1.5–2.5
8. Sampler branch
9. Extra nodes

My strongest advice:

Stop trying to solve source fidelity with CFG. For your setup, source fidelity is mostly denoise + Low-noise behavior + source image quality + correct workflow components. CFG is only a small final pressure adjustment.

20. Short summary

Your current results are plausible for 8GB VRAM + GGUF + 4-step Rapid/Lightning-style I2V.
CFG is not the main control. Do not micro-tune it.
Use CFG 1.0 as baseline; test 1.5 / 2.0 / 2.5 only after denoise and shift.
Use denoise 0.50–0.60 for source-faithful animation.
If faces drift, lower denoise or improve Low-noise behavior before raising CFG.
If available, test 2 High / 4 Low and 4 High / 4 Low.
For 8GB VRAM, test Q3_K_M and Q4_K_S before assuming Q4_K_M is best.
If you can only improve one expert, improve Low-noise first for face/detail.
Use short 33–49 frame clips at 512–640px while testing.
Avoid 720p, long clips, multiple LoRAs, and post-processing until the baseline is stable.