1st movie clip!

Hugging Face Forums [Unofficial] May 5, 2026

I think the challenge is just too hard… It’s on hard mode from the very start, after all.

This is how I would think about your setup now

First: switching from the direct Desktop install to ComfyUI Portable and suddenly having nodes/Manager behave properly is a real clue, not a coincidence. It strongly suggests the earlier problems were environmental rather than “you not understanding ComfyUI.” That is common with custom-node ecosystems: the install is only truly healthy when the node location and Python environment line up properly.

The good news is that you are now past the hardest beginner wall.

Your current Wan 2.2 setup already does something valuable:

it generates clips reliably
you understand the main nodes
you can use positive/negative conditioning
you can apply one or more LoRAs
you can do first-frame workflows
you can do first-frame → last-frame workflows

That means the main questions are no longer:

“How do I make anything at all?”
“Why won’t the nodes load?”

Your real questions now are more advanced and more interesting:

Why does a classic checkpoint node not seem to fit the Wan graph?
Why does lower FPS make drift look worse, and what should you do about it?
Why do ordinary inpainting tutorials not solve “take this bad frame and fix the face using another face image”?

Those three are connected.

The short answer

If I had to compress the whole answer into one paragraph, it would be this:

Keep your Wan 2.2 workflow as your main shot generator. Do not force a classic SD-style checkpoint loader into the native Wan graph. Treat FPS as a quality/time tradeoff, not as a magic identity fix. Use FLF for the sit-down transition. And for face repair, stop thinking “text-only inpaint” and start thinking “separate still-frame repair workflow using either plain masked face inpaint, ReActor face swap, or mask-local face repair/detailing with a reference-guided method.”

That is the cleanest mental model.

1) About the checkpoint node

Short version

In a native Wan 2.2 workflow, you normally do not insert a classic SD/SDXL-style checkpoint node.

Why

The official Wan 2.2 ComfyUI workflow is not structured like a classic Stable Diffusion workflow where one checkpoint node loads most of the system in one go.

Instead, the official Wan-native flow is built from separate components, typically:

diffusion model loader
CLIP loader
VAE loader
the Wan video node itself
LoRA loader(s)
conditioning nodes

See:

Wan2.2 Video Generation ComfyUI Official Native Workflow Example

What that means for your graph

If your current graph already looks something like:

Load Diffusion Model
Load CLIP
Load VAE
one or more LoRA nodes
positive / negative conditioning
Wan image-to-video or first/last-frame node
decode / save

then you are already using the correct native loading pattern.
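If you want to double-check that pattern without squinting at the canvas, here is a minimal sketch that inspects a workflow exported in API format (a JSON object mapping node ids to entries with a `class_type`). The class names `UNETLoader`, `CLIPLoader`, `VAELoader`, and `CheckpointLoaderSimple` are my assumption about how these nodes show up in an export, and the example graph is hypothetical and trimmed, so compare against your own file.

```python
import json

# Assumed class names for an API-format export; verify against your own JSON.
NATIVE_LOADERS = {"UNETLoader", "CLIPLoader", "VAELoader"}
CLASSIC_CHECKPOINT_LOADERS = {"CheckpointLoaderSimple"}


def summarize_loaders(workflow: dict) -> dict:
    """Report which loader node types appear in an API-format workflow dict."""
    class_types = {
        node.get("class_type")
        for node in workflow.values()
        if isinstance(node, dict)
    }
    return {
        "native_loaders_found": sorted(class_types & NATIVE_LOADERS),
        "native_loaders_missing": sorted(NATIVE_LOADERS - class_types),
        "classic_checkpoint_loaders": sorted(
            class_types & CLASSIC_CHECKPOINT_LOADERS
        ),
    }


# Hypothetical, heavily trimmed graph (node inputs omitted for brevity).
example_workflow = {
    "1": {"class_type": "UNETLoader", "inputs": {}},
    "2": {"class_type": "CLIPLoader", "inputs": {}},
    "3": {"class_type": "VAELoader", "inputs": {}},
    "4": {"class_type": "LoraLoaderModelOnly", "inputs": {}},
}

report = summarize_loaders(example_workflow)
print(json.dumps(report, indent=2))
```

If all three native loaders are found and the classic checkpoint list is empty, the graph is following the separate-component pattern described above.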

So the reason you “can’t figure out how to include a checkpoint node” is probably not that you are missing something. It is more likely that there is no natural slot for a classic checkpoint node in the native Wan graph.

Where a checkpoint loader does make sense

A classic checkpoint loader can make sense in a separate still-image repair workflow.

For example, if you later build a dedicated face-repair graph using:

a still-image inpaint model,
a checkpoint-based image model,
or an SDXL/Flux-style repair branch,

then that separate graph may use a checkpoint node.

But that would be its own repair workflow, not something you must squeeze into the Wan graph itself.

About your LoRA chain

Your current LoRA logic sounds fine.

Relevant docs:

LoRA Loader
LoraLoaderModelOnly

Important points from those docs:

LoRAs are discovered from ComfyUI/models/loras
multiple LoRA nodes can be chained directly
LoraLoaderModelOnly is specifically for applying LoRAs to the model branch only, without needing a CLIP model input on that node

That is why LoRA chaining feels natural in your current setup, while a classic checkpoint node does not.

My practical recommendation

For your Wan graph:

do not force a classic checkpoint loader into it
keep the native Wan structure
only use checkpoint-based loading in a separate repair graph if you later choose a checkpoint-based still-image repair method

2) About FPS, drift, and render time

You noticed:

lower FPS = more visible drift
higher FPS = drift feels less noticeable
but higher FPS = much longer generation time

That observation is useful, and it makes sense.

Why higher FPS often looks better

Higher FPS does not necessarily mean the model suddenly understands identity better.

What it often means is:

each frame is closer to the next in time
motion is split into smaller steps
the changes between frames feel less abrupt
the drift becomes less obvious because the motion is smoother

So the model may still be drifting, but the drift is hidden better by finer temporal spacing.

Why this becomes expensive quickly

The cost scales with frame count.

The official ComfyUI docs for Wan/Fun Inp make this very explicit: video length is the total number of frames, and the example calculation is basically:

seconds × fps = frame count

So if you double FPS while keeping the duration the same, you roughly double the number of frames the system has to generate.
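To make that arithmetic concrete, here is a small sketch of the seconds × fps rule. The function name and the sample FPS values are mine, purely for illustration; a real node may add an offset or only accept certain lengths, so treat the output as a rough estimate rather than the exact value to type in.

```python
def frame_count(seconds: float, fps: float) -> int:
    """Approximate number of frames for a clip: seconds x fps."""
    return round(seconds * fps)


duration = 5  # seconds, illustrative value
for fps in (8, 12, 16, 24):
    print(f"{fps:>2} fps -> {frame_count(duration, fps)} frames")

# Doubling FPS at the same duration doubles the frame count.
assert frame_count(duration, 24) == 2 * frame_count(duration, 12)
```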

See:

WanFunInpaintToVideo node docs
Wan2.2 Video Generation ComfyUI Official Native Workflow Example

The important production lesson

On 8 GB VRAM, I would not make native 24 FPS your default unless you truly need it.

That is because your real bottleneck is not “video exists or not.” It is:

quality per minute of render time
how many iterations you can afford
whether you can keep enough control over continuity

A better 8 GB strategy

Instead of brute-forcing everything at native 24 FPS, I would bias toward:

shorter clips
moderate native FPS
frame interpolation later, when needed

The official ComfyUI frame interpolation workflow exists for exactly this reason.

See:

ComfyUI frame interpolation workflow

That page is very relevant because it explicitly says frame interpolation:

generates intermediate frames
smooths motion
improves temporal consistency
is useful for increasing frame rate in short clips
is useful for fixing low-FPS generations without regenerating the source frames

My practical recommendation

For your current setup I would test this order:

keep clips short
use a sensible native frame count
use stronger control (first frame, first→last frame)
only then use interpolation for smoother output

That is usually a better quality/time tradeoff than forcing 24 FPS generation everywhere.
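Here is a rough sketch of why that tradeoff tends to work out. It assumes the interpolation step multiplies the frame rate by a whole-number factor and that generation cost grows roughly in proportion to the number of natively generated frames; both are simplifying assumptions, and the function and field names are hypothetical.

```python
def plan_render(duration_s: float, target_fps: float, interp_factor: int) -> dict:
    """Compare generating at target FPS directly vs. generating at a lower
    native FPS and interpolating up by an integer factor."""
    native_fps = target_fps / interp_factor
    native_frames = round(duration_s * native_fps)
    direct_frames = round(duration_s * target_fps)
    return {
        "native_fps": native_fps,
        "native_frames": native_frames,
        "direct_frames": direct_frames,
        "frames_saved": direct_frames - native_frames,
    }


# Illustrative numbers: a 4-second shot, 24 FPS final output, 2x interpolation.
plan = plan_render(duration_s=4, target_fps=24, interp_factor=2)
print(plan)
```

With those example numbers the model only has to generate half the frames, and the interpolation pass fills in the rest afterwards.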

3) Why the inpainting tutorials feel like they stop one step too early

This is the part causing the most confusion, and for good reason.

What those tutorials are really teaching

The standard inpainting tutorials teach:

load an image
draw a mask
use text conditioning
regenerate only the masked region

That is generic inpainting.

And yes, that is why:

the teapot example works
the cloud/hair example works
but your actual problem still feels unsolved

Because your actual problem is not:

replace this masked region with any plausible thing described by text

Your actual problem is:

keep this bad frame as the base image, keep the pose/lighting/composition, and make the masked face look like the correct person from another image

That is a different task.

The missing concept

You are not supposed to put the second face image “onto the canvas” like another background layer.

Instead:

the broken frame remains the base image
the mask defines the region to repair
the second face image enters the graph as a reference / swap source / identity guide
a repair node uses that second image to influence what happens inside the mask

That is the key mental shift.
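A tiny sketch of the idea, using plain Python lists as stand-in grayscale images. This is only the general mask-compositing concept, not what any particular repair node does internally: `repaired` stands for whatever the repair step produces (guided by the reference face), and the reference image itself never gets pasted onto the frame.

```python
def composite(base, repaired, mask):
    """Per-pixel blend: mask 1.0 takes the repaired pixel, 0.0 keeps the base."""
    return [
        [m * r + (1.0 - m) * b for b, r, m in zip(base_row, rep_row, mask_row)]
        for base_row, rep_row, mask_row in zip(base, repaired, mask)
    ]


# Toy 2x3 "images": the broken frame, a repair result, and a face mask.
base = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
repaired = [[99.0, 99.0, 99.0], [99.0, 99.0, 99.0]]
mask = [[0.0, 1.0, 0.0], [0.0, 0.5, 0.0]]  # 0.5 = soft mask edge

result = composite(base, repaired, mask)
print(result)  # [[10.0, 99.0, 10.0], [10.0, 54.5, 10.0]]
```

Everything outside the mask is untouched base frame, which is exactly why the pose, lighting, and composition survive the repair.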

4) So what are the actual ways to use a second face image?

There are three practical families.

A. Face swap: the direct route

This is the ReActor route.

Use it when:

the frame is already good
the face became the wrong person
the pose, lighting, clothes, and framing are acceptable

Relevant repo:

ComfyUI-ReActor

Why it is relevant:

it is explicitly a face-swap extension for ComfyUI
it supports reusable face models
it is designed for image inputs and is very naturally suited to “fix this bad frame”

In plain language, the workflow is:

input_image = broken frame
source_image or face_model = the correct identity
output = repaired frame

That is probably the closest direct answer to your actual question.

B. Local face repair/detailing: the practical fallback

This is the Impact Pack route.

Relevant repo:

ComfyUI Impact Pack

Important nodes:

MaskPainter — draw the mask
FaceDetailer — detect faces and improve them
MaskDetailer — inpaint only the masked area with a detailer pass

Why it is relevant:

it matches the “keep the frame, only fix the face” logic very well
it is a great fallback if ReActor is awkward or not the right fit
it is especially useful if the face is not just the wrong person but also a bit damaged, blurry, or structurally off

C. Reference-guided identity repair: the most conceptually accurate route

This is the IPAdapter FaceID-style idea.

Relevant repo:

ComfyUI IPAdapter Plus

Why it is relevant:

this is the clearest answer to “how do I use a second image to guide the face repair?”
the second face image becomes an identity reference, not just a prompt substitute
the docs emphasize that regional use is most effective through an inpainting workflow

This route is powerful, but it is more setup-heavy than the other two.

5) My actual recommendation for your case

If this were my setup, I would not try to solve everything inside one giant graph.

I would deliberately split the work into two workflows.

Workflow A — the main Wan video workflow

This is your existing graph.

Keep it for:

image/text/video generation
positive / negative prompt control
LoRAs
first-frame workflows
first-frame → last-frame workflows

This is your shot generator.

Relevant docs:

Wan2.2 Video Generation ComfyUI Official Native Workflow Example
ComfyUI Wan FLF workflow

Workflow B — the separate still-frame repair workflow

This is the graph you use when a shot finishes and the last frame is almost right, but the face is not.

Use it for:

loading the broken frame
masking only the face
repairing that face with one of:
- plain inpaint
- ReActor
- Impact Pack
- reference-guided identity repair

Then save the repaired frame and feed it back into the next Wan shot.

This is your continuity repair tool.

That split is extremely important.

Why I recommend two workflows

Because it gives each graph one clear job:

Workflow A creates shots
Workflow B repairs bridge frames

That is much easier to understand and much easier to debug than an all-in-one “do everything” workflow.

6) Repair vs recreate: the rule that will save you the most time

This is the rule I would use.

Repair when:

the frame is already mostly good
the body pose is right
the lighting is right
the composition is right
the background / bench is right
only the face or a tiny area drifted

Recreate when:

the pose is wrong
the camera is wrong
the sit-down motion is wrong
multiple frames in a row are bad
fixing the face would still leave the shot unusable

For your project, that usually means:

walk: repair the last frame if only the face drifted
approach bench: same
sit-down transition: usually recreate with FLF, not patch frame-by-frame
seated shot: repair isolated face drift, recreate bad staging
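If it helps to see the rule written down as a checklist, here is a hedged sketch. The field names and the "more than one bad frame in a row" threshold are my own illustrative choices, not anything taken from ComfyUI or Wan.

```python
from dataclasses import dataclass


@dataclass
class FrameCheck:
    """Hypothetical manual checklist for one finished shot."""
    pose_ok: bool
    camera_ok: bool
    lighting_ok: bool
    composition_ok: bool
    background_ok: bool
    consecutive_bad_frames: int  # how many frames in a row look wrong


def decide(check: FrameCheck) -> str:
    """Return 'repair' when only a small area drifted, else 'recreate'."""
    staging_ok = all([
        check.pose_ok,
        check.camera_ok,
        check.lighting_ok,
        check.composition_ok,
        check.background_ok,
    ])
    # Assumed threshold: more than one bad frame in a row means recreate.
    if not staging_ok or check.consecutive_bad_frames > 1:
        return "recreate"
    return "repair"


face_drift_only = FrameCheck(True, True, True, True, True, 1)
bad_sit_down = FrameCheck(False, True, True, True, True, 6)
print(decide(face_drift_only))  # repair
print(decide(bad_sit_down))     # recreate
```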

This is the production logic I would trust.

7) The exact answer to “what am I doing wrong?”

I do not think you are doing the wrong operation.

I think you are trying to solve a reference-guided identity repair task with a text-only generic inpainting tutorial.

That is the mismatch.

You are not failing because you do not understand masking.

You are failing because your actual task needs one more ingredient:

a second-image identity mechanism

That mechanism is usually:

ReActor as a swap source
or IPAdapter FaceID as a reference-conditioning source
or Impact Pack as a local face repair route when the face is damaged and needs controlled correction

That is the real missing piece.

8) The most practical next steps I would recommend

Step 1

Keep your current Wan graph as your main shot workflow.

Step 2

Stop worrying about forcing a classic checkpoint node into that graph.

Step 3

Create a separate frame repair workflow.

Start with the official inpaint tutorial just to make sure the mask/local-rebuild logic is completely clear:

ComfyUI Inpainting Workflow

Then decide:

slight face drift → plain masked face inpaint
wrong person → ReActor
local face repair / fallback → Impact Pack
strong second-image identity guidance → IPAdapter FaceID-style route

Step 4

Use FLF for your sit-down transition.

That is a cleaner solution than trying to patch a bad sit-down clip.

Step 5

Do not default to native 24 FPS everywhere. Test:

short clips
moderate native FPS
interpolation later

See:

ComfyUI frame interpolation workflow

9) The one-sentence summary

Your Wan workflow should generate shots; your separate repair workflow should fix bad bridge frames using a mask plus either a swap source, a face-detail pass, or a second-image identity guide.

That is the cleanest way to make the whole thing click.

10) The exact links I would keep open

Main generation:

Wan2.2 Video Generation ComfyUI Official Native Workflow Example
ComfyUI Wan FLF workflow

Repair / continuity:

ComfyUI Inpainting Workflow
ComfyUI-ReActor
ComfyUI Impact Pack
ComfyUI IPAdapter Plus

Later / heavier options:

WanFunInpaintToVideo node docs
Wan VACE To Video node docs

Utilities:

ComfyUI frame interpolation workflow

Final recommendation

If this were my setup, I would do this:

keep the current Wan 2.2 graph
do not add a classic checkpoint loader to it
build one separate still-frame repair graph
use that graph to fix bad bridge frames
use FLF for the sit-down transition
use frame interpolation instead of brute-forcing native 24 FPS everywhere
only later consider heavier clip-editing or training workflows

That is the simplest, cleanest, least frustrating path from where you are now.
