1st movie clip!

Hugging Face Forums [Unofficial] May 5, 2026
I think the challenge is just too hard… It’s on hard mode from the very start, after all.


This is how I would think about your setup now

First: switching from the direct Desktop install to ComfyUI Portable and suddenly having nodes/Manager behave properly is a real clue, not a coincidence. It strongly suggests the earlier problems were environmental rather than “you not understanding ComfyUI.” That is common with custom-node ecosystems: the install is only truly healthy when the node location and Python environment line up properly.

The good news is that you are now past the hardest beginner wall.

Your current Wan 2.2 setup already does something valuable:

  • it generates clips reliably
  • you understand the main nodes
  • you can use positive/negative conditioning
  • you can apply one or more LoRAs
  • you can do first-frame workflows
  • you can do first-frame → last-frame workflows

That means the main questions are no longer:

  • “How do I make anything at all?”
  • “Why won’t the nodes load?”

Your real questions now are more advanced and more interesting:

  1. Why does a classic checkpoint node not seem to fit the Wan graph?
  2. Why does lower FPS make drift look worse, and what should you do about it?
  3. Why do ordinary inpainting tutorials not solve “take this bad frame and fix the face using another face image”?

Those three are connected.


The short answer

If I had to compress the whole answer into one paragraph, it would be this:

Keep your Wan 2.2 workflow as your main shot generator. Do not force a classic SD-style checkpoint loader into the native Wan graph. Treat FPS as a quality/time tradeoff, not as a magic identity fix. Use FLF for the sit-down transition. And for face repair, stop thinking “text-only inpaint” and start thinking “separate still-frame repair workflow using either plain masked face inpaint, ReActor face swap, or mask-local face repair/detailing with a reference-guided method.”

That is the cleanest mental model.


1) About the checkpoint node

Short version

In a native Wan 2.2 workflow, you normally do not insert a classic SD/SDXL-style checkpoint node.

Why

The official Wan 2.2 ComfyUI workflow is not structured like a classic Stable Diffusion workflow where one checkpoint node loads most of the system in one go.

Instead, the official Wan-native flow is built from separate components, typically:

  • diffusion model loader
  • CLIP loader
  • VAE loader
  • the Wan video node itself
  • LoRA loader(s)
  • conditioning nodes

See:

  • Wan2.2 Video Generation ComfyUI Official Native Workflow Example

What that means for your graph

If your current graph already looks something like:

  • Load Diffusion Model
  • Load CLIP
  • Load VAE
  • one or more LoRA nodes
  • positive / negative conditioning
  • Wan image-to-video or first/last-frame node
  • decode / save

then you are already using the correct native loading pattern.
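If you want to check this against a saved graph, here is a minimal sketch that classifies a workflow by its loader nodes. It assumes the workflow was exported in ComfyUI's API format (a dict of node id to `class_type` and `inputs`) and that the loader class names are `UNETLoader`, `CLIPLoader`, `VAELoader`, and `CheckpointLoaderSimple`. Treat those names as assumptions and compare them with your own export.

```python
# Assumed class names for a typical ComfyUI API-format export; verify against your own file.
SPLIT_LOADERS = {"UNETLoader", "CLIPLoader", "VAELoader"}
CHECKPOINT_LOADERS = {"CheckpointLoaderSimple"}


def loading_pattern(workflow: dict) -> str:
    """Classify a workflow by which loader nodes it contains."""
    class_types = {node.get("class_type") for node in workflow.values()}
    if class_types & CHECKPOINT_LOADERS:
        return "checkpoint"
    if SPLIT_LOADERS <= class_types:
        return "split-loaders"
    return "unknown"


# Hypothetical, stripped-down graph with only the three separate loaders.
example = {
    "1": {"class_type": "UNETLoader", "inputs": {}},
    "2": {"class_type": "CLIPLoader", "inputs": {}},
    "3": {"class_type": "VAELoader", "inputs": {}},
}

print(loading_pattern(example))  # split-loaders
```

A result of "split-loaders" matches the native pattern described above; "checkpoint" is what you would expect from a classic SD-style graph.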

So the reason you “can’t figure out how to include a checkpoint node” is probably not that you are missing something. It is more likely that there is no natural slot for a classic checkpoint node in the native Wan graph.

Where a checkpoint loader does make sense

A classic checkpoint loader can make sense in a separate still-image repair workflow.

For example, if you later build a dedicated face-repair graph using:

  • a still-image inpaint model,
  • a checkpoint-based image model,
  • or an SDXL/Flux-style repair branch,

then that separate graph may use a checkpoint node.

But that would be its own repair workflow, not something you must squeeze into the Wan graph itself.

About your LoRA chain

Your current LoRA logic sounds fine.

Relevant docs:

  • LoRA Loader
  • LoraLoaderModelOnly

Important points from those docs:

  • LoRAs are discovered from ComfyUI/models/loras
  • multiple LoRA nodes can be chained directly
  • LoraLoaderModelOnly is specifically for applying LoRAs to the model branch only, without needing a CLIP model input on that node

That is why LoRA chaining feels natural in your current setup, while a classic checkpoint node does not.
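To make the chaining concrete, here is a rough sketch that builds a chain of model-only LoRA nodes as an API-format fragment. It assumes the node's inputs are named `model`, `lora_name`, and `strength_model`, and that links are written as `[node_id, output_index]`. The file names and node ids are hypothetical.

```python
def chain_loras(model_link, loras, first_id=100):
    """Return (nodes, final_model_link) for a chain of model-only LoRA nodes.

    model_link: [node_id, output_index] of the upstream model output.
    loras: list of (lora_filename, strength) pairs, applied in order.
    """
    nodes = {}
    link = model_link
    for offset, (name, strength) in enumerate(loras):
        node_id = str(first_id + offset)
        nodes[node_id] = {
            "class_type": "LoraLoaderModelOnly",  # assumed class name
            "inputs": {"model": link, "lora_name": name, "strength_model": strength},
        }
        # The next LoRA (or the sampler) takes this node's model output.
        link = [node_id, 0]
    return nodes, link


# Hypothetical file names; "1" stands for the diffusion model loader node.
nodes, final_link = chain_loras(
    ["1", 0],
    [("style_a.safetensors", 1.0), ("style_b.safetensors", 0.6)],
)
print(final_link)  # ['101', 0]
```

Each node only passes a model along, which is why the chain needs no CLIP input and no checkpoint node.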

My practical recommendation

For your Wan graph:

  • do not force a classic checkpoint loader into it
  • keep the native Wan structure
  • only use checkpoint-based loading in a separate repair graph if you later choose a checkpoint-based still-image repair method

2) About FPS, drift, and render time

You noticed:

  • lower FPS = more visible drift
  • higher FPS = drift feels less noticeable
  • but higher FPS = much longer generation time

That observation is useful, and it makes sense.

Why higher FPS often looks better

Higher FPS does not necessarily mean the model suddenly understands identity better.

What it often means is:

  • each frame is closer to the next in time
  • motion is split into smaller steps
  • the changes between frames feel less abrupt
  • the drift becomes less obvious because the motion is smoother

So the model may still be drifting, but the drift is hidden better by finer temporal spacing.

Why this becomes expensive quickly

The cost scales with frame count.

The official ComfyUI docs for Wan/Fun Inp make this very explicit: video length is the total number of frames, and the example calculation is basically:

  • seconds × fps = frame count

So if you double FPS while keeping the duration the same, you roughly double the number of frames the system has to generate.
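As a quick worked example of that arithmetic, here is a small sketch. The function name is my own, and it only does the seconds × fps multiplication from the docs, nothing model-specific.

```python
def frame_count(seconds: float, fps: int) -> int:
    """Total frames to generate for a clip of the given duration."""
    return round(seconds * fps)


# Same 4-second clip at three frame rates.
for fps in (12, 16, 24):
    print(fps, "fps ->", frame_count(4, fps), "frames")
# 12 fps -> 48 frames
# 16 fps -> 64 frames
# 24 fps -> 96 frames
```

Going from 12 to 24 FPS on the same 4-second clip takes you from 48 to 96 frames, which is the doubling described above.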

See:

  • WanFunInpaintToVideo node docs
  • Wan2.2 Video Generation ComfyUI Official Native Workflow Example

The important production lesson

On 8 GB VRAM, I would not make native 24 FPS your default unless you truly need it.

That is because your real bottleneck is not “video exists or not.” It is:

  • quality per minute of render time
  • how many iterations you can afford
  • whether you can keep enough control over continuity

A better 8 GB strategy

Instead of brute-forcing everything at native 24 FPS, I would bias toward:

  1. shorter clips
  2. moderate native FPS
  3. frame interpolation later, when needed

The official ComfyUI frame interpolation workflow exists for exactly this reason.

See:

  • ComfyUI frame interpolation workflow

That page is very relevant because it explicitly says frame interpolation:

  • generates intermediate frames
  • smooths motion
  • improves temporal consistency
  • is useful for increasing frame rate in short clips
  • is useful for fixing low-FPS generations without regenerating the source frames

My practical recommendation

For your current setup I would test this order:

  • keep clips short
  • use a sensible native frame count
  • use stronger control (first frame, first→last frame)
  • only then use interpolation for smoother output

That is usually a better quality/time tradeoff than forcing 24 FPS generation everywhere.
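To see roughly what that tradeoff buys you, here is a back-of-the-envelope sketch comparing "generate at a lower FPS, then interpolate" with "generate at the target FPS directly". The function and its field names are hypothetical, and it ignores the cost of the interpolation pass itself.

```python
import math


def interpolation_plan(seconds: float, native_fps: int, target_fps: int) -> dict:
    """Compare frames generated natively against brute-forcing the target FPS."""
    generated = round(seconds * native_fps)
    brute_force = round(seconds * target_fps)
    return {
        # Integer multiplier the interpolation step would need to reach the target.
        "factor": math.ceil(target_fps / native_fps),
        "generated_frames": generated,
        "native_target_frames": brute_force,
        "frames_saved": brute_force - generated,
    }


plan = interpolation_plan(seconds=4, native_fps=12, target_fps=24)
print(plan)
# {'factor': 2, 'generated_frames': 48, 'native_target_frames': 96, 'frames_saved': 48}
```

In this example the Wan graph only has to produce 48 frames instead of 96, and a 2× interpolation step fills in the rest.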


3) Why the inpainting tutorials feel like they stop one step too early

This is the part causing the most confusion, and for good reason.

What those tutorials are really teaching

The standard inpainting tutorials teach:

  • load an image
  • draw a mask
  • use text conditioning
  • regenerate only the masked region

That is generic inpainting.

And yes, that is why:

  • teapot example works
  • cloud/hair example works
  • but your actual problem still feels unsolved

Because your actual problem is not:

replace this masked region with any plausible thing described by text

Your actual problem is:

keep this bad frame as the base image, keep the pose/lighting/composition, and make the masked face look like the correct person from another image

That is a different task.

The missing concept

You are not supposed to put the second face image “onto the canvas” like another background layer.

Instead:

  • the broken frame remains the base image
  • the mask defines the region to repair
  • the second face image enters the graph as a reference / swap source / identity guide
  • a repair node uses that second image to influence what happens inside the mask

That is the key mental shift.
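A toy sketch of that idea, using plain nested lists as stand-in "images": the mask decides where pixels change, and the base frame survives everywhere else. Real repair nodes may work in latent space and feather the mask, so this only illustrates the principle and is not how any particular node is implemented.

```python
def composite(base, repaired, mask):
    """Blend per pixel: mask 1.0 takes the repaired value, 0.0 keeps the base."""
    return [
        [m * r + (1.0 - m) * b for b, r, m in zip(b_row, r_row, m_row)]
        for b_row, r_row, m_row in zip(base, repaired, mask)
    ]


base = [[0.2, 0.2], [0.2, 0.2]]      # the broken frame
repaired = [[0.9, 0.9], [0.9, 0.9]]  # output of whatever repair method you use
mask = [[0.0, 1.0], [0.0, 0.0]]      # only the top-right "face" pixel is masked

result = composite(base, repaired, mask)
print(result)  # [[0.2, 0.9], [0.2, 0.2]]
```

Notice that the reference face image never appears in the composite. It only influences how `repaired` gets produced, which is exactly why it is an input to the repair node and not a layer on the canvas.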


4) So what are the actual ways to use a second face image?

There are three practical families.

A. Face swap: the direct route

This is the ReActor route.

Use it when:

  • the frame is already good
  • the face became the wrong person
  • the pose, lighting, clothes, and framing are acceptable

Relevant repo:

  • ComfyUI-ReActor

Why it is relevant:

  • it is explicitly a face-swap extension for ComfyUI
  • it supports reusable face models
  • it is designed for image inputs and is very naturally suited to “fix this bad frame”

In plain language, the workflow is:

  • input_image = broken frame
  • source_image or face_model = the correct identity
  • output = repaired frame

That is probably the closest direct answer to your actual question.

B. Local face repair/detailing: the practical fallback

This is the Impact Pack route.

Relevant repo:

  • ComfyUI Impact Pack

Important nodes:

  • MaskPainter — draw the mask
  • FaceDetailer — detect faces and improve them
  • MaskDetailer — inpaint only the masked area with a detailer pass

Why it is relevant:

  • it matches the “keep the frame, only fix the face” logic very well
  • it is a great fallback if ReActor is awkward or not the right fit
  • it is especially useful if the face is not just the wrong person but also a bit damaged, blurry, or structurally off

C. Reference-guided identity repair: the most conceptually accurate route

This is the IPAdapter FaceID-style idea.

Relevant repo:

  • ComfyUI IPAdapter Plus

Why it is relevant:

  • this is the clearest answer to “how do I use a second image to guide the face repair?”
  • the second face image becomes an identity reference, not just a prompt substitute
  • the docs emphasize that regional use is most effective through an inpainting workflow

This route is powerful, but it is more setup-heavy than the other two.


5) My actual recommendation for your case

If this were my setup, I would not try to solve everything inside one giant graph.

I would deliberately split the work into two workflows.


Workflow A — the main Wan video workflow

This is your existing graph.

Keep it for:

  • text-to-video and image-to-video generation
  • positive / negative prompt control
  • LoRAs
  • first-frame workflows
  • first-frame → last-frame workflows

This is your shot generator.

Relevant docs:

  • Wan2.2 Video Generation ComfyUI Official Native Workflow Example
  • ComfyUI Wan FLF workflow

Workflow B — the separate still-frame repair workflow

This is the graph you use when a shot finishes and the last frame is almost right, but the face is not.

Use it for:

  • loading the broken frame
  • masking only the face
  • repairing that face with one of:
    • plain inpaint
    • ReActor
    • Impact Pack
    • reference-guided identity repair

Then save the repaired frame and feed it back into the next Wan shot.

This is your continuity repair tool.

That split is extremely important.

Why I recommend two workflows

Because it gives each graph one clear job:

  • Workflow A creates shots
  • Workflow B repairs bridge frames

That is much easier to understand and much easier to debug than an all-in-one “do everything” workflow.


6) Repair vs recreate: the rule that will save you the most time

This is the rule I would use.

Repair when:

  • the frame is already mostly good
  • the body pose is right
  • the lighting is right
  • the composition is right
  • the background / bench is right
  • only the face or a tiny area drifted

Recreate when:

  • the pose is wrong
  • the camera is wrong
  • the sit-down motion is wrong
  • multiple frames in a row are bad
  • fixing the face would still leave the shot unusable

For your project, that usually means:

  • walk: repair the last frame if only the face drifted
  • approach bench: same
  • sit-down transition: usually recreate with FLF, not patch frame-by-frame
  • seated shot: repair isolated face drift, recreate bad staging

This is the production logic I would trust.
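If it helps to see the rule written down as a checklist, here is a minimal sketch. The field names are my own, and the "more than one bad frame in a row" threshold is an assumption standing in for "multiple frames in a row are bad".

```python
from dataclasses import dataclass


@dataclass
class FrameCheck:
    pose_ok: bool
    camera_ok: bool
    lighting_ok: bool
    composition_ok: bool
    motion_ok: bool
    consecutive_bad_frames: int  # how many frames in a row show the problem


def repair_or_recreate(check: FrameCheck) -> str:
    """Repair only when the staging is sound and the damage is isolated."""
    staging_ok = all(
        [check.pose_ok, check.camera_ok, check.lighting_ok,
         check.composition_ok, check.motion_ok]
    )
    if staging_ok and check.consecutive_bad_frames <= 1:
        return "repair"
    return "recreate"


# Last frame of the walk: everything fine except the face on one frame.
print(repair_or_recreate(FrameCheck(True, True, True, True, True, 1)))   # repair
# Sit-down clip: the motion itself is wrong.
print(repair_or_recreate(FrameCheck(True, True, True, True, False, 6)))  # recreate
```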


7) The exact answer to “what am I doing wrong?”

I do not think you are doing the wrong operation.

I think you are trying to solve a reference-guided identity repair task with a text-only generic inpainting tutorial.

That is the mismatch.

You are not failing because you do not understand masking.

You are failing because your actual task needs one more ingredient:

  • a second-image identity mechanism

That mechanism is usually:

  • ReActor as a swap source
  • or IPAdapter FaceID as a reference-conditioning source
  • or Impact Pack as a local face repair route when the face is damaged and needs controlled correction

That is the real missing piece.


8) The most practical next steps I would recommend

Step 1

Keep your current Wan graph as your main shot workflow.

Step 2

Stop worrying about forcing a classic checkpoint node into that graph.

Step 3

Create a separate frame repair workflow.

Start with the official inpaint tutorial just to make sure the mask/local-rebuild logic is completely clear:

  • ComfyUI Inpainting Workflow

Then decide:

  • slight face drift → plain masked face inpaint
  • wrong person → ReActor
  • local face repair / fallback → Impact Pack
  • strong second-image identity guidance → IPAdapter FaceID-style route
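The same decision list as a tiny lookup, in case you want it next to your notes. The symptom keys are hypothetical labels of my own; the routes are the four listed above, with Impact Pack as the fallback.

```python
REPAIR_ROUTES = {
    "slight_face_drift": "plain masked face inpaint",
    "wrong_person": "ReActor",
    "damaged_face": "Impact Pack",
    "needs_identity_reference": "IPAdapter FaceID-style route",
}


def pick_route(symptom: str) -> str:
    """Map a symptom label to a repair route, falling back to Impact Pack."""
    return REPAIR_ROUTES.get(symptom, "Impact Pack")


print(pick_route("wrong_person"))    # ReActor
print(pick_route("something_else"))  # Impact Pack
```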

Step 4

Use FLF for your sit-down transition.

That is a cleaner solution than trying to patch a bad sit-down clip.

Step 5

Do not default to native 24 FPS everywhere. Test:

  • short clips
  • moderate native FPS
  • interpolation later

See:

  • ComfyUI frame interpolation workflow

9) The one-sentence summary

Your Wan workflow should generate shots; your separate repair workflow should fix bad bridge frames using a mask plus either a swap source, a face-detail pass, or a second-image identity guide.

That is the cleanest way to make the whole thing click.


10) The exact links I would keep open

Main generation:

  • Wan2.2 Video Generation ComfyUI Official Native Workflow Example
  • ComfyUI Wan FLF workflow

Repair / continuity:

  • ComfyUI Inpainting Workflow
  • ComfyUI-ReActor
  • ComfyUI Impact Pack
  • ComfyUI IPAdapter Plus

Later / heavier options:

  • WanFunInpaintToVideo node docs
  • Wan VACE To Video node docs

Utilities:

  • ComfyUI frame interpolation workflow

Final recommendation

If this were my setup, I would do this:

  1. keep the current Wan 2.2 graph
  2. do not add a classic checkpoint loader to it
  3. build one separate still-frame repair graph
  4. use that graph to fix bad bridge frames
  5. use FLF for the sit-down transition
  6. use frame interpolation instead of brute-forcing native 24 FPS everywhere
  7. only later consider heavier clip-editing or training workflows

That is the simplest, cleanest, least frustrating path from where you are now.