{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidqgd5bzrivaeqspo5hncvnp7s5b5vzbfcu5ef3uuzk64rzdinlnu",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3ml3omlbro7o2"
},
"path": "/t/1st-movie-clip/175306#post_14",
"publishedAt": "2026-05-05T05:18:42.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"Wan2.2 Video Generation ComfyUI Official Native Workflow Example",
"LoRA Loader",
"LoraLoaderModelOnly",
"WanFunInpaintToVideo node docs",
"ComfyUI frame interpolation workflow",
"ComfyUI-ReActor",
"ComfyUI Impact Pack",
"ComfyUI IPAdapter Plus",
"ComfyUI Wan FLF workflow",
"ComfyUI Inpainting Workflow",
"Wan VACE To Video node docs"
],
"textContent": "I think the challenge is just too hard… It’s on hard mode from the very start, after all.\n\n* * *\n\n# This is how I would think about your setup now\n\nFirst: switching from the direct Desktop install to **ComfyUI Portable** and suddenly having nodes/Manager behave properly is a real clue, not a coincidence. It strongly suggests the earlier problems were environmental rather than “you not understanding ComfyUI.” That is common with custom-node ecosystems: the install is only truly healthy when the node location and Python environment line up properly.\n\nThe good news is that you are now **past** the hardest beginner wall.\n\nYour current Wan 2.2 setup already does something valuable:\n\n * it generates clips reliably\n * you understand the main nodes\n * you can use positive/negative conditioning\n * you can apply one or more LoRAs\n * you can do first-frame workflows\n * you can do first-frame → last-frame workflows\n\n\n\nThat means the main questions are no longer:\n\n * “How do I make anything at all?”\n * “Why won’t the nodes load?”\n\n\n\nYour real questions now are more advanced and more interesting:\n\n 1. Why does a classic **checkpoint node** not seem to fit the Wan graph?\n 2. Why does **lower FPS** make drift look worse, and what should you do about it?\n 3. Why do ordinary inpainting tutorials not solve **“take this bad frame and fix the face using another face image”**?\n\n\n\nThose three are connected.\n\n* * *\n\n## The short answer\n\nIf I had to compress the whole answer into one paragraph, it would be this:\n\n**Keep your Wan 2.2 workflow as your main shot generator. Do not force a classic SD-style checkpoint loader into the native Wan graph. Treat FPS as a quality/time tradeoff, not as a magic identity fix. Use FLF for the sit-down transition. 
And for face repair, stop thinking “text-only inpaint” and start thinking “separate still-frame repair workflow using either plain masked face inpaint, ReActor face swap, or mask-local face repair/detailing with a reference-guided method.”**\n\nThat is the cleanest mental model.\n\n* * *\n\n## 1) About the checkpoint node\n\n### Short version\n\nIn a **native Wan 2.2** workflow, you normally do **not** insert a classic SD/SDXL-style checkpoint node.\n\n### Why\n\nThe official Wan 2.2 ComfyUI workflow is not structured like a classic Stable Diffusion workflow where one checkpoint node loads most of the system in one go.\n\nInstead, the official Wan-native flow is built from separate components, typically:\n\n * diffusion model loader\n * CLIP loader\n * VAE loader\n * the Wan video node itself\n * LoRA loader(s)\n * conditioning nodes\n\n\n\nSee:\n\n * Wan2.2 Video Generation ComfyUI Official Native Workflow Example\n\n\n\n### What that means for your graph\n\nIf your current graph already looks something like:\n\n * `Load Diffusion Model`\n * `Load CLIP`\n * `Load VAE`\n * one or more LoRA nodes\n * positive / negative conditioning\n * Wan image-to-video or first/last-frame node\n * decode / save\n\n\n\nthen you are already using the **correct native loading pattern**.\n\nSo the reason you “can’t figure out how to include a checkpoint node” is probably not that you are missing something. 
It is more likely that **there is no natural slot for a classic checkpoint node in the native Wan graph**.\n\n### Where a checkpoint loader _does_ make sense\n\nA classic checkpoint loader **can** make sense in a **separate still-image repair workflow**.\n\nFor example, if you later build a dedicated face-repair graph using:\n\n * a still-image inpaint model,\n * a checkpoint-based image model,\n * or an SDXL/Flux-style repair branch,\n\n\n\nthen _that_ separate graph may use a checkpoint node.\n\nBut that would be its own repair workflow, not something you must squeeze into the Wan graph itself.\n\n### About your LoRA chain\n\nYour current LoRA logic sounds fine.\n\nRelevant docs:\n\n * LoRA Loader\n * LoraLoaderModelOnly\n\n\n\nImportant points from those docs:\n\n * LoRAs are discovered from `ComfyUI/models/loras`\n * multiple LoRA nodes can be chained directly\n * `LoraLoaderModelOnly` is specifically for applying LoRAs to the **model branch only**, without needing a CLIP model input on that node\n\n\n\nThat is why LoRA chaining feels natural in your current setup, while a classic checkpoint node does not.\n\n### My practical recommendation\n\nFor your Wan graph:\n\n * **do not force a classic checkpoint loader into it**\n * keep the native Wan structure\n * only use checkpoint-based loading in a **separate repair graph** if you later choose a checkpoint-based still-image repair method\n\n\n\n* * *\n\n## 2) About FPS, drift, and render time\n\nYou noticed:\n\n * lower FPS = more visible drift\n * higher FPS = drift feels less noticeable\n * but higher FPS = much longer generation time\n\n\n\nThat observation is useful, and it makes sense.\n\n### Why higher FPS often looks better\n\nHigher FPS does **not necessarily mean the model suddenly understands identity better**.\n\nWhat it often means is:\n\n * each frame is closer to the next in time\n * motion is split into smaller steps\n * the changes between frames feel less abrupt\n * the drift becomes less 
obvious because the motion is smoother\n\n\n\nSo the model may still be drifting, but the drift is **hidden better** by finer temporal spacing.\n\n### Why this becomes expensive quickly\n\nThe cost scales with frame count.\n\nThe official ComfyUI docs for the WanFunInpaintToVideo node make this very explicit: video `length` is the **total number of frames**, and the example calculation is basically:\n\n * `seconds × fps = frame count`\n\n\n\nSo if you double FPS while keeping the duration the same, you roughly double the number of frames the system has to generate.\n\nSee:\n\n * WanFunInpaintToVideo node docs\n * Wan2.2 Video Generation ComfyUI Official Native Workflow Example\n\n\n\n### The important production lesson\n\nOn 8 GB VRAM, I would **not** make native 24 FPS your default unless you truly need it.\n\nThat is because your real bottleneck is not “video exists or not.” It is:\n\n * quality per minute of render time\n * how many iterations you can afford\n * whether you can keep enough control over continuity\n\n\n\n### A better 8 GB strategy\n\nInstead of brute-forcing everything at native 24 FPS, I would bias toward:\n\n 1. **shorter clips**\n 2. **moderate native FPS**\n 3. 
**frame interpolation later** , when needed\n\n\n\nThe official ComfyUI frame interpolation workflow exists for exactly this reason.\n\nSee:\n\n * ComfyUI frame interpolation workflow\n\n\n\nThat page is very relevant because it explicitly says frame interpolation:\n\n * generates intermediate frames\n * smooths motion\n * improves temporal consistency\n * is useful for increasing frame rate in short clips\n * is useful for fixing low-FPS generations without regenerating the source frames\n\n\n\n### My practical recommendation\n\nFor your current setup I would test this order:\n\n * keep clips short\n * use a sensible native frame count\n * use stronger control (first frame, first→last frame)\n * only then use interpolation for smoother output\n\n\n\nThat is usually a better quality/time tradeoff than forcing 24 FPS generation everywhere.\n\n* * *\n\n## 3) Why the inpainting tutorials feel like they stop one step too early\n\nThis is the part causing the most confusion, and for good reason.\n\n### What those tutorials are really teaching\n\nThe standard inpainting tutorials teach:\n\n * load an image\n * draw a mask\n * use text conditioning\n * regenerate only the masked region\n\n\n\nThat is **generic inpainting**.\n\nAnd yes, that is why:\n\n * teapot example works\n * cloud/hair example works\n * but your actual problem still feels unsolved\n\n\n\nBecause your actual problem is **not** :\n\n> replace this masked region with any plausible thing described by text\n\nYour actual problem is:\n\n> keep this bad frame as the base image, keep the pose/lighting/composition, and make the masked face look like the **correct person from another image**\n\nThat is a different task.\n\n### The missing concept\n\nYou are not supposed to put the second face image “onto the canvas” like another background layer.\n\nInstead:\n\n * the **broken frame** remains the base image\n * the **mask** defines the region to repair\n * the **second face image** enters the graph as a 
**reference / swap source / identity guide**\n * a repair node uses that second image to influence what happens _inside the mask_\n\n\n\nThat is the key mental shift.\n\n* * *\n\n## 4) So what are the actual ways to use a second face image?\n\nThere are three practical families.\n\n### A. Face swap: the direct route\n\nThis is the ReActor route.\n\nUse it when:\n\n * the frame is already good\n * the face became the wrong person\n * the pose, lighting, clothes, and framing are acceptable\n\n\n\nRelevant repo:\n\n * ComfyUI-ReActor\n\n\n\nWhy it is relevant:\n\n * it is explicitly a face-swap extension for ComfyUI\n * it supports reusable face models\n * it is designed for image inputs and is very naturally suited to “fix this bad frame”\n\n\n\nIn plain language, the workflow is:\n\n * `input_image` = broken frame\n * `source_image` or `face_model` = the correct identity\n * output = repaired frame\n\n\n\nThat is probably the **closest** direct answer to your actual question.\n\n### B. Local face repair/detailing: the practical fallback\n\nThis is the Impact Pack route.\n\nRelevant repo:\n\n * ComfyUI Impact Pack\n\n\n\nImportant nodes:\n\n * `MaskPainter` — draw the mask\n * `FaceDetailer` — detect faces and improve them\n * `MaskDetailer` — inpaint only the masked area with a detailer pass\n\n\n\nWhy it is relevant:\n\n * it matches the “keep the frame, only fix the face” logic very well\n * it is a great fallback if ReActor is awkward or not the right fit\n * it is especially useful if the face is not just the wrong person but also a bit damaged, blurry, or structurally off\n\n\n\n### C. 
Reference-guided identity repair: the most conceptually accurate route\n\nThis is the IPAdapter FaceID-style idea.\n\nRelevant repo:\n\n * ComfyUI IPAdapter Plus\n\n\n\nWhy it is relevant:\n\n * this is the clearest answer to “how do I use a second image to guide the face repair?”\n * the second face image becomes an identity reference, not just a prompt substitute\n * the docs emphasize that regional use is most effective through an inpainting workflow\n\n\n\nThis route is powerful, but it is more setup-heavy than the other two.\n\n* * *\n\n## 5) My actual recommendation for your case\n\nIf this were my setup, I would not try to solve everything inside one giant graph.\n\nI would deliberately split the work into **two workflows**.\n\n* * *\n\n## Workflow A — the main Wan video workflow\n\nThis is your existing graph.\n\nKeep it for:\n\n * image/text/video generation\n * positive / negative prompt control\n * LoRAs\n * first-frame workflows\n * first-frame → last-frame workflows\n\n\n\nThis is your **shot generator**.\n\nRelevant docs:\n\n * Wan2.2 Video Generation ComfyUI Official Native Workflow Example\n * ComfyUI Wan FLF workflow\n\n\n\n* * *\n\n## Workflow B — the separate still-frame repair workflow\n\nThis is the graph you use when a shot finishes and the last frame is _almost_ right, but the face is not.\n\nUse it for:\n\n * loading the broken frame\n * masking only the face\n * repairing that face with one of:\n * plain inpaint\n * ReActor\n * Impact Pack\n * reference-guided identity repair\n\n\n\nThen save the repaired frame and feed it back into the next Wan shot.\n\nThis is your **continuity repair tool**.\n\nThat split is extremely important.\n\n### Why I recommend two workflows\n\nBecause it gives each graph one clear job:\n\n * **Workflow A** creates shots\n * **Workflow B** repairs bridge frames\n\n\n\nThat is much easier to understand and much easier to debug than an all-in-one “do everything” workflow.\n\n* * *\n\n## 6) Repair vs recreate: the 
rule that will save you the most time\n\nThis is the rule I would use.\n\n### Repair when:\n\n * the frame is already mostly good\n * the body pose is right\n * the lighting is right\n * the composition is right\n * the background / bench is right\n * only the face or a tiny area drifted\n\n\n\n### Recreate when:\n\n * the pose is wrong\n * the camera is wrong\n * the sit-down motion is wrong\n * multiple frames in a row are bad\n * fixing the face would still leave the shot unusable\n\n\n\nFor your project, that usually means:\n\n * **walk** : repair the last frame if only the face drifted\n * **approach bench** : same\n * **sit-down transition** : usually **recreate** with FLF, not patch frame-by-frame\n * **seated shot** : repair isolated face drift, recreate bad staging\n\n\n\nThis is the production logic I would trust.\n\n* * *\n\n## 7) The exact answer to “what am I doing wrong?”\n\nI do not think you are doing the wrong _operation_.\n\nI think you are trying to solve a **reference-guided identity repair** task with a **text-only generic inpainting tutorial**.\n\nThat is the mismatch.\n\nYou are not failing because you do not understand masking.\n\nYou are failing because your actual task needs one more ingredient:\n\n * a **second-image identity mechanism**\n\n\n\nThat mechanism is usually:\n\n * **ReActor** as a swap source\n * or **IPAdapter FaceID** as a reference-conditioning source\n * or **Impact Pack** as a local face repair route when the face is damaged and needs controlled correction\n\n\n\nThat is the real missing piece.\n\n* * *\n\n## 8) The most practical next steps I would recommend\n\n### Step 1\n\nKeep your current Wan graph as your **main shot workflow**.\n\n### Step 2\n\nStop worrying about forcing a classic checkpoint node into that graph.\n\n### Step 3\n\nCreate a separate **frame repair workflow**.\n\nStart with the official inpaint tutorial just to make sure the mask/local-rebuild logic is completely clear:\n\n * ComfyUI Inpainting 
Workflow\n\n\n\nThen decide:\n\n * slight face drift → plain masked face inpaint\n * wrong person → ReActor\n * local face repair / fallback → Impact Pack\n * strong second-image identity guidance → IPAdapter FaceID-style route\n\n\n\n### Step 4\n\nUse FLF for your sit-down transition.\n\nThat is a cleaner solution than trying to patch a bad sit-down clip.\n\n### Step 5\n\nDo not default to native 24 FPS everywhere.\nTest:\n\n * short clips\n * moderate native FPS\n * interpolation later\n\n\n\nSee:\n\n * ComfyUI frame interpolation workflow\n\n\n\n* * *\n\n## 9) The one-sentence summary\n\n**Your Wan workflow should generate shots; your separate repair workflow should fix bad bridge frames using a mask plus either a swap source, a face-detail pass, or a second-image identity guide.**\n\nThat is the cleanest way to make the whole thing click.\n\n* * *\n\n## 10) The exact links I would keep open\n\nMain generation:\n\n * Wan2.2 Video Generation ComfyUI Official Native Workflow Example\n * ComfyUI Wan FLF workflow\n\n\n\nRepair / continuity:\n\n * ComfyUI Inpainting Workflow\n * ComfyUI-ReActor\n * ComfyUI Impact Pack\n * ComfyUI IPAdapter Plus\n\n\n\nLater / heavier options:\n\n * WanFunInpaintToVideo node docs\n * Wan VACE To Video node docs\n\n\n\nUtilities:\n\n * ComfyUI frame interpolation workflow\n\n\n\n* * *\n\n## Final recommendation\n\nIf this were my setup, I would do this:\n\n 1. **keep** the current Wan 2.2 graph\n 2. **do not** add a classic checkpoint loader to it\n 3. build one separate **still-frame repair graph**\n 4. use that graph to fix bad **bridge frames**\n 5. use **FLF** for the sit-down transition\n 6. use **frame interpolation** instead of brute-forcing native 24 FPS everywhere\n 7. only later consider heavier clip-editing or training workflows\n\n\n\nThat is the simplest, cleanest, least frustrating path from where you are now.",
"title": "1st movie clip!"
}