Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifchciukhc7a6nem66fagtjdhwwqykewopzkk5ktjfrjctiudfqp4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mljrztyy6h22"
  },
  "path": "/t/wan2-2-i2v-clarifications-needed-regarding-settings-on-low-vram-system/175884#post_9",
  "publishedAt": "2026-05-10T21:03:24.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "The models provided by Liquid (which includes 1.2B or even 350M variant) run just fine locally on a CPU",
    "Phr00t WAN2.2 Rapid All-in-One model card",
    "Rapid AIO README snapshot recommending CFG 1, 4 steps, sa_solver, beta",
    "Wan prompting guide: few-step + CFG 1 means negative prompts mostly do little",
    "ComfyUI-NAG: negative guidance for few-step diffusion models",
    "NAG project page",
    "ChatGPT Free tier FAQ",
    "Gemini API billing / free tier",
    "Gemini API pricing",
    "LiquidAI LFM2.5-1.2B-Instruct-GGUF",
    "LiquidAI llama.cpp deployment guide",
    "llama.cpp server README",
    "Phr00t Rapid AIO README snapshot",
    "How to get the most out of prompts for WAN models",
    "Rapid AIO README snapshot",
    "Wan prompting guide on CFG 1 / negative prompts",
    "ComfyUI-NAG README",
    "Phr00t WAN2.2 Rapid All-in-One",
    "LFM2.5-1.2B-Instruct-GGUF"
  ],
  "textContent": "I’m glad the correct answer was included.\n\nHmm… I think I’ve got a pretty good grasp of the situation now. By the way, distilled models like Lightning tend to struggle with accurately reflecting prompt details—especially negative prompts—but there’s still room for improvement. Their responsiveness to positive prompts is actually quite good. Also, if you’re looking for highly complex prompt responses, I think it’s worth considering other variations (if exist).\n\nDistilled models are often created by retraining a model after drastically pruning it, but in the distilled version, parts that shouldn’t be pruned for your specific purpose might have been removed. Well, I guess it can’t be helped if the goal is to save VRAM… But in any case, this means you also have to consider the performance of the model itself—or rather, the inherent characteristics of the distilled model.\n\nBy the way, if you’ll use an LLM for prompt refinement, I think using the Gemini or ChatGPT API is the easiest way, but if you want to do it entirely locally, an OSS LLM might be better. For this purpose, I think a smaller model from a high-quality OSS model family is perfectly sufficient. The models provided by Liquid (which includes 1.2B or even 350M variant) run just fine locally on a CPU. Other SOTA models like Qwen 3.5 and Gemma 4 in the 4B class or smaller can also run on a CPU alone. A 4B model is a bit heavy for a CPU, but at least these don’t consume VRAM… they run on RAM. Of course, they’d be very faster if with VRAM!\n\n* * *\n\n# Wan2.2 RapidBase I2V on 8GB VRAM: getting more prompt obedience without losing source-image fidelity\n\nAt this point I would **stop chasing the normal High/Low UNet route** for this GPU and use `rapidWAN22I2VGGUF_q4KMRapidBase.gguf` as the main workflow.\n\nThat is not a downgrade. 
For the actual goal here — **make the source image look like it came to life while preserving the same face, same lighting, same color, same texture, same source quality, and no AI-looking bloom** — this model is doing the right kind of thing. The normal High/Low route may be more flexible in theory, but on an 8GB card it is costing too much source fidelity.\n\nThe new goal should be:\n\n\n    Keep RapidBase.\n    Keep source-image fidelity.\n    Add only mild prompt pressure.\n    Reduce face morphing.\n    Avoid turning the workflow into a repainting/generative workflow.\n\n\nUseful references:\n\n  * Phr00t WAN2.2 Rapid All-in-One model card\n  * Rapid AIO README snapshot recommending CFG 1, 4 steps, sa_solver, beta\n  * Wan prompting guide: few-step + CFG 1 means negative prompts mostly do little\n  * ComfyUI-NAG: negative guidance for few-step diffusion models\n  * NAG project page\n  * ChatGPT Free tier FAQ\n  * Gemini API billing / free tier\n  * Gemini API pricing\n  * LiquidAI LFM2.5-1.2B-Instruct-GGUF\n  * LiquidAI llama.cpp deployment guide\n  * llama.cpp server README\n\n\n\n* * *\n\n## 1. Why RapidBase is the right baseline for this specific goal\n\nThe High/Low UNet experiment was still useful because it proved one thing: the duplicated SD3 shift setup really was causing artifacts. Removing those conflicting shift nodes fixed distortion and improved obedience/face permanence. But the second lesson is more important:\n\n> A technically cleaner High/Low workflow still did not give the desired look.\n\nThe preferred model, `rapidWAN22I2VGGUF_q4KMRapidBase.gguf`, behaves more like a **source-preserving animator** than a full generative video model. 
That is exactly why it works well for this use case.\n\nIt is good at:\n\n\n    keeping the source image quality\n    keeping low-res screengrabs looking like themselves\n    preserving lighting and colors\n    preserving background\n    avoiding the airbrushed Wan2.2 dream-sequence look\n    making the original picture move\n\n\nIt is weaker at:\n\n\n    complex multi-action prompts\n    large head turns\n    speaking / mouth motion\n    hand gestures\n    strong semantic obedience\n    large expression changes\n    camera moves\n\n\nThat tradeoff is expected. A workflow that preserves the source image 1:1 is not going to be as willing to invent new actions. More obedience usually requires more invention; more invention means more risk of face drift.\n\nSo the right strategy is not:\n\n\n    force the model to obey huge prompts\n\n\nThe right strategy is:\n\n\n    ask for one small action\n    add only mild prompt pressure\n    use seed batching\n    choose outputs by face permanence first\n\n\n* * *\n\n## 2. Current control setup\n\nFrom the screenshot, the current effective workflow is roughly:\n\n\n    Model:\n      rapidWAN22I2VGGUF_q4KMRapidBase.gguf\n\n    VAE:\n      wan_2.1_vae.safetensors\n\n    Text encoder:\n      umt5-xxl-encoder-Q8_0.gguf\n\n    KSampler Advanced:\n      add_noise: enable\n      steps: 10\n      cfg: 1.0\n      sampler_name: sa_solver\n      scheduler: beta\n      start_at_step: 1\n      end_at_step: 10000\n      return_with_leftover_noise: enable\n\n\nSave this as the **control workflow**.\n\nDo not overwrite it. Duplicate it before experiments.\n\nTesting rule:\n\n\n    same image\n    same prompt\n    same seed\n    same frame count\n    same resolution\n    change one setting only\n\n\nIf you change CFG, steps, start step, sampler, and prompt at the same time, the result becomes impossible to interpret.\n\n* * *\n\n## 3. 
Why CFG should stay low\n\nThe Rapid/AIO family is explicitly described as a fast all-in-one merge designed around **few steps** and **CFG 1**. One README snapshot recommends:\n\n\n    4 steps\n    1 cfg\n    sa_solver sampler\n    beta scheduler\n\n\nSource: Phr00t Rapid AIO README snapshot\n\nThat does not mean the best value for your workflow must be exactly 4 steps. Your screenshot already works at 10 steps. But it does mean this model should be tuned like a **few-step distilled / rapid model**, not like a normal 20-30 step diffusion workflow.\n\nDo not jump to:\n\n\n    cfg: 3.0\n    cfg: 4.0\n    cfg: 5.0\n\n\nThat is likely to cause:\n\n\n    face drift\n    new skin texture\n    bloom\n    over-smoothing\n    changed lighting\n    new expression\n    hallucinated details\n\n\nUse a micro-range instead.\n\n* * *\n\n## 4. CFG test range\n\nCurrent baseline:\n\n\n    cfg: 1.0\n\n\nRecommended test values:\n\n\n    1.00\n    1.15\n    1.25\n    1.35\n    1.50\n\n\nInterpretation:\n\nCFG | Expected behavior\n---|---\n`1.00` | maximum source fidelity, weakest negative-prompt effect\n`1.15` | tiny prompt pressure\n`1.25` | likely first useful obedience bump\n`1.35` | upper mild test\n`1.50` | stress test for face drift\n`2.00+` | probably too much if face permanence matters\n\nThe likely useful zone is:\n\n\n    cfg: 1.15-1.35\n\n\nRule:\n\n> Use the highest CFG that does not change the face.\n\nTest like this:\n\n\n    Run A:\n      cfg: 1.00\n\n    Run B:\n      cfg: 1.15\n\n    Run C:\n      cfg: 1.25\n\n    Run D:\n      cfg: 1.35\n\n    Run E:\n      cfg: 1.50\n\n\nKeep everything else identical.\n\nJudge in this order:\n\n\n    1. same face / same identity\n    2. same source-image quality\n    3. no morphing\n    4. no artifacts\n    5. prompt obedience\n    6. natural motion\n\n\nPrompt obedience is not the first priority. A clip that obeys perfectly but changes the face is a failed clip for this workflow.\n\n* * *\n\n## 5. 
Negative prompts are weak at CFG 1\n\nA common trap is adding a giant negative prompt and expecting it to control the output. In many few-step Wan/Rapid/Lightning-style workflows, **CFG 1 means negative prompts are weak or mostly inactive**.\n\nThe Wan prompting guide explains this directly: in standard diffusion, CFG above 1 gives the model a stronger positive-vs-negative comparison, but in few-step CFG 1 workflows, negative prompts often do little. See How to get the most out of prompts for WAN models.\n\nPractical consequence:\n\n\n    Do not rely on a huge negative prompt.\n    Put the important preservation rules in the positive prompt.\n\n\nPositive prompt should explicitly say:\n\n\n    same face\n    same identity\n    same hairstyle\n    same clothing\n    same lighting\n    same colors\n    same camera angle\n    same background\n    static camera\n    no zoom\n    no scene change\n    only subtle motion\n\n\nA short negative prompt is still fine, but it is secondary.\n\n* * *\n\n## 6. `start_at_step`: test 1 vs 0\n\nCurrent screenshot:\n\n\n    start_at_step: 1\n\n\nThis may be helping source fidelity. 
Starting at step 1 can skip a tiny early part of the denoising path, which may reduce repainting.\n\nTest only:\n\n\n    start_at_step: 1\n    start_at_step: 0\n\n\nExpected tradeoff:\n\nSetting | Likely benefit | Risk\n---|---|---\n`1` | better source fidelity and face permanence | weaker motion / weaker prompt response\n`0` | more motion and prompt response | more face drift / more repainting\n\nSuggested test:\n\n\n    Run A:\n      cfg: 1.25\n      start_at_step: 1\n      steps: 10\n\n    Run B:\n      cfg: 1.25\n      start_at_step: 0\n      steps: 10\n\n\nPossible decisions:\n\nResult | Keep\n---|---\n`0` improves obedience and face stays stable | `start_at_step: 0`\n`0` gives more motion but face changes | `start_at_step: 1`\nno meaningful difference | `start_at_step: 1`\n`0` adds bloom/repainting | `start_at_step: 1`\n\nMy expectation: `start_at_step: 1` may remain the safest default.\n\n* * *\n\n## 7. Steps: test 8 / 10 / 12\n\nCurrent setting:\n\n\n    steps: 10\n\n\nThis may already be close to the sweet spot.\n\nFew-step distilled models do not always improve with more steps. Sometimes extra steps create more smoothing, blending, or repainting.\n\nTest only:\n\n\n    steps: 8\n    steps: 10\n    steps: 12\n\n\nExpected behavior:\n\nSteps | Likely behavior\n---|---\n`8` | faster, possibly more source-faithful, possibly weaker obedience\n`10` | current working baseline\n`12` | may improve smoothness/obedience, but may add bloom or airbrushing\n`16+` | not recommended for this model unless intentionally stress-testing\n\nSuggested test:\n\n\n    Run A:\n      steps: 8\n\n    Run B:\n      steps: 10\n\n    Run C:\n      steps: 12\n\n\nKeep the best balance. If 12 adds the “dream sequence” look, go back to 10.\n\n* * *\n\n## 8. 
`return_with_leftover_noise`: test once\n\nCurrent screenshot:\n\n\n    return_with_leftover_noise: enable\n    end_at_step: 10000\n\n\nSince `end_at_step` is far beyond the actual step count, the sampler is probably completing its pass. This setting may not matter much, but test it once.\n\n\n    Run A:\n      return_with_leftover_noise: enable\n\n    Run B:\n      return_with_leftover_noise: disable\n\n\nKeep whichever preserves the “picture came to life” look.\n\nDo not spend a whole day on this. It is unlikely to be the main obedience or face-permanence control.\n\n* * *\n\n## 9. `add_noise`: keep enabled\n\nKeep:\n\n\n    add_noise: enable\n\n\nFor image-to-video, the model needs noise to create motion. If you disable it, you may get a more frozen output or odd behavior depending on the rest of the graph.\n\nOnly test `add_noise: disable` if diagnosing a very specific problem:\n\n\n    every seed changes the face\n    motion is always too aggressive\n    the image is being repainted too much\n\n\nEven then, treat it as a diagnostic test, not the likely final setting.\n\n* * *\n\n## 10. Sampler and scheduler: keep `sa_solver / beta`\n\nYour current best branch uses:\n\n\n    sampler_name: sa_solver\n    scheduler: beta\n\n\nKeep that as the main branch.\n\nThe Rapid/AIO README snapshot specifically recommends `sa_solver` and `beta` for that family. 
Source: Rapid AIO README snapshot.\n\nIf you want to test alternatives, do it only after the CFG/start/steps tests, and keep them as separate branches:\n\n\n    Branch A:\n      sa_solver / beta\n\n    Branch B:\n      euler / beta\n\n    Branch C:\n      euler_a / beta\n\n    Branch D:\n      euler / simple\n\n\nExpected behavior:\n\nSampler / scheduler | Likely behavior\n---|---\n`sa_solver / beta` | best current source-fidelity branch\n`euler / beta` | may obey differently, possibly less faithful\n`euler_a / beta` | more variation/motion, higher face-drift risk\n`euler / simple` | more relevant to Lightning/LightX2V-style workflows\n\nI would not change sampler/scheduler unless the smaller tests fail.\n\n* * *\n\n## 11. Seed batching is now one of the strongest tools\n\nYou already noticed face morphing is seed-dependent. That is real.\n\nIn video generation, the seed affects:\n\n\n    eye behavior\n    mouth behavior\n    micro-expression\n    small head motion\n    whether face identity drifts\n    whether the source texture holds\n\n\nUse two phases.\n\n### Phase A — setting tests\n\nUse one fixed seed:\n\n\n    fixed seed\n    same image\n    same prompt\n    same resolution\n    same frame count\n    change one setting only\n\n\nThis tells you what the setting does.\n\n### Phase B — production seed search\n\nAfter choosing settings, run:\n\n\n    8-16 seeds\n    same image\n    same prompt\n    same final settings\n    short preview first\n\n\nPick by this priority:\n\n\n    1. same face / same identity\n    2. same source-image quality\n    3. no morphing\n    4. natural motion\n    5. prompt obedience\n    6. no artifacts\n\n\nFor your goal, a seed that keeps the face and obeys 70% is better than a seed that obeys 100% and changes the person.\n\n* * *\n\n## 12. 
Exact tuning plan\n\n### Matrix 0 — save control\n\n\n    Model:\n      rapidWAN22I2VGGUF_q4KMRapidBase.gguf\n\n    VAE:\n      wan_2.1_vae.safetensors\n\n    Text encoder:\n      umt5-xxl-encoder-Q8_0.gguf\n\n    KSampler Advanced:\n      add_noise: enable\n      steps: 10\n      cfg: 1.0\n      sampler_name: sa_solver\n      scheduler: beta\n      start_at_step: 1\n      end_at_step: 10000\n      return_with_leftover_noise: enable\n\n\nSave this output as the reference.\n\n### Matrix 1 — CFG\n\n\n    cfg: 1.00\n    cfg: 1.15\n    cfg: 1.25\n    cfg: 1.35\n    cfg: 1.50\n\n\nPick the highest CFG that does not alter identity.\n\n### Matrix 2 — start step\n\nUse the best CFG.\n\n\n    start_at_step: 1\n    start_at_step: 0\n\n\nKeep `1` unless `0` clearly improves obedience without face drift.\n\n### Matrix 3 — steps\n\nUse best CFG and best start step.\n\n\n    steps: 8\n    steps: 10\n    steps: 12\n\n\nKeep the one with the least bloom/airbrushing and best face permanence.\n\n### Matrix 4 — leftover noise\n\nUse best CFG/start/steps.\n\n\n    return_with_leftover_noise: enable\n    return_with_leftover_noise: disable\n\n\nKeep the more source-faithful result.\n\n### Matrix 5 — seed batch\n\nUse final settings.\n\n\n    8-16 seeds\n    short preview\n    same prompt\n    same image\n\n\nPick the seed by face permanence first.\n\n* * *\n\n## 13. 
Recommended presets\n\n### Preset A — safest source fidelity\n\nUse when the face must stay the same.\n\n\n    Model:\n      rapidWAN22I2VGGUF_q4KMRapidBase.gguf\n\n    VAE:\n      wan_2.1_vae.safetensors\n\n    Text encoder:\n      umt5-xxl-encoder-Q8_0.gguf\n\n    KSampler Advanced:\n      add_noise: enable\n      steps: 10\n      cfg: 1.0\n      sampler_name: sa_solver\n      scheduler: beta\n      start_at_step: 1\n      end_at_step: 10000\n      return_with_leftover_noise: enable\n\n\nUse for:\n\n\n    portraits\n    faces\n    low-res screengrabs\n    source-quality preservation\n    subtle motion\n\n\n### Preset B — slightly more obedient\n\n\n    Same as Preset A, except:\n\n    cfg: 1.15\n\n\nThen test:\n\n\n    cfg: 1.25\n\n\nStop if the face changes.\n\n### Preset C — stronger motion test\n\n\n    Same as Preset A, except:\n\n    start_at_step: 0\n    cfg: 1.15\n\n\nIf the face changes, return to:\n\n\n    start_at_step: 1\n\n\n### Preset D — smoothness test\n\n\n    Same as Preset A, except:\n\n    steps: 12\n\n\nIf it adds bloom or airbrushing, return to:\n\n\n    steps: 10\n\n\n### Preset E — faster seed scouting\n\n\n    Same as Preset A, except:\n\n    steps: 8\n    shorter frame count\n    lower test resolution\n\n\nUse this only for finding seeds quickly, then rerun good seeds at normal settings.\n\n* * *\n\n## 14. Prompt strategy: one action only\n\nThis workflow needs simple prompts.\n\nBad prompt:\n\n\n    The person turns their head, smiles, raises their hand, looks into the camera, hair moves in the wind, camera slowly zooms in, cinematic lighting.\n\n\nWhy this is bad:\n\n\n    too many actions\n    requires new expression\n    requires new pose\n    requires new hair behavior\n    requires camera motion\n    invites lighting changes\n    increases face drift\n\n\nBetter prompt:\n\n\n    The same person from the source image gently blinks once. 
Preserve the exact same face, identity, hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No zoom. No scene change.\n\n\nBest rule:\n\n\n    one generation = one small action\n\n\nSafe actions:\n\n\n    one subtle blink\n    gentle breathing\n    tiny natural smile\n    slight eye movement\n    very small head tilt\n\n\nRisky actions:\n\n\n    speaking\n    laughing widely\n    turning head far\n    walking\n    dancing\n    raising hands\n    hair blowing strongly\n    camera zoom\n    camera orbit\n    lighting change\n\n\nFor this workflow, obedience improves when the requested action is simple enough that the model does not need to repaint the person.\n\n* * *\n\n## 15. Positive prompt templates\n\nSince negative prompts are weak at CFG 1, put preservation constraints in the positive prompt.\n\n### Safe source-faithful template\n\n\n    The same person from the source image gently blinks once. Preserve the exact same face, identity, hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No zoom. No pan. No scene change. Natural subtle motion. Sharp face.\n\n\n### Slightly more expressive template\n\n\n    The same person from the source image makes a tiny natural smile while gently breathing. Preserve the exact same face, identity, hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No zoom. No scene change.\n\n\n### Minimal template\n\n\n    Same person, same face, same identity, same lighting and background. One subtle blink. Static camera.\n\n\n### Face permanence template\n\n\n    The same person keeps the exact same face and identity throughout the video. Only subtle natural breathing and one small blink. Same hairstyle, clothing, lighting, colors, camera angle, and background. Static camera.\n\n\nThe repetition of “same face” and “same identity” is not elegant, but it is useful conditioning.\n\n* * *\n\n## 16. 
Negative prompt template\n\nKeep it short.\n\n\n    different person, face change, identity change, warped face, distorted eyes, changing hairstyle, changing clothes, changing background, camera movement, zoom, scene change, blurry face\n\n\nOptional additions:\n\n\n    extra teeth, melted face, asymmetrical eyes, over-smoothed skin, airbrushed, bloom\n\n\nDo not spend all your effort on negative prompting. At CFG 1, it may do very little. At CFG 1.15-1.35, it may help slightly, but positive prompt structure and seed selection matter more.\n\nReference: Wan prompting guide on CFG 1 / negative prompts\n\n* * *\n\n## 17. Handling complex prompts\n\nThe model refuses or hallucinates complex prompts because they ask for too many inventions at once.\n\nA complex prompt often includes:\n\n\n    subject action\n    facial expression\n    body motion\n    camera motion\n    lighting change\n    background interpretation\n    style direction\n\n\nThat is too much for a source-faithful RapidBase workflow.\n\nInstead of:\n\n\n    She turns to the camera, smiles, raises her hand, and the camera slowly zooms in.\n\n\nUse separate clips:\n\n\n    Clip 1:\n      same person gently blinks once\n\n    Clip 2:\n      same person makes a tiny natural smile\n\n    Clip 3:\n      same person slightly raises one hand, only if the hand is already visible\n\n\nDo not ask for a hand raise if the hand is not clearly visible in the source image. If the model must invent a hand, it may also invent a new body or face.\n\n* * *\n\n## 18. 
Face permanence rules\n\nFace permanence is mostly controlled by:\n\n\n    source image clarity\n    motion size\n    CFG\n    start_at_step\n    seed\n    prompt complexity\n    frame count\n    camera motion\n\n\nDo:\n\n\n    use clear face images\n    keep motion small\n    use static camera\n    use one action only\n    keep CFG low\n    batch seeds\n    choose face permanence first\n\n\nAvoid:\n\n\n    large head turns\n    speaking\n    wide smiles\n    looking away then back\n    hands crossing the face\n    camera movement\n    dramatic emotion\n    lighting changes\n    long clips before seed selection\n\n\nThe model is most likely to morph the face when asked for mouth/teeth motion, big expression changes, or head rotation. Blinks and breathing are much safer.\n\n* * *\n\n## 19. Should you add nodes?\n\nMain recommendation:\n\n\n    Add almost nothing.\n\n\nYour current workflow’s value is that it does **not** repaint too much. Extra nodes can easily destroy that.\n\nAvoid adding during optimization:\n\n\n    face restore\n    style LoRAs\n    multiple LoRAs\n    high-strength LoRAs\n    upscalers before judging motion\n    interpolation before judging motion\n    color correction before judging model behavior\n\n\nUpscale/interpolation should happen only after you choose:\n\n\n    prompt\n    seed\n    settings\n    motion\n    face permanence\n\n\n* * *\n\n## 20. Optional node: NAG\n\nNAG is the one optional control idea that fits the problem.\n\nWhy it may help:\n\n\n    the model runs near CFG 1\n    negative prompts are weak\n    raising CFG can morph the face\n    NAG may add negative-prompt-like control without pushing CFG too hard\n\n\nThe ComfyUI-NAG README says NAG restores effective negative prompting in few-step diffusion models and can complement CFG. 
The NAG project page similarly describes NAG as a method for restoring negative prompting in few-step sampling.\n\nHow to test:\n\n\n    copy the workflow\n    add NAG only in the copy\n    keep CFG low\n    use the same seed and prompt\n    compare against the saved control\n\n\nRemove it if it causes:\n\n\n    bloom\n    airbrushing\n    texture changes\n    face drift\n    loss of source quality\n\n\nDo not make NAG part of the main workflow until it beats the control.\n\n* * *\n\n## 21. LoRAs: only one, only low strength\n\nThe Phr00t Rapid/AIO model card notes Wan 2.1 LoRA compatibility and low-noise Wan 2.2 LoRA compatibility, but warns against high-noise Wan 2.2 LoRAs for that family. See Phr00t WAN2.2 Rapid All-in-One.\n\nIf testing LoRAs:\n\n\n    one LoRA only\n    strength 0.15\n    strength 0.25\n    strength 0.35\n\n\nAvoid:\n\n\n    1.0 strength\n    multiple LoRAs\n    style LoRAs\n    high-noise Wan2.2 LoRAs\n    character LoRAs unless necessary\n\n\nFor this workflow, LoRAs are more likely to hurt source fidelity than help, unless very targeted.\n\n* * *\n\n## 22. Free prompt restructuring resources\n\nDo not run Ollama or a local LLM on the same GPU while using ComfyUI. On an 8GB card, that competes directly with Wan.\n\nUse web tools or CPU-only local tools.\n\n### Free web options\n\nGood enough:\n\n\n    ChatGPT Free\n    Google AI Studio / Gemini\n\n\nReferences:\n\n  * ChatGPT Free tier FAQ\n  * Gemini API billing / free tier\n  * Gemini API pricing\n\n\n\nUse one batched request rather than many small requests.\n\n* * *\n\n## 23. 
Prompt rewriter request template\n\nPaste this into ChatGPT, Gemini, or a local helper.\n\n\n    Rewrite this as a short Wan2.2 image-to-video prompt for a low-VRAM RapidBase workflow.\n\n    Rules:\n    - one small action only\n    - preserve exact face and identity\n    - preserve hairstyle, clothing, lighting, colors, camera angle, and background\n    - static camera\n    - no zoom\n    - no pan\n    - no scene change\n    - avoid cinematic embellishment\n    - avoid new details not visible in the source image\n    - keep it literal and short\n    - output exactly 3 versions:\n      1. safest source-faithful version\n      2. slightly more expressive version\n      3. shortest version\n\n    Original idea:\n    <put idea here>\n\n\nThis is better than asking “make the prompt better,” because “better” usually means more cinematic, more detailed, and more inventive, which is exactly what you do not want.\n\n* * *\n\n## 24. CPU-only local prompt helper\n\nA local helper is optional.\n\nGoal:\n\n\n    rewrite prompts\n    do not use GPU VRAM\n    do not compete with ComfyUI\n\n\nA good tiny local option is LFM2.5-1.2B-Instruct-GGUF. LiquidAI’s docs explain that LFM models are available in GGUF format for llama.cpp-style use: LiquidAI llama.cpp deployment guide.\n\nExample CPU-only server command:\n\n\n    llama-server \\\n      -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF:Q4_K_M \\\n      -c 2048 \\\n      -ngl 0 \\\n      --host 127.0.0.1 \\\n      --port 8080\n\n\nImportant part:\n\n\n    -ngl 0\n\n\nThe `llama.cpp` server controls GPU layer offload through `-ngl` (`--n-gpu-layers`); setting it to 0 keeps every layer on the CPU, so the helper uses system RAM instead of VRAM. See llama.cpp server README.\n\nRecommended order:\n\n\n    1. ChatGPT Free or Gemini\n    2. LFM2.5-1.2B Q4_K_M CPU-only\n    3. Qwen 2B-4B CPU-only if you want smarter rewriting\n    4. 
larger local models only if you have spare CPU/RAM\n\n\n* * *\n\n## 25. Prompt helper system prompt\n\nUse this as the system prompt in ChatGPT, Gemini, LFM, Qwen, or any prompt helper.\n\n\n    You are a prompt rewriting assistant for Wan2.2 image-to-video.\n\n    Rewrite the user's idea into a short, literal, source-faithful I2V prompt.\n\n    Rules:\n    - Use one small action only.\n    - Preserve the exact same face and identity.\n    - Preserve hairstyle, clothing, lighting, colors, camera angle, and background.\n    - Keep the camera static.\n    - No zoom.\n    - No pan.\n    - No scene change.\n    - No cinematic embellishment.\n    - No new objects.\n    - Avoid talking, dancing, walking, large head turns, and large expression changes.\n    - Prefer subtle motion: blink, gentle breathing, tiny smile, very small eye movement.\n\n    Output exactly:\n    1. Safest:\n    2. Slightly more expressive:\n    3. Shortest:\n\n    Do not explain.\n\n\nThen give it:\n\n\n    Rewrite this idea for Wan2.2 I2V:\n\n    <your idea>\n\n\nExample input:\n\n\n    make her look at the camera and smile a bit, maybe some hair movement\n\n\nExpected output style:\n\n\n    1. Safest:\n    The same person from the source image gently blinks once and makes a tiny natural smile. Preserve the exact same face, identity, hairstyle, clothing, lighting, colors, camera angle, and background. Static camera. No zoom. No scene change.\n\n    2. Slightly more expressive:\n    The same person from the source image looks naturally toward the camera and makes a very small smile. Preserve the same face, identity, hairstyle, clothing, lighting, colors, camera angle, and background. Only subtle natural motion. Static camera.\n\n    3. Shortest:\n    Same person, same face and identity. One subtle blink and tiny smile. Static camera. Same lighting and background.\n\n\n* * *\n\n## 26. What I would do next\n\n  1. Keep `rapidWAN22I2VGGUF_q4KMRapidBase.gguf` as the main branch.\n  2. 
Save the current workflow as the control.\n  3. Test CFG `1.00 / 1.15 / 1.25 / 1.35 / 1.50`.\n  4. Test `start_at_step: 1` vs `0`.\n  5. Test `steps: 8 / 10 / 12`.\n  6. Test `return_with_leftover_noise: enable` vs `disable` once.\n  7. Use seed batches after choosing settings.\n  8. Use one-action prompts.\n  9. Put preservation constraints in the positive prompt.\n  10. Try NAG only in a duplicate workflow if negative prompting remains weak.\n  11. Use ChatGPT/Gemini or CPU-only LFM2.5 for prompt rewriting, not a GPU LLM inside ComfyUI.\n\n\n\n* * *\n\n## Short summary\n\n  * Keep `rapidWAN22I2VGGUF_q4KMRapidBase.gguf`; it matches the source-fidelity goal.\n  * Keep `sa_solver / beta` as the main branch.\n  * Do not chase CFG 3+.\n  * Test CFG only in a tiny range: `1.00 / 1.15 / 1.25 / 1.35 / 1.50`.\n  * Test `start_at_step: 1` versus `0`.\n  * Test `steps: 8 / 10 / 12`.\n  * Use seed batches; face permanence is seed-sensitive.\n  * At CFG 1, negative prompts are weak. Put identity/background/camera constraints in the positive prompt.\n  * Use one small action per prompt.\n  * Add almost nothing to the workflow. NAG is the only optional control node worth testing, and only in a copy.\n  * For prompt rewriting, use ChatGPT Free, Gemini/AI Studio, or a CPU-only tiny model like LFM2.5.\n\n",
  "title": "Wan2.2 i2v (clarifications needed regarding settings on low vram system)"
}