Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidlee47edxldchzkwbvpscasqhwry2wccuh2yxoknmzy5d37hlcie",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mfz4ergdkmy2"
  },
  "path": "/t/about-traning-lora-for-z-image-turbo/173911#post_2",
  "publishedAt": "2026-03-01T08:48:54.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Hugging Face",
    "GitHub",
    "arXiv"
  ],
  "textContent": "When using it as a base model for LoRA training, Base might be fundamentally more suitable than Turbo? But Turbo came out first and is more widespread… making the choice difficult.\n\n* * *\n\n## Background you need for this decision\n\nA LoRA does not “fix” a model in the abstract. It learns to reproduce the **distribution** of whatever you train it on (subject, lighting, palette, background types, contrast curve, etc.). This is why dataset choice matters more than most hyperparameters.\n\nZ-Image **Turbo** adds a second complication: it is a **few-step distilled** model that **does not rely on classifier-free guidance (CFG)** at inference and, in the official prompting guidance, **does not use negative prompts at all**. (Hugging Face) Turbo is typically run at ~9 steps with guidance scale 0.0.\nThat pushes more of the “look” (including background/color tese model prior, and makes classic “negative prompt fixes” weaker unless you use something like NAG.\n\n* * *\n\n## 1) Is the base output recommended for LoRA training?\n\n### It depends what your LoRA is for\n\n#### If your LoRA is primarily **identity / subject / concept**\n\nUsing images that are **clean, consistent, and not heavily “styled”** is often the most robust path.\n\n  * You want the LoRA to learn _what the subject is_ , not “a specific color grade.”\n  * You can keep background/color control as an inference-time knob.\n\n\n\n**In this case, training on “base-ish” outputs can be acceptable** , because you are not trying to reprogram the global aesthetic—just teach a concept.\n\n#### If your LoRA is primarily **look / palette / background behavior**\n\nThen training on base-like images is _not_ recommended for your goal, because:\n\n  * The LoRA will learn the base model’s palette/background priors **as part of the target distribution**.\n  * You may end up fighting the LoRA with NAG/Detail Daemon every time.\n\n\n\n**For a look-changing objective, your training images should already reflect your desired look.**\n\n* * *\n\n## 2) Should you train with the most optimal images for your goals?\n\n### For your specific complaint (background + colors): generally yes\n\nIf you want different backgrounds and color behavior, training on your “optimal” images is the direct route.\n\nBut “optimal” should mean:\n\n  * **Your intended palette/contrast/white balance**\n  * **Not overprocessed** (avoid extreme sharpening / HDR micro-contrast / aggressive artifact removal)\n\n\n\nWhy the caution: Detail Daemon explicitly notes that pushing it too far produces an **oversharpened and/or HDR effect**. (GitHub) If you train on images that already have that signature and then also apply Detail Daemon at inference, you can get “double enhancement.”\n\n* * *\n\n## How the two approaches affect results\n\n### Approach A — Train on base-like images (neutral training set)\n\n**What improves**\n\n  * Better **generalization** across prompts and scenes.\n  * Your LoRA is less likely to “force” one palette/background everywhere.\n  * NAG and other inference controls remain clean “steering knobs.”\n\n\n\n**What stays the same**\n\n  * The model’s default background/color tendencies will still show up unless you steer them at inference (NAG, prompt strategy, post).\n\n\n\n**Typical failure mode**\n\n  * You finish training and feel: “My LoRA works, but the colors/background are still wrong.”\n\n\n\n### Approach B — Train on “final-look” images (your optimal goal images)\n\n**What improves**\n\n  * Background and palette shift **by default** , with less per-prompt tweaking.\n  * More consistent “house style.”\n\n\n\n**What gets worse**\n\n  * Less flexibility: the LoRA can “drag” unrelated prompts toward your dataset’s look.\n  * Higher risk of learning workflow artifacts (over-sharpen, haloing, crunchy texture, too-clean backgrounds).\n\n\n\n**Typical failure mode**\n\n  * “Everything looks like my training set, even when I don’t want it to.”\n\n\n\n### A practical compromise that works well\n\nTrain on images that are **80–90%** your target look (good grading, but not extreme), then do the last 10–20% with NAG/Detail Daemon.\n\n* * *\n\n## Where NAG and Detail Daemon fit\n\n### NAG (Normalized Attention Guidance)\n\nNAG is explicitly motivated by the problem that **CFG-based negative guidance collapses in few-step regimes** , and it restores effective negative-style control by operating in attention space. (arXiv)\n\n**Implication for you**\n\n  * If you keep training neutral (Approach A), NAG is a strong way to suppress unwanted background/color traits at inference without baking them into the LoRA.\n  * If you bake the look into the LoRA (Approach B), you typically need **less** NAG (otherwise you can over-suppress and lose richness).\n\n\n\n### Detail Daemon\n\nDetail Daemon adjusts sigma/sampling behavior to enhance detail and can reduce unwanted blur, but can produce an **oversharpen/HDR** look if pushed. (GitHub)\n\n**Implication for you**\n\n  * Use it as a **finisher** , not as the “source of truth” for what your LoRA should learn.\n  * If you want to train on images processed with it, keep settings **moderate and consistent**.\n\n\n\n* * *\n\n## Turbo vs Base as the training base model\n\n### Why Z-Image (Base) is generally better for LoRA training\n\nThe official Z-Image repo recommends for the Base model:\n\n  * guidance scale **3.0–5.0**\n  * inference steps **28–50**\n  * negative prompts **strongly recommended** (GitHub)\n\n\n\nThat is exactly the environment where you can:\n\n  * control background/colors at inference (CFG + negatives)\n  * train a LoRA without depending on special “distillation-preserving” tricks\n\n\n\n### Why Z-Image Turbo is riskier to train on directly\n\nTurbo is distilled for fast steps and “no CFG/negative prompts.” (Hugging Face)\nA commonly reported issue is that **LoRAs trained directly on Turbo can lose the acceleration behavior** —images become blurry at “fast” settings while looking normal at slower settings. (Hugging Face)\n\nIf you must train on Turbo anyway, there are two established mitigation paths:\n\n  1. **Training adapter / de-distillation approach** (Ostris adapter) (Hugging Face)\n  2. **DistillPatch LoRA** to restore fast-step behavior after Turbo LoRA training (Hugging Face)\n\n\n\n* * *\n\n## What I would do for your purpose\n\n### If your priority is “fix background + colors” and you want stability\n\n**Train on Z-Image (Base)** with a dataset that reflects your desired palette/background, then:\n\n  * generate on Base when you want maximum control/quality\n  * optionally test the same LoRA on Turbo for speed\n\n\n\nThis leverages Base’s controllability and avoids Turbo’s distillation fragility. (GitHub)\n\n### If your priority is “must run on Turbo at 8–9 steps”\n\nTrain for Turbo, but plan the pipeline up front:\n\n  1. Train with a **Turbo training adapter** (Hugging Face)\n  2. Evaluate every checkpoint at **Turbo inference settings** (few steps, guidance off) (Hugging Face)\n  3. If fast-step quality degrades, apply **DistillPatch** (Hugging Face)\n  4. Keep your training images “goal-like but not overcooked” to avoid learning artifacts (GitHub)\n\n\n\n* * *\n\n## Concrete “better approach” patterns (high leverage)\n\n### Pattern 1: Separate “subject LoRA” and “look LoRA”\n\n  * Train **subject/identity** on clean images (neutral look).\n  * Train **look/palette/background** on curated goal images.\n  * Use lower strengths and avoid stacking too many LoRAs on Turbo (Turbo is more sensitive). (Hugging Face)\n\n\n\n### Pattern 2: Make the look toggleable with a trigger token\n\nIf you want the palette/background sometimes but not always:\n\n  * Caption the look with a distinct token (e.g., `my_grade_style`) and keep other captions content-focused.\n  * Then you can turn the look on/off per prompt.\n\n\n\n### Pattern 3: Don’t “teach” Detail Daemon artifacts\n\nUse Detail Daemon primarily at inference; if you must use it in dataset creation, keep it mild and consistent to avoid baking in HDR/sharpen artifacts. (GitHub)\n\n* * *\n\n## Summary answers to your questions\n\n### 1) Is the base output recommended for LoRA training?\n\n  * **Only** if your LoRA is for subject/concept and you want maximum generality.\n  * If your complaint is specifically **background + colors** , base-like images will tend to **preserve** the base look.\n\n\n\n### 2) Should you train with the most optimal images for your goals?\n\n  * **Yes** , if your goal is to change palette/background behavior.\n  * Use “optimal” as “desired look, but not overprocessed,” to avoid training in artifacts (especially from aggressive detail enhancement). (GitHub)\n\n\n\n### Turbo vs Base for your purpose?\n\n  * For reliable look control: **train on Z-Image (Base)** (better CFG/negative/steps control). (GitHub)\n  * For Turbo-at-8-steps delivery: train with a **Turbo training adapter** and expect to use **DistillPatch** if acceleration quality breaks. (Hugging Face)\n\n",
  "title": "About traning LoRa for Z Image Turbo"
}