Flux LORA won't show my legs or feet

Hugging Face Forums [Unofficial] June 4, 2026

Focusing primarily on ways to address issues raised in the thread, I collected a bunch of things that might be useful:


TL;DR

I would not treat this as one single “FLUX cannot draw legs” bug.

It looks more like several different failure modes overlapping:

| Failure mode | What it looks like | What I would try first |
|---|---|---|
| Framing failure | legs/feet are missing, cropped, or never enter the frame | stronger full-body framing prompt, vertical aspect ratio, camera distance, visible floor/shoes |
| Subject-token entanglement | the LoRA keeps pulling the subject back into close-up / upper-body shots | lower LoRA strength, better captions, more varied shot distances |
| Identity-at-distance failure | full body appears, but the face stops looking like the person | add full-body identity examples and bridge shots, not only face closeups |
| Caption entanglement | outfit/background/crop type sticks to the person | caption clothes, background, camera distance, shot type |
| Human anatomy failure | legs appear, but are warped/deformed | more clean lower-body examples, pose/control/inpaint; do not expect prompt alone to solve every case |
| Overtraining / prompt-following loss | likeness improves but prompt flexibility collapses | test intermediate checkpoints and LoRA weights instead of assuming “more is better” |

So my practical mental model would be:

Separate identity from framing, clothing, background, camera distance, and feet/ground contact.

The goal is not necessarily to retrain FLUX itself. The practical layer is usually: prompt framing → LoRA strength → captions → dataset balance → LoRA retraining → pose/control/inpaint if needed.


1. Why this can happen

A subject LoRA may learn more than “who this person is.”

If most examples are close-up or mid-shot, the trigger token can also absorb things like:

  • close-up framing
  • upper-body crop
  • missing lower body
  • repeated outfit
  • repeated background
  • repeated lighting
  • camera distance
  • “this person usually appears without visible feet”

This is not unique to FLUX. Text-to-image models can make very plausible images while still being brittle about composition: which requested elements appear, where they appear, and which attributes belong to which object/body part. Benchmarks like T2I-CompBench, T2I-CompBench++, and GenEval are basically built around this kind of compositional weakness.

There is also a known failure mode where text-to-image diffusion models simply do not generate some requested concepts. Attend-and-Excite discusses this as catastrophic neglect, where one or more subjects/concepts in the prompt are not generated. That maps pretty well to “I said feet visible, but the model ignored the feet.”

Separately, human limbs are still a hard case for T2I models. Papers such as HumanRefiner and Distortion-5K / ViT-HD specifically discuss distorted limbs, missing fingers, deformed extremities, fused body parts, and other human-body distortions in generated images.

So I would split the problem into two parts:

  1. Missing lower body: this is often a framing / prompt-attention / token-entanglement issue.

  2. Deformed lower body: this can remain even after adding full-body examples, because human limb fidelity is still a general T2I weakness.

Those two problems may need different mitigations.


2. First: try to save the existing LoRA

Before retraining, I would test the current LoRA with a small controlled grid.

The important part is to change one thing at a time:

  • same seed
  • same sampler/settings
  • same resolution
  • same prompt
  • only change LoRA weight, aspect ratio, or prompt variant

Otherwise it is hard to tell what helped.
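A rough sketch of how such a grid could be generated so that every run differs from the baseline in exactly one setting. `RunConfig` and `one_at_a_time` are hypothetical names made up for this illustration, and the baseline values are just examples; how each config gets turned into an actual generation depends on your UI or pipeline.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    """One generation run. All names here are illustrative, not from any tool."""
    seed: int
    resolution: tuple
    lora_weight: float
    prompt_variant: str


def one_at_a_time(baseline, variations):
    """Return the baseline plus runs that each change exactly one field."""
    runs = [baseline]
    for field, values in variations.items():
        for value in values:
            if getattr(baseline, field) != value:  # skip the baseline value itself
                runs.append(replace(baseline, **{field: value}))
    return runs


baseline = RunConfig(seed=1234, resolution=(832, 1248),
                     lora_weight=0.65, prompt_variant="full-body v1")

runs = one_at_a_time(baseline, {
    "lora_weight": [0.45, 0.55, 0.65, 0.75, 0.85, 1.00],
    "resolution": [(832, 1248), (832, 1472)],
    "prompt_variant": ["full-body v1", "shoes/floor v2"],
})

for run in runs:
    print(run)
```

The seed is never varied here, so any difference between two outputs can be traced back to the single field that changed.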


2.1 Use full-body framing as a scene description, not just one token

I would not rely on just:

full body photo of <token>

That may be too weak if the LoRA already learned the subject mostly as a portrait or upper-body concept.

Instead, use multiple mutually reinforcing descriptions:

full-body photograph of <token>, standing naturally, full height visible from head to toe, entire body visible in frame, legs visible, feet visible, shoes visible, both shoes fully visible, both feet planted on the ground, photographed from a distance, camera far enough away to include the complete body, vertical 9:16 portrait, subject centered in frame, visible floor under both shoes

Then repeat the framing near the end:

showing the complete full length of the subject, a full-body photograph capturing the subject in their entirety, not a close-up portrait, not an upper-body crop

This is not just “prompt superstition.” It is trying to reduce concept neglect by giving the model several ways to attend to the same requirement:

| Requirement | Prompt wording |
|---|---|
| whole body | full-body photograph, full height visible |
| no crop | entire body visible in frame, head to toe |
| lower body | legs visible, feet visible |
| concrete feet cue | shoes visible, both shoes fully visible |
| ground/contact | both feet planted on the ground, visible floor under both shoes |
| anti-closeup | photographed from a distance, camera far enough away |
| composition | vertical 9:16, subject centered in frame |

For this specific issue, I think visible floor under both shoes is surprisingly useful because it asks for the feet, the shoes, the ground plane, and the space below the body.
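To make the redundancy testable, the requirement groups can be kept as data and assembled into a prompt, with the option to drop one group at a time and see which cue actually matters. This is only a sketch: `FRAMING_CUES`, `CLOSING`, and `build_prompt` are hypothetical names, and the phrases are the ones used in the example prompt earlier in this section.

```python
# Requirement group -> prompt phrases (wording taken from the example prompt).
FRAMING_CUES = {
    "whole body": ["full height visible from head to toe"],
    "no crop": ["entire body visible in frame"],
    "lower body": ["legs visible", "feet visible"],
    "concrete feet cue": ["shoes visible", "both shoes fully visible"],
    "ground/contact": ["both feet planted on the ground",
                       "visible floor under both shoes"],
    "anti-closeup": ["photographed from a distance",
                     "camera far enough away to include the complete body"],
    "composition": ["vertical 9:16 portrait", "subject centered in frame"],
}

# Framing repeated near the end of the prompt.
CLOSING = ("showing the complete full length of the subject, "
           "not a close-up portrait, not an upper-body crop")


def build_prompt(token, skip=()):
    """Join all framing cues into one prompt, leaving out any groups in `skip`."""
    parts = [f"full-body photograph of {token}", "standing naturally"]
    for requirement, phrases in FRAMING_CUES.items():
        if requirement not in skip:
            parts.extend(phrases)
    parts.append(CLOSING)
    return ", ".join(parts)


print(build_prompt("<token>"))
print(build_prompt("<token>", skip=("ground/contact",)))  # ablation variant
```

Comparing the full prompt against each single-group ablation, with seed and settings fixed, is one way to find out whether the floor/shoes cue is really doing the work for your LoRA.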


2.2 Feet/shoes/floor can work better than “feet visible” alone

If “feet visible” is ignored, make the feet part of a concrete visual object/scene.

Try variants like:

wearing visible black sneakers, both shoes fully visible, standing on a wooden floor



wearing visible boots, both boots fully visible, standing on concrete ground



empty space below the feet, visible floor under both shoes, full body centered in frame

This is also consistent with practical Flux prompting discussions where people report that “full body” alone can still drift toward upper-body crops, and more concrete cues like shoes/floor/head-to-toe can help. See for example these related community threads:

  • How to get full body shot in Flux?
  • Need Help, Flux is not performing as expected in terms of composition and anatomy

2.3 Aspect ratio helps, but only if paired with distance

Vertical framing helps, but it is not enough by itself.

If the camera is still “close,” the model may simply make a large upper-body portrait inside a vertical frame.

I would test:

| Aspect ratio | Example resolution | Why |
|---|---|---|
| 3:4 | 896x1194 / similar | natural portrait framing |
| 2:3 | 832x1248, 896x1344, 1024x1536 | good full-body compromise |
| 9:16 | 768x1365, 832x1472 | more room for full body, but face gets smaller |

Pair that with:

camera far enough away to include the complete body



empty space above the head and below the feet



visible floor under both shoes
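If your tool wants dimensions that are a multiple of some block size, the portrait resolutions can be computed rather than typed by hand. The sketch below assumes a multiple of 16, which is my assumption and not something confirmed in this thread; check what your pipeline or UI actually expects. `resolution_for` is a hypothetical helper.

```python
def resolution_for(aspect_w, aspect_h, width, multiple=16):
    """Return (width, height) for a portrait aspect ratio.

    The height is snapped to the nearest multiple of `multiple`.
    The width is assumed to already be a valid multiple.
    """
    exact_height = width * aspect_h / aspect_w
    height = max(multiple, round(exact_height / multiple) * multiple)
    return width, height


for (aw, ah), width in [((3, 4), 896), ((2, 3), 832), ((9, 16), 832)]:
    w, h = resolution_for(aw, ah, width)
    print(f"{aw}:{ah} -> {w}x{h}")
```

Snapping changes the ratio slightly (for example 3:4 at width 896 lands on 896x1200 instead of 896x1194), which should not matter for framing tests.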

2.4 Sweep LoRA strength instead of using one fixed value

If the LoRA is strong, it may preserve likeness but also preserve the training crop style.

That means a high LoRA weight can accidentally mean:

“make this person” + “make this person appear the way they appeared in the dataset”

So I would test a simple sweep.

| LoRA weight | What to check |
|---|---|
| 0.45 | More composition freedom, weaker likeness |
| 0.55 | Possible balance point |
| 0.65 | Good first candidate |
| 0.75 | Check if close-up crop bias returns |
| 0.85 | Stronger likeness, probably stronger dataset bias |
| 1.00 | Maximum learned bias; useful as a baseline |

Hugging Face Diffusers exposes LoRA scaling / adapter weighting in its LoRA and PEFT integration docs:

  • Diffusers: Load adapters / adjust LoRA weight scale
  • Diffusers: Inference with PEFT / multiple adapters
  • PEFT LoRA docs

Even if your UI is not Diffusers, the same idea usually applies conceptually: test LoRA influence as a variable.
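Once the sweep images are judged by eye, a small helper can shortlist the weights worth keeping. This is one possible selection rule, not an established method: keep weights where the full body appeared and likeness was acceptable, strongest first. The scores below are made-up placeholder judgments to show the shape of the data, and `candidate_weights` is a hypothetical name.

```python
def candidate_weights(results, min_likeness=3):
    """Weights where the full body appeared and likeness was acceptable,
    sorted from strongest to weakest LoRA influence."""
    return sorted(
        (r["weight"] for r in results
         if r["full_body"] and r["likeness"] >= min_likeness),
        reverse=True,
    )


# Placeholder judgments (likeness on a 0-5 scale), NOT real measurements.
results = [
    {"weight": 0.45, "full_body": True,  "likeness": 2},
    {"weight": 0.55, "full_body": True,  "likeness": 3},
    {"weight": 0.65, "full_body": True,  "likeness": 4},
    {"weight": 0.75, "full_body": False, "likeness": 4},
    {"weight": 0.85, "full_body": False, "likeness": 5},
    {"weight": 1.00, "full_body": False, "likeness": 5},
]

print(candidate_weights(results))
```

An empty shortlist would suggest that no single weight gives both likeness and full-body framing, which points toward the two-LoRA split or a retrain rather than more prompt tweaking.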


2.5 If using two LoRAs, separate their roles

The 0.50 + 0.50 idea from earlier in the thread makes sense conceptually, but I would test it as a grid rather than a magic number.

A useful two-LoRA split could be:

| LoRA | Purpose | What it should emphasize |
|---|---|---|
| Identity LoRA | face/person likeness | face, expression, angles, identity consistency |
| Framing/Flexibility LoRA | body/framing/context separation | full-body, camera distance, clothing/background captions, visible feet/shoes |

Then test:

| Test | Identity LoRA | Framing LoRA | What to watch |
|---|---|---|---|
| A | 0.60 | 0.30 | face fidelity first |
| B | 0.55 | 0.40 | face + body compromise |
| C | 0.50 | 0.50 | balanced, close to the thread suggestion |
| D | 0.45 | 0.55 | stronger framing |
| E | 0.40 | 0.60 | does the face fall apart? |

If full body appears but the face changes, the identity LoRA may be too weak or the dataset may not contain enough identity information at full-body distance.

If the face stays but the crop returns, the identity LoRA may be carrying too much close-up framing bias.


3. If retraining the LoRA: caption what you want to control later

This is probably the most important part.

If you want to control something at generation time, it should probably appear explicitly in the training captions.

For example, if the dataset contains many close-ups but the captions only say:

photo of <token>

then the LoRA may learn:

<token> = person identity + close-up framing + upper-body crop + this outfit + this background + no visible feet

A better goal is:

<token> = person identity
close-up = close-up framing
full-body = full-body framing
black jacket = clothing
indoor room = background
camera at a distance = camera distance

RunDiffusion’s dataset guide says to include composition context such as portrait, full-body, or close-up, along with lighting/environment/camera descriptors, in training captions:

  • RunDiffusion: How to prepare a dataset for model training

There are also practical Flux LoRA captioning discussions that point in the same direction:

  • What exactly to caption for Flux LoRA training?
  • kohya-ss Flux LoRA training tips discussion

3.1 Caption examples

For close-up images:

close-up portrait of <token>, face visible, shoulders visible, camera close to the subject, indoor lighting

For upper-body images:

upper-body photo of <token>, torso visible, wearing a black jacket, standing indoors, soft window light

For waist-up images:

waist-up photo of <token>, upper body and waist visible, standing outdoors, camera at medium distance

For three-quarter images:

three-quarter body photo of <token>, legs partially visible, standing outdoors, wearing casual clothes, camera at medium distance

For full-body images:

full-body photo of <token>, head-to-toe visible, legs visible, feet visible, shoes visible, standing at a distance on a concrete floor

For full-body feet/shoes anchor images:

full-body photo of <token>, entire body visible, both shoes visible, feet planted on the ground, visible floor under both shoes, camera far enough away to include head and feet

The important part is not the exact wording. The important part is that close-up images are labelled as close-up images, and full-body images are labelled as full-body images.
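A small caption builder can keep that labelling consistent across the dataset. This is a sketch only: `SHOT_PREFIX` and `make_caption` are hypothetical names, the wording is lifted from the caption examples in this section, and how captions are stored next to the images depends on your trainer.

```python
# Shot type -> caption opening. Wording follows the examples in this section.
SHOT_PREFIX = {
    "close-up": "close-up portrait of {token}, face visible, shoulders visible, "
                "camera close to the subject",
    "upper-body": "upper-body photo of {token}, torso visible",
    "waist-up": "waist-up photo of {token}, upper body and waist visible",
    "three-quarter": "three-quarter body photo of {token}, legs partially visible",
    "full-body": "full-body photo of {token}, head-to-toe visible, legs visible, "
                 "feet visible, shoes visible",
}


def make_caption(token, shot_type, clothing=None, background=None, lighting=None):
    """Build a caption that always states the shot type explicitly.

    Raises KeyError for an unknown shot type, so unlabeled images get noticed.
    """
    parts = [SHOT_PREFIX[shot_type].format(token=token)]
    if clothing:
        parts.append(f"wearing {clothing}")
    if background:
        parts.append(background)
    if lighting:
        parts.append(lighting)
    return ", ".join(parts)


print(make_caption("<token>", "upper-body", clothing="a black jacket",
                   background="standing indoors", lighting="soft window light"))
print(make_caption("<token>", "full-body",
                   background="standing at a distance on a concrete floor"))
```

Replace `<token>` with your real trigger token when generating captions for training.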


3.2 Caption variable things; avoid absorbing them into the token

A practical rule:

| Thing in training image | Caption it? | Why |
|---|---|---|
| close-up / portrait crop | yes | prevents crop type from becoming part of <token> |
| upper-body / waist-up / full-body | yes | makes shot distance controllable |
| clothing | usually yes | prevents outfit stickiness |
| background | usually yes | prevents background stickiness |
| lighting | often yes | prevents lighting/style stickiness |
| camera distance | yes | helps separate close-up from full-body |
| feet/shoes/floor | yes for full-body samples | teaches lower-body framing explicitly |
| permanent identity | usually less | that is what <token> should learn |

Note: in normal prose I am writing the trigger as <token>. In actual captions and code blocks, use your real trigger token.


4. Build a “distance ladder” in the dataset

I would avoid thinking only in terms of “face images” vs “full-body images.”

The model needs bridge examples.

A close-up teaches identity well, but not body framing. A full-body image teaches framing well, but the face may be too small to teach identity strongly. Bridge shots connect the two.

| Shot type | What it teaches well | What it may fail to teach |
|---|---|---|
| Close-up face | face identity | body framing |
| Chest / upper body | identity + torso | legs/feet |
| Waist-up | transition framing | feet |
| Three-quarter body | legs/body connection | precise feet |
| Full-body | full framing | face detail |
| Full-body with visible shoes/floor | lower-body completion | face detail unless image quality is high |

A related failure mode is visible in this GitHub issue: a Flux LoRA trained on face images can look fine for face/half-body generations, but lose identity when asked for full-body outputs:

  • kohya-ss/sd-scripts issue #1916: FLUX LoRA trained on face images changes face when generating full-body images

That is why I would include not only full-body examples, but also bridge shots.


4.1 Example dataset balance

Not a universal recipe, but a reasonable starting point:

| Dataset size | Close-up | Upper / waist | Three-quarter | Full-body | Clear feet/shoes |
|---|---|---|---|---|---|
| 20 images | 4 | 5 | 4 | 5 | 2 |
| 30 images | 5 | 8 | 6 | 8 | 3 |
| 40 images | 6 | 10 | 8 | 12 | 4 |

For this specific issue, I would rather have fewer but cleaner full-body examples than many low-quality ones.

Full-body examples should ideally have:

  • face still recognizable
  • full head-to-toe framing
  • feet/shoes visible
  • visible ground/floor contact
  • not all the same outfit
  • not all the same background
  • not all the same pose
  • not all the same camera distance
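To check how an existing dataset is actually distributed, the captions themselves can be counted by shot type. This sketch assumes each caption starts with its shot-type phrase, as in the caption examples earlier; `shot_type` and `shot_distribution` are hypothetical helpers, and the sample captions are placeholders.

```python
from collections import Counter

# Checked in this order; a caption is matched on how it starts.
SHOT_KEYWORDS = ["full-body", "three-quarter", "waist-up", "upper-body", "close-up"]


def shot_type(caption):
    """Classify a caption by its leading shot-type phrase."""
    lowered = caption.strip().lower()
    for keyword in SHOT_KEYWORDS:
        if lowered.startswith(keyword):
            return keyword
    return "unlabelled"  # e.g. a bare "photo of <token>"


def shot_distribution(captions):
    """Return the share of each shot type as a fraction of all captions."""
    if not captions:
        return {}
    counts = Counter(shot_type(c) for c in captions)
    return {shot: n / len(captions) for shot, n in counts.items()}


captions = [
    "close-up portrait of <token>, face visible",
    "full-body photo of <token>, head-to-toe visible",
    "full-body photo of <token>, both shoes visible",
    "photo of <token>",
]
for shot, share in shot_distribution(captions).items():
    print(f"{shot}: {share:.0%}")
```

A large "unlabelled" or "close-up" share is a hint that the trigger may be absorbing crop type along with identity.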

FluxGym and similar trainer guides often recommend a small balanced set of high-quality images rather than indiscriminately adding more data:

  • FluxGym GitHub repo
  • Next Diffusion: How to train a Flux LoRA with FluxGym
  • AI Toolkit by ostris

5. Diagnose the output before changing the dataset

I would keep a small spreadsheet/log like this:

| Seed | Resolution | LoRA weight | Prompt variant | Face likeness | Full body? | Feet visible? | Leg anatomy | Notes |
|---|---|---|---|---|---|---|---|---|
| 1234 | 832x1248 | 0.65 | full-body v1 | good | no | no | n/a | crop bias |
| 1234 | 832x1248 | 0.55 | full-body v1 | ok | yes | partial | bad | anatomy issue |
| 1234 | 896x1344 | 0.55 | shoes/floor v2 | ok | yes | yes | better | candidate |

This helps avoid confusing different problems.

| Symptom | Likely interpretation | What to try |
|---|---|---|
| Face good, legs missing | crop/framing bias | stronger framing prompt, lower LoRA weight, vertical ratio |
| Full body appears, face changes | identity-at-distance failure | full-body identity examples, bridge shots |
| Feet appear, legs warp | anatomy weakness | cleaner lower-body data, pose/control, inpaint |
| Outfit always same | clothing absorbed into token | caption clothing, vary clothing |
| Background always same | background absorbed into token | caption background, vary background |
| Prompt ignored at high LoRA weight | LoRA overpowering base prompt | lower weight, test checkpoints |
| Better at lower weight but face weak | identity/framing tradeoff | two-LoRA split or better balanced retrain |
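The first three symptom rows can be applied mechanically to the log, which helps when the grid gets large. A minimal sketch, assuming log rows are dicts with the field names and the "good" / "ok" / "bad" / "yes" / "no" values used in the example log; `diagnose` is a hypothetical helper and covers only those three symptoms.

```python
def diagnose(row):
    """Map one log row to the likely interpretations from the symptom table."""
    findings = []
    if row["face_likeness"] == "good" and row["full_body"] == "no":
        findings.append("crop/framing bias")
    if row["full_body"] == "yes" and row["face_likeness"] == "bad":
        findings.append("identity-at-distance failure")
    if row["full_body"] == "yes" and row["leg_anatomy"] == "bad":
        findings.append("anatomy weakness")
    return findings


# The three rows from the example log above.
log = [
    {"lora_weight": 0.65, "face_likeness": "good", "full_body": "no",
     "feet_visible": "no", "leg_anatomy": "n/a"},
    {"lora_weight": 0.55, "face_likeness": "ok", "full_body": "yes",
     "feet_visible": "partial", "leg_anatomy": "bad"},
    {"lora_weight": 0.55, "face_likeness": "ok", "full_body": "yes",
     "feet_visible": "yes", "leg_anatomy": "better"},
]

for row in log:
    print(row["lora_weight"], diagnose(row) or ["no flagged issue"])
```

Rows with no finding are the candidates worth re-running on more seeds before trusting them.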

6. Watch for overtraining and prompt-following loss

More training is not automatically better.

Stronger LoRA / more steps can improve likeness, but it can also pull the model back toward the training distribution and reduce prompt flexibility.

So if retraining, I would save intermediate checkpoints and compare them with the same test prompt.

For example:

| Checkpoint | What to compare |
|---|---|
| 500 steps | undertrained? weak likeness? |
| 800 steps | does body framing start working? |
| 1000 steps | first serious candidate |
| 1500 steps | better likeness or more overfit? |
| 2000 steps | does prompt-following degrade? |

fal.ai’s Flux LoRA training writeup is useful here because it compares training steps and discusses prompt-following / style strength tradeoffs:

  • fal.ai: Training a FLUX style LoRA

Hugging Face also has practical FLUX LoRA / QLoRA resources:

  • Diffusers FLUX DreamBooth LoRA README
  • Hugging Face: FLUX QLoRA on consumer hardware

I would use those as implementation references, not as guarantees that a specific number of steps will solve legs.


7. If the issue is leg deformation, not just missing legs

If the lower body appears but the legs are warped, I would not expect captions alone to fix every case.

That is where pose/control or post-generation repair can be more reliable.

Possible production-oriented fallbacks:

| Tool/approach | When useful |
|---|---|
| Pose / ControlNet-style conditioning | when you need exact full-body pose/framing |
| Outpainting downward | when upper body/face is good but lower body is missing |
| Inpainting lower body | when legs/feet exist but are wrong |
| Face pass after full-body generation | when full-body works but face likeness drops |
| Generate full-body first, then refine identity | when portrait-biased LoRA fights full-body framing |

ControlNet is the classic reference for adding spatial controls such as edges, depth, segmentation, and human pose to text-to-image models:

  • ControlNet paper

For FLUX specifically, there are FLUX ControlNet/Union-style models that support pose-like conditioning, though quality and workflow compatibility depend on your UI:

  • Shakker-Labs / InstantX FLUX.1-dev-ControlNet-Union-Pro

I would treat this as a fallback path, not the first thing to try. It is very useful if you need reliable output, but it can hide whether the LoRA itself is actually fixed.


8. A practical “try this in order” checklist

Phase A — no retraining

  1. Use vertical aspect ratio.
  2. Lower LoRA weight.
  3. Use redundant full-body framing.
  4. Add concrete feet/shoes/floor cues.
  5. Test same seed across a small grid.

Example test prompt:

full-body photograph of <token>, standing naturally, full height visible from head to toe, entire body visible in frame, legs visible, feet visible, shoes visible, both shoes fully visible, both feet planted on the ground, photographed from a distance, camera far enough away to include the complete body, vertical 9:16 portrait, subject centered in frame, visible floor under both shoes, showing the complete full length of the subject, not a close-up portrait, not an upper-body crop

Phase B — LoRA strength / two-LoRA grid

Single LoRA:

0.45, 0.55, 0.65, 0.75, 0.85, 1.00

Two LoRAs:

Identity 0.60 + Framing 0.30
Identity 0.55 + Framing 0.40
Identity 0.50 + Framing 0.50
Identity 0.45 + Framing 0.55
Identity 0.40 + Framing 0.60

Phase C — retrain LoRA, not FLUX itself

  1. Add bridge shots.
  2. Add full-body identity shots.
  3. Add explicit feet/shoes/floor examples.
  4. Caption all controllable visual factors.
  5. Save intermediate checkpoints.
  6. Compare with fixed prompt/seed/resolution.

Phase D — if anatomy still fails

  1. Use pose/control.
  2. Inpaint legs/feet.
  3. Outpaint lower body.
  4. Generate full-body first, then refine face/identity.

9. The main rule I would use

If I had to compress all of this into one rule:

If you want to control it later, caption it during training. If you need it spatially reliable, do not rely only on text. If the LoRA keeps overriding the prompt, reduce its strength or fix the dataset/captions.

For this thread specifically:

  • “full-body” should be in the training captions for full-body images.
  • “close-up portrait” should be in the captions for close-up images.
  • “feet visible”, “shoes visible”, and “standing on floor/ground” should be in the captions for examples where that matters.
  • Clothing/background/camera distance should be captioned if you do not want them absorbed into <token>.
  • Full-body examples should not be so small/blurred that the model cannot learn identity at that distance.
  • Close-ups should not dominate so much that the trigger becomes a portrait-crop trigger.

10. Useful links

Theory / why this is hard

  • T2I-CompBench: compositional text-to-image benchmark
  • T2I-CompBench GitHub/project
  • GenEval: object/count/color/position evaluation for T2I
  • Attend-and-Excite: catastrophic neglect / attention guidance
  • HumanRefiner: abnormal human generation and limb quality
  • Distortion-5K / ViT-HD: distorted body parts in generated images
  • DreamBooth: subject-driven generation from a few images
  • ControlNet: adding spatial controls to T2I diffusion models

LoRA / FLUX implementation

  • Diffusers: LoRA / PEFT inference
  • Diffusers: loading adapters and LoRA scale
  • PEFT LoRA docs
  • Diffusers FLUX DreamBooth LoRA README
  • Hugging Face FLUX QLoRA blog
  • FluxGym
  • AI Toolkit by ostris

Captioning / dataset practice

  • RunDiffusion: dataset preparation guide
  • What exactly to caption for Flux LoRA training?
  • Next Diffusion: training a Flux LoRA with FluxGym
  • kohya-ss Flux LoRA training tips discussion
  • Related Flux LoRA full-body/face issue

Related community examples

  • How to get full body shot in Flux?
  • Flux composition/anatomy issue discussion
  • Close-up / face-focused LoRA and full-body generalization discussion

Final thought

I would probably not try to solve this by jumping directly to “train a bigger/better model.”

I would debug it as:

  1. Does the existing LoRA allow full-body framing at lower strength?
  2. Does the prompt make feet/shoes/floor concrete enough?
  3. Does full-body work only when identity is weak?
  4. Are close-up, full-body, clothing, background, and camera distance separated in captions?
  5. Does the dataset contain a distance ladder, or only closeups and full bodies with no bridge?
  6. Are the remaining failures actually human-limb deformation rather than missing lower body?

That separation makes the problem much easier to work on.
