Flux LORA won't show my legs or feet

Hugging Face Forums [Unofficial] June 4, 2026

Focusing primarily on ways to address issues raised in the thread, I collected a bunch of things that might be useful:

TL;DR

I would not treat this as one single “FLUX cannot draw legs” bug.

It looks more like several different failure modes overlapping:

Failure mode	What it looks like	What I would try first
Framing failure	legs/feet are missing, cropped, or never enter the frame	stronger full-body framing prompt, vertical aspect ratio, camera distance, visible floor/shoes
Subject-token entanglement	the LoRA keeps pulling the subject back into close-up / upper-body shots	lower LoRA strength, better captions, more varied shot distances
Identity-at-distance failure	full body appears, but the face stops looking like the person	add full-body identity examples and bridge shots, not only face close-ups
Caption entanglement	outfit/background/crop type sticks to the person	caption clothes, background, camera distance, shot type
Human anatomy failure	legs appear, but are warped/deformed	more clean lower-body examples, pose/control/inpaint; do not expect prompt alone to solve every case
Overtraining / prompt-following loss	likeness improves but prompt flexibility collapses	test intermediate checkpoints and LoRA weights instead of assuming “more is better”

So my practical mental model would be:

Separate identity from framing, clothing, background, camera distance, and feet/ground contact.

The goal is not necessarily to retrain FLUX itself. The practical layer is usually: prompt framing → LoRA strength → captions → dataset balance → LoRA retraining → pose/control/inpaint if needed.

1. Why this can happen

A subject LoRA may learn more than “who this person is.”

If most examples are close-up or mid-shot, the trigger token can also absorb things like:

close-up framing
upper-body crop
missing lower body
repeated outfit
repeated background
repeated lighting
camera distance
“this person usually appears without visible feet”

This is not unique to FLUX. Text-to-image models can make very plausible images while still being brittle about composition: which requested elements appear, where they appear, and which attributes belong to which object/body part. Benchmarks like T2I-CompBench, T2I-CompBench++, and GenEval are basically built around this kind of compositional weakness.

There is also a known failure mode where text-to-image diffusion models simply do not generate some requested concepts. Attend-and-Excite discusses this as catastrophic neglect, where one or more subjects/concepts in the prompt are not generated. That maps pretty well to “I said feet visible, but the model ignored the feet.”

Separately, human limbs are still a hard case for T2I models. Papers such as HumanRefiner and Distortion-5K / ViT-HD specifically discuss distorted limbs, missing fingers, deformed extremities, fused body parts, and other human-body distortions in generated images.

So I would split the problem into two parts:

Missing lower body: this is often a framing / prompt-attention / token-entanglement issue.
Deformed lower body: this can remain even after adding full-body examples, because human limb fidelity is still a general T2I weakness.

Those two problems may need different mitigations.

2. First: try to save the existing LoRA

Before retraining, I would test the current LoRA with a small controlled grid.

The important part is to change one thing at a time:

same seed
same sampler/settings
same resolution
same prompt
only change LoRA weight, aspect ratio, or prompt variant

Otherwise it is hard to tell what helped.
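If you drive generation from a script rather than a UI, that discipline can be made explicit. A minimal sketch, assuming a script-based setup: the `RunConfig` fields, their default values, and the `one_factor_grid` helper are hypothetical; the only point is that every run differs from the baseline in exactly one field.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical baseline; substitute the settings you actually use.
    seed: int = 1234
    sampler: str = "euler"           # placeholder sampler name
    steps: int = 28
    resolution: tuple = (832, 1248)  # (width, height)
    prompt_variant: str = "full-body v1"
    lora_weight: float = 0.65


def one_factor_grid(base: RunConfig, **variations) -> list:
    """Return the baseline run plus runs that each change exactly one field."""
    runs = [base]
    for field_name, values in variations.items():
        for value in values:
            if value == getattr(base, field_name):
                continue  # identical to the baseline, already covered
            runs.append(replace(base, **{field_name: value}))
    return runs


base = RunConfig()
grid = one_factor_grid(
    base,
    lora_weight=[0.45, 0.55, 0.65, 0.75],
    resolution=[(896, 1344)],
    prompt_variant=["shoes/floor v2"],
)
for run in grid:
    print(run)
```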

2.1 Use full-body framing as a scene description, not just one token

I would not rely on just:

full body photo of <token>

That may be too weak if the LoRA already learned the subject mostly as a portrait or upper-body concept.

Instead, use multiple mutually reinforcing descriptions:

full-body photograph of <token>, standing naturally, full height visible from head to toe, entire body visible in frame, legs visible, feet visible, shoes visible, both shoes fully visible, both feet planted on the ground, photographed from a distance, camera far enough away to include the complete body, vertical 9:16 portrait, subject centered in frame, visible floor under both shoes

Then repeat the framing near the end:

showing the complete full length of the subject, a full-body photograph capturing the subject in their entirety, not a close-up portrait, not an upper-body crop

This is not just “prompt superstition.” It is trying to reduce concept neglect by giving the model several ways to attend to the same requirement:

Requirement	Prompt wording
whole body	`full-body photograph`, `full height visible`
no crop	`entire body visible in frame`, `head to toe`
lower body	`legs visible`, `feet visible`
concrete feet cue	`shoes visible`, `both shoes fully visible`
ground/contact	`both feet planted on the ground`, `visible floor under both shoes`
anti-closeup	`photographed from a distance`, `camera far enough away`
composition	`vertical 9:16`, `subject centered in frame`

For this specific issue, I think “visible floor under both shoes” is surprisingly useful because it asks for the feet, the shoes, the ground plane, and the space below the body at once.
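If you build prompts in a script, the table can be kept as data, so individual requirements can be switched off one at a time to test which cue actually matters. A minimal sketch: `FRAMING_CUES`, `build_framing_prompt`, and the `drop` parameter are hypothetical, the wording is adapted from the table, and `full-body photograph` is used as the lead phrase rather than repeated.

```python
# Requirement -> prompt wording, adapted from the table above.
# "full-body photograph" is the lead phrase, so it is not repeated here.
FRAMING_CUES = {
    "whole body": ["full height visible from head to toe"],
    "no crop": ["entire body visible in frame"],
    "lower body": ["legs visible", "feet visible"],
    "concrete feet cue": ["shoes visible", "both shoes fully visible"],
    "ground/contact": ["both feet planted on the ground", "visible floor under both shoes"],
    "anti-closeup": ["photographed from a distance",
                     "camera far enough away to include the complete body"],
    "composition": ["vertical 9:16 portrait", "subject centered in frame"],
}
CLOSING = "not a close-up portrait, not an upper-body crop"


def build_framing_prompt(token: str, drop=()) -> str:
    """Join all framing cues in table order, skipping any requirement named in `drop`."""
    parts = [f"full-body photograph of {token}", "standing naturally"]
    for requirement, phrases in FRAMING_CUES.items():
        if requirement not in drop:
            parts.extend(phrases)
    parts.append(CLOSING)
    return ", ".join(parts)


print(build_framing_prompt("<token>"))
# Ablation: the same prompt without the ground/contact cues.
print(build_framing_prompt("<token>", drop={"ground/contact"}))
```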

2.2 Feet/shoes/floor can work better than “feet visible” alone

If “feet visible” is ignored, make the feet part of a concrete visual object/scene.

Try variants like:

wearing visible black sneakers, both shoes fully visible, standing on a wooden floor

wearing visible boots, both boots fully visible, standing on concrete ground

empty space below the feet, visible floor under both shoes, full body centered in frame

This is also consistent with practical Flux prompting discussions where people report that “full body” alone can still drift toward upper-body crops, and more concrete cues like shoes/floor/head-to-toe can help. See for example these related community threads:

How to get full body shot in Flux?
Need Help, Flux is not performing as expected in terms of composition and anatomy

2.3 Aspect ratio helps, but only if paired with distance

Vertical framing helps, but it is not enough by itself.

If the camera is still “close,” the model may simply make a large upper-body portrait inside a vertical frame.

I would test:

Aspect ratio	Example resolution	Why
3:4	`896x1194` / similar	natural portrait framing
2:3	`832x1248`, `896x1344`, `1024x1536`	good full-body compromise
9:16	`768x1365`, `832x1472`	more room for full body, but face gets smaller

Pair that with:

camera far enough away to include the complete body

empty space above the head and below the feet

visible floor under both shoes
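One practical detail about the resolutions above: my understanding is that Diffusers' `FluxPipeline` expects width and height to be divisible by 16 and resizes them (with a warning) otherwise, and other UIs may round in their own way, so values like `896x1194` or `768x1365` will probably be adjusted anyway. A small hypothetical helper, not part of any library, that keeps the width and snaps the derived height to a multiple of 16:

```python
def snap(value: float, multiple: int = 16) -> int:
    """Round a positive dimension to the nearest multiple (halves round up)."""
    return max(multiple, int(value / multiple + 0.5) * multiple)


def portrait_resolution(ratio_w: int, ratio_h: int, width: int, multiple: int = 16) -> tuple:
    """Keep the width, derive the height from the aspect ratio, snap both."""
    return snap(width, multiple), snap(width * ratio_h / ratio_w, multiple)


for ratio, width in [((3, 4), 896), ((2, 3), 832), ((9, 16), 768), ((9, 16), 832)]:
    w, h = portrait_resolution(*ratio, width)
    print(f"{ratio[0]}:{ratio[1]} -> {w}x{h} ({w * h / 1e6:.2f} MP)")
```

With these inputs it prints 896x1200, 832x1248, 768x1360 and 832x1472: two match the table exactly, the other two move by a few pixels.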

2.4 Sweep LoRA strength instead of using one fixed value

If the LoRA is strong, it may preserve likeness but also preserve the training crop style.

That means a high LoRA weight can accidentally mean:

“make this person” + “make this person appear the way they appeared in the dataset”

So I would test a simple sweep.

LoRA weight	What to check
`0.45`	More composition freedom, weaker likeness
`0.55`	Possible balance point
`0.65`	Good first candidate
`0.75`	Check if close-up crop bias returns
`0.85`	Stronger likeness, probably stronger dataset bias
`1.00`	Maximum learned bias; useful as a baseline

Hugging Face Diffusers exposes LoRA scaling / adapter weighting in its LoRA and PEFT integration docs:

Diffusers: Load adapters / adjust LoRA weight scale
Diffusers: Inference with PEFT / multiple adapters
PEFT LoRA docs

Even if your UI is not Diffusers, the same idea usually applies conceptually: test LoRA influence as a variable.
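For Diffusers users, a minimal sketch of the sweep, assuming FLUX.1-dev weights, a LoRA stored as a `.safetensors` file (local folder or Hub repo), and enough memory for `enable_model_cpu_offload()`. `load_lora_weights`, `set_adapters`, and `torch.Generator(...).manual_seed` are existing Diffusers/PyTorch calls; `run_sweep`, `sweep_filenames`, the adapter name `subject`, and the step/guidance values are illustrative placeholders. The heavy imports sit inside `run_sweep` so the filename helper works without a GPU stack.

```python
import os

WEIGHTS = [0.45, 0.55, 0.65, 0.75, 0.85, 1.00]


def sweep_filenames(seed: int, weights=WEIGHTS, prefix: str = "sweep") -> list:
    """One output filename per LoRA weight, so results sort and compare easily."""
    return [f"{prefix}_seed{seed}_w{w:.2f}.png" for w in weights]


def run_sweep(lora_dir: str, weight_name: str, prompt: str, seed: int = 1234,
              width: int = 832, height: int = 1248, weights=WEIGHTS, out_dir: str = "sweep"):
    # Imported here so this module loads without torch/diffusers installed.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                        torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()
    pipe.load_lora_weights(lora_dir, weight_name=weight_name, adapter_name="subject")
    os.makedirs(out_dir, exist_ok=True)

    for weight, name in zip(weights, sweep_filenames(seed, weights)):
        pipe.set_adapters(["subject"], adapter_weights=[weight])
        # Re-seed for every weight so only the LoRA scale changes between images.
        generator = torch.Generator("cpu").manual_seed(seed)
        image = pipe(prompt, width=width, height=height, num_inference_steps=28,
                     guidance_scale=3.5, generator=generator).images[0]
        image.save(os.path.join(out_dir, name))
```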

2.5 If using two LoRAs, separate their roles

The 0.50 + 0.50 idea from earlier in the thread makes sense conceptually, but I would test it as a grid rather than a magic number.

A useful two-LoRA split could be:

LoRA	Purpose	What it should emphasize
Identity LoRA	face/person likeness	face, expression, angles, identity consistency
Framing/Flexibility LoRA	body/framing/context separation	full-body, camera distance, clothing/background captions, visible feet/shoes

Then test:

Test	Identity LoRA	Framing LoRA	What to watch
A	`0.60`	`0.30`	face fidelity first
B	`0.55`	`0.40`	face + body compromise
C	`0.50`	`0.50`	balanced, close to the thread suggestion
D	`0.45`	`0.55`	stronger framing
E	`0.40`	`0.60`	does the face fall apart?

If full body appears but the face changes, the identity LoRA may be too weak or the dataset may not contain enough identity information at full-body distance.

If the face stays but the crop returns, the identity LoRA may be carrying too much close-up framing bias.
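The same grid can be written down as data, together with the two rules of thumb above. A sketch assuming both LoRAs were loaded into one Diffusers pipeline via `load_lora_weights(..., adapter_name=...)` under the hypothetical names `identity` and `framing`; `interpret` is only a crude encoding of the two sentences above, not a real diagnostic.

```python
# (identity weight, framing weight) for tests A-E in the table above.
TWO_LORA_GRID = {
    "A": (0.60, 0.30),
    "B": (0.55, 0.40),
    "C": (0.50, 0.50),
    "D": (0.45, 0.55),
    "E": (0.40, 0.60),
}


def apply_weights(pipe, test_id: str) -> None:
    """Activate both adapters at the weights for one grid cell."""
    identity_w, framing_w = TWO_LORA_GRID[test_id]
    pipe.set_adapters(["identity", "framing"], adapter_weights=[identity_w, framing_w])


def interpret(full_body: bool, face_ok: bool) -> str:
    """Rough reading of one grid cell."""
    if full_body and not face_ok:
        return "identity LoRA too weak, or too little full-body identity data"
    if face_ok and not full_body:
        return "identity LoRA probably carrying close-up framing bias"
    if full_body and face_ok:
        return "candidate setting"
    return "neither works: revisit framing prompt and dataset"
```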

3. If retraining the LoRA: caption what you want to control later

This is probably the most important part.

If you want to control something at generation time, it should probably appear explicitly in the training captions.

For example, if the dataset contains many close-ups but the captions only say:

photo of <token>

then the LoRA may learn:

<token> = person identity + close-up framing + upper-body crop + this outfit + this background + no visible feet

A better goal is:

<token> = person identity
close-up = close-up framing
full-body = full-body framing
black jacket = clothing
indoor room = background
camera at a distance = camera distance
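One way to keep those factors apart is to store them as separate fields per image and only join them into a caption at the end. A minimal sketch; `CaptionFactors` and `build_caption` are hypothetical, and `TOK` stands in for your real trigger token.

```python
from dataclasses import dataclass


@dataclass
class CaptionFactors:
    shot: str              # "close-up portrait", "full-body photo", ...
    clothing: str = ""     # "wearing a black jacket"
    background: str = ""   # "indoor room"
    camera: str = ""       # "camera at a distance"
    extras: tuple = ()     # e.g. ("feet visible", "shoes visible")


def build_caption(token: str, factors: CaptionFactors) -> str:
    """Join per-image factors into one caption; identity is left to the trigger token."""
    parts = [f"{factors.shot} of {token}", *factors.extras,
             factors.clothing, factors.background, factors.camera]
    return ", ".join(p for p in parts if p)  # skip empty fields


print(build_caption("TOK", CaptionFactors(
    shot="full-body photo", clothing="wearing a black jacket",
    background="indoor room", camera="camera at a distance",
    extras=("head-to-toe visible", "feet visible"))))
```

This prints `full-body photo of TOK, head-to-toe visible, feet visible, wearing a black jacket, indoor room, camera at a distance`, so shot type, clothing, background, and camera distance each get their own words instead of hiding behind the token.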

RunDiffusion’s dataset guide says to include composition context such as portrait, full-body, or close-up, along with lighting/environment/camera descriptors, in training captions:

RunDiffusion: How to prepare a dataset for model training

There are also practical Flux LoRA captioning discussions that point in the same direction:

What exactly to caption for Flux LoRA training?
kohya-ss Flux LoRA training tips discussion

3.1 Caption examples

For close-up images:

close-up portrait of <token>, face visible, shoulders visible, camera close to the subject, indoor lighting

For upper-body images:

upper-body photo of <token>, torso visible, wearing a black jacket, standing indoors, soft window light

For waist-up images:

waist-up photo of <token>, upper body and waist visible, standing outdoors, camera at medium distance

For three-quarter images:

three-quarter body photo of <token>, legs partially visible, standing outdoors, wearing casual clothes, camera at medium distance

For full-body images:

full-body photo of <token>, head-to-toe visible, legs visible, feet visible, shoes visible, standing at a distance on a concrete floor

For full-body feet/shoes anchor images:

full-body photo of <token>, entire body visible, both shoes visible, feet planted on the ground, visible floor under both shoes, camera far enough away to include head and feet

The important part is not the exact wording. The important part is that close-up images are labelled as close-up images, and full-body images are labelled as full-body images.

3.2 Caption variable things; avoid absorbing them into the token

A practical rule:

Thing in training image	Caption it?	Why
close-up / portrait crop	yes	prevents crop type from becoming part of <token>
upper-body / waist-up / full-body	yes	makes shot distance controllable
clothing	usually yes	prevents outfit stickiness
background	usually yes	prevents background stickiness
lighting	often yes	prevents lighting/style stickiness
camera distance	yes	helps separate close-up from full-body
feet/shoes/floor	yes for full-body samples	teaches lower-body framing explicitly
permanent identity	usually less	that is what <token> should learn

Note: I am writing the trigger as <token> here. In actual captions and code, use your real trigger token.
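Before training, a quick automated pass over the captions can catch ones that forgot the shot type or, for full-body images, the feet/shoes/floor cue. A sketch using plain substring checks; the keyword lists are assumptions based on the caption examples in 3.1 and will need adjusting to your own wording, and `lint_folder` assumes one `.txt` caption file per image, as many trainers use.

```python
from pathlib import Path

SHOT_TERMS = ("close-up", "upper-body", "waist-up", "three-quarter", "full-body")
FEET_TERMS = ("feet visible", "shoes visible", "floor", "ground")


def lint_caption(caption: str, token: str) -> list:
    """Return a list of problems found in one caption (plain substring checks)."""
    text = caption.lower()
    problems = []
    if token.lower() not in text:
        problems.append("missing trigger token")
    shots = [term for term in SHOT_TERMS if term in text]
    if not shots:
        problems.append("no shot type / crop described")
    if "full-body" in shots and not any(term in text for term in FEET_TERMS):
        problems.append("full-body caption without feet/shoes/floor cue")
    return problems


def lint_folder(folder: str, token: str) -> dict:
    """Lint every .txt caption in a folder and return only the files with problems."""
    report = {}
    for path in sorted(Path(folder).glob("*.txt")):
        problems = lint_caption(path.read_text(encoding="utf-8"), token)
        if problems:
            report[path.name] = problems
    return report
```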

4. Build a “distance ladder” in the dataset

I would avoid thinking only in terms of “face images” vs “full-body images.”

The model needs bridge examples.

A close-up teaches identity well, but not body framing. A full-body image teaches framing well, but the face may be too small to teach identity strongly. Bridge shots connect the two.

Shot type	What it teaches well	What it may fail to teach
Close-up face	face identity	body framing
Chest / upper body	identity + torso	legs/feet
Waist-up	transition framing	feet
Three-quarter body	legs/body connection	precise feet
Full-body	full framing	face detail
Full-body with visible shoes/floor	lower-body completion	face detail unless image quality is high

A related failure mode is visible in this GitHub issue: a Flux LoRA trained on face images can look fine for face/half-body generations, but lose identity when asked for full-body outputs:

kohya-ss/sd-scripts issue #1916: FLUX LoRA trained on face images changes face when generating full-body images

That is why I would include not only full-body examples, but also bridge shots.

4.1 Example dataset balance

Not a universal recipe, but a reasonable starting point:

Dataset size	Close-up	Upper / waist	Three-quarter	Full-body	Clear feet/shoes
20 images	4	5	4	5	2
30 images	5	8	6	8	3
40 images	6	10	8	12	4

For this specific issue, I would rather have fewer but cleaner full-body examples than many low-quality ones.

Full-body examples should ideally have:

face still recognizable
full head-to-toe framing
feet/shoes visible
visible ground/floor contact
not all the same outfit
not all the same background
not all the same pose
not all the same camera distance

FluxGym and similar trainer guides often recommend a small balanced set of high-quality images rather than indiscriminately adding more data:

FluxGym GitHub repo
Next Diffusion: How to train a Flux LoRA with FluxGym
AI Toolkit by ostris

5. Diagnose the output before changing the dataset

I would keep a small spreadsheet/log like this:

Seed	Resolution	LoRA weight	Prompt variant	Face likeness	Full body?	Feet visible?	Leg anatomy	Notes
1234	832x1248	0.65	full-body v1	good	no	no	n/a	crop bias
1234	832x1248	0.55	full-body v1	ok	yes	partial	bad	anatomy issue
1234	896x1344	0.55	shoes/floor v2	ok	yes	yes	better	candidate
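If you would rather keep the log as a file next to the images, Python's standard `csv` module is enough. A minimal sketch; the column names mirror the table above, the example rows are the three rows shown there, and `write_log` is a hypothetical helper.

```python
import csv
import io

LOG_COLUMNS = ["seed", "resolution", "lora_weight", "prompt_variant", "face_likeness",
               "full_body", "feet_visible", "leg_anatomy", "notes"]

EXAMPLE_ROWS = [
    dict(zip(LOG_COLUMNS, values))
    for values in [
        (1234, "832x1248", 0.65, "full-body v1", "good", "no", "no", "n/a", "crop bias"),
        (1234, "832x1248", 0.55, "full-body v1", "ok", "yes", "partial", "bad", "anatomy issue"),
        (1234, "896x1344", 0.55, "shoes/floor v2", "ok", "yes", "yes", "better", "candidate"),
    ]
]


def write_log(rows, stream) -> None:
    """Write a header plus one CSV line per run."""
    writer = csv.DictWriter(stream, fieldnames=LOG_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()  # for a real file: open("runs.csv", "w", newline="")
write_log(EXAMPLE_ROWS, buffer)
print(buffer.getvalue())
```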

This helps avoid confusing different problems.

Symptom	Likely interpretation	What to try
Face good, legs missing	crop/framing bias	stronger framing prompt, lower LoRA weight, vertical ratio
Full body appears, face changes	identity-at-distance failure	full-body identity examples, bridge shots
Feet appear, legs warp	anatomy weakness	cleaner lower-body data, pose/control, inpaint
Outfit always same	clothing absorbed into token	caption clothing, vary clothing
Background always same	background absorbed into token	caption background, vary background
Prompt ignored at high LoRA weight	LoRA overpowering base prompt	lower weight, test checkpoints
Better at lower weight but face weak	identity/framing tradeoff	two-LoRA split or better balanced retrain
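The first three rows of that table can be turned into a crude rule applied to each logged result; the outfit, background, and LoRA-weight rows need comparisons across several images, so they are left out. A sketch, with `diagnose` and its boolean inputs as hypothetical simplifications of the log columns:

```python
ACTIONS = {
    "crop/framing bias": "stronger framing prompt, lower LoRA weight, vertical ratio",
    "identity-at-distance failure": "full-body identity examples, bridge shots",
    "anatomy weakness": "cleaner lower-body data, pose/control, inpaint",
}


def diagnose(face_ok: bool, full_body: bool, feet_visible: bool, legs_ok: bool) -> str:
    """Return the likely interpretation for one logged image, or 'unclear'."""
    if face_ok and not full_body:
        return "crop/framing bias"
    if full_body and not face_ok:
        return "identity-at-distance failure"
    if feet_visible and not legs_ok:
        return "anatomy weakness"
    return "unclear"


# Second row of the log above: face ok, full body, feet partly visible, bad legs.
label = diagnose(face_ok=True, full_body=True, feet_visible=True, legs_ok=False)
print(label, "->", ACTIONS.get(label, "compare across more images"))
```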

6. Watch for overtraining and prompt-following loss

More training is not automatically better.

Stronger LoRA / more steps can improve likeness, but it can also pull the model back toward the training distribution and reduce prompt flexibility.

So if retraining, I would save intermediate checkpoints and compare them with the same test prompt.

For example:

Checkpoint	What to compare
500 steps	undertrained? weak likeness?
800 steps	does body framing start working?
1000 steps	first serious candidate
1500 steps	better likeness or more overfit?
2000 steps	does prompt-following degrade?
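Trainers usually save intermediate checkpoints with a step or epoch number in the filename, but the exact pattern differs between tools. A sketch that assumes a hypothetical `<name>-<number>.safetensors` pattern, orders the checkpoints by that number, and pairs each with one fixed prompt, seed, and resolution so they can be rendered and compared side by side:

```python
import re

# Hypothetical naming pattern: "<name>-<number>.safetensors", e.g. "mylora-000800.safetensors".
CHECKPOINT_RE = re.compile(r"-(\d+)\.safetensors$")


def checkpoints_by_step(paths) -> list:
    """Return (number, path) pairs sorted by number; files without a number are skipped."""
    found = []
    for path in paths:
        match = CHECKPOINT_RE.search(path)
        if match:
            found.append((int(match.group(1)), path))
    return sorted(found)


def comparison_plan(paths, prompt: str, seed: int = 1234, resolution=(832, 1248)) -> list:
    """One render job per checkpoint, with prompt, seed and resolution held fixed."""
    return [
        {"step": step, "path": path, "prompt": prompt, "seed": seed, "resolution": resolution}
        for step, path in checkpoints_by_step(paths)
    ]


files = ["mylora-001000.safetensors", "mylora-000500.safetensors",
         "mylora.safetensors", "mylora-002000.safetensors"]
for job in comparison_plan(files, "full-body photograph of TOK, ..."):
    print(job["step"], job["path"])
```

The final `mylora.safetensors` has no number and is skipped; add it to the comparison by hand if your trainer saves the last checkpoint that way.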

fal.ai’s Flux LoRA training writeup is useful here because it compares training steps and discusses prompt-following / style strength tradeoffs:

fal.ai: Training a FLUX style LoRA

Hugging Face also has practical FLUX LoRA / QLoRA resources:

Diffusers FLUX DreamBooth LoRA README
Hugging Face: FLUX QLoRA on consumer hardware

I would use those as implementation references, not as guarantees that a specific number of steps will solve legs.

7. If the issue is leg deformation, not just missing legs

If the lower body appears but the legs are warped, I would not expect captions alone to fix every case.

That is where pose/control or post-generation repair can be more reliable.

Possible production-oriented fallbacks:

Tool/approach	When useful
Pose / ControlNet-style conditioning	when you need exact full-body pose/framing
Outpainting downward	when upper body/face is good but lower body is missing
Inpainting lower body	when legs/feet exist but are wrong
Face pass after full-body generation	when full-body works but face likeness drops
Generate full-body first, then refine identity	when portrait-biased LoRA fights full-body framing
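For the outpainting-downward row, much of the setup is geometry: a taller canvas and a mask covering only the new strip. A sketch of that calculation; `outpaint_down`, `make_canvas_and_mask`, and `extra_fraction` are hypothetical, the added height is snapped to a multiple of 16 on the assumption that your FLUX pipeline wants that, and the mask follows the convention used by Diffusers inpainting pipelines (white is regenerated, black is kept). Check what your own outpainting tool expects.

```python
def outpaint_down(width: int, height: int, extra_fraction: float = 0.5, multiple: int = 16):
    """Return the enlarged canvas size and the box (left, top, right, bottom) to regenerate."""
    extra = max(multiple, int(height * extra_fraction / multiple + 0.5) * multiple)
    new_height = height + extra
    return (width, new_height), (0, height, width, new_height)


def make_canvas_and_mask(image, extra_fraction: float = 0.5):
    """Pillow version: original pasted at the top, white mask over the new strip."""
    from PIL import Image, ImageDraw  # imported lazily; only needed here

    size, box = outpaint_down(image.width, image.height, extra_fraction)
    canvas = Image.new("RGB", size, "gray")
    canvas.paste(image, (0, 0))
    mask = Image.new("L", size, 0)                 # black = keep
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white = regenerate
    return canvas, mask
```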

ControlNet is the classic reference for adding spatial controls such as edges, depth, segmentation, and human pose to text-to-image models:

ControlNet paper

For FLUX specifically, there are FLUX ControlNet/Union-style models that support pose-like conditioning, though quality and workflow compatibility depend on your UI:

Shakker-Labs / InstantX FLUX.1-dev-ControlNet-Union-Pro

I would treat this as a fallback path, not the first thing to try. It is very useful if you need reliable output, but it can hide whether the LoRA itself is actually fixed.

8. A practical “try this in order” checklist

Phase A — no retraining

Use vertical aspect ratio.
Lower LoRA weight.
Use redundant full-body framing.
Add concrete feet/shoes/floor cues.
Test same seed across a small grid.

Example test prompt:

full-body photograph of <token>, standing naturally, full height visible from head to toe, entire body visible in frame, legs visible, feet visible, shoes visible, both shoes fully visible, both feet planted on the ground, photographed from a distance, camera far enough away to include the complete body, vertical 9:16 portrait, subject centered in frame, visible floor under both shoes, showing the complete full length of the subject, not a close-up portrait, not an upper-body crop

Phase B — LoRA strength / two-LoRA grid

Single LoRA:

0.45, 0.55, 0.65, 0.75, 0.85, 1.00

Two LoRAs:

Identity 0.60 + Framing 0.30
Identity 0.55 + Framing 0.40
Identity 0.50 + Framing 0.50
Identity 0.45 + Framing 0.55
Identity 0.40 + Framing 0.60

Phase C — retrain LoRA, not FLUX itself

Add bridge shots.
Add full-body identity shots.
Add explicit feet/shoes/floor examples.
Caption all controllable visual factors.
Save intermediate checkpoints.
Compare with fixed prompt/seed/resolution.

Phase D — if anatomy still fails

Use pose/control.
Inpaint legs/feet.
Outpaint lower body.
Generate full-body first, then refine face/identity.

9. The main rule I would use

If I had to compress all of this into one rule:

If you want to control it later, caption it during training. If you need it spatially reliable, do not rely only on text. If the LoRA keeps overriding the prompt, reduce its strength or fix the dataset/captions.

For this thread specifically:

full-body should be in the training captions for full-body images.
close-up portrait should be in the captions for close-up images.
feet visible, shoes visible, and standing on floor/ground should be in the captions for examples where that matters.
Clothing/background/camera distance should be captioned if you do not want them absorbed into <token>.
Full-body examples should not be so small/blurred that the model cannot learn identity at that distance.
Close-ups should not dominate so much that the trigger becomes a portrait-crop trigger.

10. Useful links

Theory / why this is hard

T2I-CompBench: compositional text-to-image benchmark
T2I-CompBench GitHub/project
GenEval: object/count/color/position evaluation for T2I
Attend-and-Excite: catastrophic neglect / attention guidance
HumanRefiner: abnormal human generation and limb quality
Distortion-5K / ViT-HD: distorted body parts in generated images
DreamBooth: subject-driven generation from a few images
ControlNet: adding spatial controls to T2I diffusion models

LoRA / FLUX implementation

Diffusers: LoRA / PEFT inference
Diffusers: loading adapters and LoRA scale
PEFT LoRA docs
Diffusers FLUX DreamBooth LoRA README
Hugging Face FLUX QLoRA blog
FluxGym
AI Toolkit by ostris

Captioning / dataset practice

RunDiffusion: dataset preparation guide
What exactly to caption for Flux LoRA training?
Next Diffusion: training a Flux LoRA with FluxGym
kohya-ss Flux LoRA training tips discussion
Related Flux LoRA full-body/face issue

Related community examples

How to get full body shot in Flux?
Flux composition/anatomy issue discussion
Close-up / face-focused LoRA and full-body generalization discussion

Final thought

I would probably not try to solve this by jumping directly to “train a bigger/better model.”

I would debug it as:

Does the existing LoRA allow full-body framing at lower strength?
Does the prompt make feet/shoes/floor concrete enough?
Does full-body work only when identity is weak?
Are close-up, full-body, clothing, background, and camera distance separated in captions?
Does the dataset contain a distance ladder, or only close-ups and full bodies with no bridge?
Are the remaining failures actually human-limb deformation rather than missing lower body?

That separation makes the problem much easier to work on.