External Publication
Visit Post

VLM Fine tuning: Near-Zero Training Loss but Poor Inference Accuracy on Train Set (Gemma 4 E2B It)

Hugging Face Forums [Unofficial] May 25, 2026
Source

Hi everyone

I am currently fine-tuning the Gemma 4 E2B model for a worker safety project. My goal is to classify whether a worker is using a stepladder safely based on specific safety guidelines (e.g., step position, orientation, and ladder stability).

The Problem: I am facing a strange behavior: My Training Loss converges to near zero (~0.001) very quickly. However, when I run inference on the exact same training images to calculate metrics, the performance is extremely poor (Accuracy ~50%, with a heavy bias towards the “unsafe” class).

Dataset Format: I reformatted my dataset so the Assistant outputs a single JSON string. I also provide the bounding box of the ladder in the User prompt to focus the model’s attention.

{ “messages”: [ { “role”: “system”, “content”: “You are a safety vision model… [Detailed Safety Rules]… Output JSON only.” }, { “role”: “user”, “content”: [ {“type”: “image”, “image”: “<PIL.Image>”}, {“type”: “text”, “text”: “Inspect the stepladder…”} ] }, { “role”: “assistant”, “content”: [{“type”: “text”, “text”: “[{“id”: “0”, “label”: “unsafe”}]”}] } ] }

Framework & Environment:

  • Training Tool: Unsloth Studio (Web UI)

  • Base Model**:** Gemma-4 E2B it

  • PEFT Method**:** LoRA (Fine-tuning both Vision and Language adapters)

Has anyone encountered this “Zero Loss but Zero Performance” issue with Gemma VLM or similar models? Please help me now i am so stuck

Discussion in the ATmosphere

Loading comments...