Qwen3.5-4B loss exploding

Hugging Face Forums [Unofficial] March 7, 2026
I am using a combined, shuffled dataset of high-reasoning messages from Claude Opus 4.5 and 4.6 and Gemini 3 Pro, all sourced from Hugging Face itself. Even when I lower the learning rate, the loss still explodes, just at a later step.
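When comparing runs at different learning rates, it can help to record exactly which step the loss first diverges at. Below is a minimal sketch that scans a list of logged loss values and flags the first step whose loss is non-finite or jumps well above its recent average. The function name `first_spike_step` and the `window` and `factor` thresholds are hypothetical choices for illustration, not part of any library. The loss values are assumed to come from whatever trainer logs you already have.

```python
import math
from collections import deque


def first_spike_step(losses, window=20, factor=3.0):
    """Return the index of the first step whose loss is NaN/inf, or exceeds
    `factor` times the mean of the previous `window` losses. Return None if
    no such step exists. `window` and `factor` are illustrative defaults."""
    recent = deque(maxlen=window)  # sliding window of the most recent losses
    for step, loss in enumerate(losses):
        # A non-finite loss counts as a divergence immediately.
        if not math.isfinite(loss):
            return step
        # Only compare against the average once the window is full.
        if len(recent) == window and loss > factor * (sum(recent) / window):
            return step
        recent.append(loss)
    return None


if __name__ == "__main__":
    # Synthetic example: a flat loss, then a sudden jump, then NaN.
    logged = [2.0] * 30 + [2.1, 9.5, float("nan")]
    print(first_spike_step(logged))  # 31
```

Running this on the loss logs of each learning-rate setting would show whether a lower rate merely delays the spike or changes its behavior.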