Qwen3.5-4B loss exploding
Hugging Face Forums [Unofficial]
March 7, 2026
I am using a combined, shuffled dataset made up of high-reasoning Claude Opus 4.5, Claude Opus 4.6, and Gemini 3 Pro messages, all taken from Hugging Face itself. Even if I lower the learning rate, the loss still explodes, just at a later step.
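Since lowering the learning rate only delays the blow-up, it may help to see which steps the loss first jumps at, and then check which batches they map to. Below is a minimal sketch of that check in plain Python. It is not tied to any trainer, and `find_loss_spikes`, `ratio`, `beta`, and `warmup` are hypothetical names and defaults chosen for illustration. It assumes the per-step losses have already been logged to a list.

```python
import math


def find_loss_spikes(losses, ratio=3.0, beta=0.9, warmup=5):
    """Return (step, loss, ema) for steps whose loss jumps far above its running average.

    ratio, beta, and warmup are illustrative defaults, not tuned values.
    """
    spikes = []
    ema = None  # exponential moving average of the "healthy" losses seen so far
    for step, loss in enumerate(losses):
        # NaN or inf is always reported, since it means training has already diverged.
        if not math.isfinite(loss):
            spikes.append((step, loss, ema))
            continue
        # After a short warmup, flag any loss above ratio * running average.
        if ema is not None and step >= warmup and loss > ratio * ema:
            spikes.append((step, loss, ema))
            continue  # keep the spike out of the running average
        ema = loss if ema is None else beta * ema + (1 - beta) * loss
    return spikes


if __name__ == "__main__":
    # Made-up loss values, purely to show the call.
    logged = [2.1, 2.0, 1.9, 1.9, 1.8, 1.8, 1.7, 9.5, 1.7, float("nan")]
    for step, loss, ema in find_loss_spikes(logged):
        print(f"step {step}: loss={loss} running_avg={ema}")
```

If the flagged steps keep pointing at the same kind of sample (for example, very long reasoning traces), the data is a more likely culprit than the learning rate alone.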