
We found that a small LLM is systematically more confident on wrong answers than on right ones

Hugging Face Forums [Unofficial] March 14, 2026

We tested our Hybrid Intelligence (Bio + LLM) system with Karpathy's autoresearchers across almost 30,000 experiments and found that a small LLM is systematically more confident on wrong answers than on right ones.

Metric	Correct	Wrong
First-token entropy	Higher	Lower
Probability margin	Lower	Higher
t-stat	2.28	−3.41

The model is more uncertain when it’s right. More confident when it’s wrong.

This is the inverse of what calibration should look like.
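
For anyone who wants to run the same kind of check on their own model, here is a minimal sketch of how the two metrics and a t-statistic could be computed. It is not our exact pipeline, and it rests on several assumptions: entropy is taken over the softmax of the first generated token's logits, the probability margin is top-1 minus top-2 probability, the statistic is Welch's t with the sign convention correct minus wrong, and the logits below are synthetic toy values, not data from the experiments.

```python
import math
from statistics import mean, variance


def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_token_entropy(logits):
    """Shannon entropy (in nats) of the first-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def probability_margin(logits):
    """Assumed definition: top-1 probability minus top-2 probability."""
    top = sorted(softmax(logits), reverse=True)
    return top[0] - top[1]


def welch_t(a, b):
    """Welch's t-statistic for mean(a) - mean(b), unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Synthetic toy logits (hypothetical, for illustration only).
correct_logits = [[2.0, 1.5, 1.0], [1.8, 1.6, 0.9], [2.2, 1.9, 1.2]]
wrong_logits = [[5.0, 1.0, 0.5], [4.5, 0.8, 0.2], [6.0, 1.5, 1.0]]

# Sign convention (assumed): correct minus wrong.
t_entropy = welch_t(
    [first_token_entropy(l) for l in correct_logits],
    [first_token_entropy(l) for l in wrong_logits],
)
t_margin = welch_t(
    [probability_margin(l) for l in correct_logits],
    [probability_margin(l) for l in wrong_logits],
)
print(f"entropy t = {t_entropy:.2f}, margin t = {t_margin:.2f}")
```

With this sign convention, a positive entropy t-statistic together with a negative margin t-statistic is the inverted pattern described above: flatter distributions on correct answers, sharper ones on wrong answers.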

You can also check out our first Hybrid Intelligence model, MerlinSafety/HybridIntelligence-0.5B, on Hugging Face.
