
We found that a small LLM is systematically more confident on wrong answers than on right ones

Hugging Face Forums [Unofficial] March 14, 2026

We tested our Hybrid Intelligence (Bio + LLM) system with Karpathy's autoresearchers across almost 30,000 experiments and found that a small LLM is systematically more confident on wrong answers than on right ones.

Metric              | Correct | Wrong  | t-stat
First-token entropy | Higher  | Lower  | 2.28
Probability margin  | Lower   | Higher | −3.41
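The post does not define the two metrics precisely. Here is a minimal sketch, assuming first-token entropy is the Shannon entropy (in nats) of the softmax distribution at the first generated token, and probability margin is the gap between the top-1 and top-2 token probabilities at that same position. The function names and toy logits are hypothetical, not taken from the authors' code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def first_token_entropy(logits):
    """Shannon entropy (nats) of the first-token distribution; higher means less certain."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)

def probability_margin(logits):
    """Top-1 minus top-2 probability (assumed definition); higher means more confident."""
    top1, top2 = sorted(softmax(logits), reverse=True)[:2]
    return top1 - top2

# Hypothetical first-token logits over a tiny 4-token vocabulary.
peaked = [5.0, 1.0, 0.5, 0.0]
flat = [1.0, 0.9, 0.8, 0.7]
print(first_token_entropy(peaked), probability_margin(peaked))
print(first_token_entropy(flat), probability_margin(flat))
```

The two metrics move in opposite directions: a peaked distribution has low entropy and a large margin, which is why "Higher" and "Lower" flip between the two rows.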

The model is more uncertain when it's right, and more confident when it's wrong.

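This is the inverse of what calibration should look like: a well-calibrated model should be more confident on the answers it gets right than on the ones it gets wrong.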
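One way to run this kind of check is a two-sample t-test on a confidence metric, split by correctness. The post does not say which test produced its t-stats. This sketch assumes Welch's unequal-variance t-statistic for mean(correct) − mean(wrong), so a calibrated model would give a positive t on probability margin. If the reported −3.41 belongs to the probability margin (its sign matches the Lower/Higher direction), it is exactly the inverted case. The sample margins below are made up for illustration.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for mean(a) - mean(b), not assuming equal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical probability margins, split by whether the answer was correct.
margins_correct = [0.20, 0.35, 0.15, 0.30, 0.25]
margins_wrong = [0.55, 0.40, 0.60, 0.45, 0.50]

t = welch_t(margins_correct, margins_wrong)
# Calibrated: t > 0 (more confident when right). Inverted: t < 0.
print(f"t = {t:.2f}", "inverted" if t < 0 else "as expected")
```

In practice a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic along with a p-value.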

You can also check out our first Hybrid Intelligence model: MerlinSafety/HybridIntelligence-0.5B on Hugging Face.
