Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreig4fgylnpfryspeaz24dqaj3mcxlfhnnnxbxosegrn2njctuahlre",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mh2e63iw3x32"
  },
  "path": "/t/we-found-that-small-llm-is-systematically-more-confident-on-wrong-answers-than-right-ones/174265#post_1",
  "publishedAt": "2026-03-14T18:23:37.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "MerlinSafety/HybridIntelligence-0.5B · Hugging Face"
  ],
  "textContent": "We tested Hybrid Intelligence system with Karpathy`s autoresearchers about (Bio + LLM) intelligence with almost 30,000 experiments and found that small LLM is systematically more confident on wrong answers than right ones.\n\nMetric | Correct | Wrong\n---|---|---\nFirst-token entropy | Higher | Lower\nProbability margin | Lower | Higher\nt-stat | 2.28 | −3.41\n\nThe model is more uncertain when it’s right. More confident when it’s wrong.\n\nThis is the inverse of what calibration should look like.\n\nAlso you can check out our first Hybrid Intelligence model: MerlinSafety/HybridIntelligence-0.5B · Hugging Face",
  "title": "We found that small LLM is systematically more confident on wrong answers than right ones"
}
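
The record's table compares first-token entropy and probability margin without defining them. The sketch below shows one common way such metrics could be computed from a model's logits for the first generated token. The definitions are assumptions, not taken from the post: entropy is the Shannon entropy of the softmax distribution, and margin is the top-1 probability minus the top-2 probability. The function names and the example logit vectors are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution; higher means less confident."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def probability_margin(logits):
    """Gap between the two most likely tokens; higher means more confident."""
    probs = sorted(softmax(logits), reverse=True)
    return probs[0] - probs[1]


# Hypothetical logits over a 4-token vocabulary (illustration only, not data from the post).
peaked = [5.0, 1.0, 0.5, 0.2]  # one token dominates: a "confident" prediction
flat = [1.2, 1.0, 0.9, 0.8]    # probabilities are close: an "uncertain" prediction

print(first_token_entropy(peaked), probability_margin(peaked))
print(first_token_entropy(flat), probability_margin(flat))
```

Under these definitions, the pattern reported in the post would correspond to wrong answers looking more like `peaked` (lower entropy, higher margin) and correct answers looking more like `flat`.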