Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiaastjytbza4bghj6jgl4pdnwvez546nuyt3j4yfs7iwwzzjheqn4",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mgey4oic74a2"
  },
  "path": "/t/low-logical-reasoning-performance-of-gpt-5-2-at-medium-and-high-reasoning-effort-levels/1372853#post_11",
  "publishedAt": "2026-03-06T08:42:56.000Z",
  "site": "https://community.openai.com",
  "textContent": "I tested GPT-5.4. It looks like whatever problem caused the steep fall in scores as benchmark difficulty increased is now fixed. But it’s not all sunshine and rainbows: GPT-5.4 xhigh performs worse than GPT-5.1 high, GPT-5.2 xhigh, and even GPT-5.1 medium. Oh well.\n\nNr | model_name | lineage | lineage-8 | lineage-64 | lineage-128 | lineage-192\n---|---|---|---|---|---|---\n1 | openai/gpt-5.1 (high) | 0.969 | 1.000 | 0.975 | 0.975 | 0.925\n2 | openai/gpt-5.2 (xhigh) | 0.962 | 1.000 | 1.000 | 0.925 | 0.925\n3 | openai/gpt-5.1 (medium) | 0.888 | 1.000 | 0.950 | 0.875 | 0.725\n4 | openai/gpt-5.4 (xhigh) | 0.881 | 1.000 | 1.000 | 0.750 | 0.775\n5 | openai/gpt-5.4 (high) | 0.875 | 1.000 | 0.900 | 0.900 | 0.700\n6 | openai/gpt-5.2 (high) | 0.494 | 1.000 | 0.700 | 0.175 | 0.100",
  "title": "Low logical reasoning performance of GPT-5.2 at medium and high reasoning effort levels"
}
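
The post does not say how the overall "lineage" column is derived. A quick check, assuming it is the unweighted mean of the four per-size scores (lineage-8 through lineage-192), can be sketched as follows. The names `SCORES`, `subscore_mean`, and `aggregate_gap` are illustrative and do not come from the benchmark itself; the numbers are transcribed from the table in the record above.

```python
# Scores transcribed from the table in the post.
# Each row: (lineage, lineage-8, lineage-64, lineage-128, lineage-192)
SCORES = {
    "openai/gpt-5.1 (high)":   (0.969, 1.000, 0.975, 0.975, 0.925),
    "openai/gpt-5.2 (xhigh)":  (0.962, 1.000, 1.000, 0.925, 0.925),
    "openai/gpt-5.1 (medium)": (0.888, 1.000, 0.950, 0.875, 0.725),
    "openai/gpt-5.4 (xhigh)":  (0.881, 1.000, 1.000, 0.750, 0.775),
    "openai/gpt-5.4 (high)":   (0.875, 1.000, 0.900, 0.900, 0.700),
    "openai/gpt-5.2 (high)":   (0.494, 1.000, 0.700, 0.175, 0.100),
}


def subscore_mean(row):
    """Unweighted mean of the four per-size scores (all columns after the first)."""
    subscores = row[1:]
    return sum(subscores) / len(subscores)


def aggregate_gap(row):
    """Absolute difference between the reported aggregate and the recomputed mean."""
    return abs(row[0] - subscore_mean(row))


if __name__ == "__main__":
    # ASSUMPTION: "lineage" is the plain average of the four sub-scores.
    # A gap well below 0.001 is consistent with rounding to three decimals.
    for name, row in SCORES.items():
        print(f"{name:26s} reported={row[0]:.3f} "
              f"mean={subscore_mean(row):.5f} gap={aggregate_gap(row):.5f}")
```

Under that assumption, every reported aggregate matches the recomputed mean to within three-decimal rounding. This would explain how GPT-5.4 xhigh ties for the top score at lineage-64 yet still ranks fourth overall: its lineage-128 and lineage-192 scores pull the average down.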