Low logical reasoning performance of GPT-5.2 at medium and high reasoning effort levels
OpenAI Developer Community
March 6, 2026
I tested GPT-5.4. It looks like whatever caused the steep fall in scores as benchmark difficulty increased is now fixed. But it's not all sunshine and rainbows: GPT-5.4 xhigh performs worse overall than GPT-5.1 high, GPT-5.2 xhigh, and even GPT-5.1 medium. Oh well.
| Nr | model_name | lineage | lineage-8 | lineage-64 | lineage-128 | lineage-192 |
|---|---|---|---|---|---|---|
| 1 | openai/gpt-5.1 (high) | 0.969 | 1.000 | 0.975 | 0.975 | 0.925 |
| 2 | openai/gpt-5.2 (xhigh) | 0.962 | 1.000 | 1.000 | 0.925 | 0.925 |
| 3 | openai/gpt-5.1 (medium) | 0.888 | 1.000 | 0.950 | 0.875 | 0.725 |
| 4 | openai/gpt-5.4 (xhigh) | 0.881 | 1.000 | 1.000 | 0.750 | 0.775 |
| 5 | openai/gpt-5.4 (high) | 0.875 | 1.000 | 0.900 | 0.900 | 0.700 |
| 6 | openai/gpt-5.2 (high) | 0.494 | 1.000 | 0.700 | 0.175 | 0.100 |
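
A quick sanity check on the numbers above. The post does not say how the `lineage` column is computed, but arithmetically it looks like the plain mean of the four `lineage-N` columns. The sketch below only tests that assumption against the values copied from the table; the names `SCORES`, `mean`, and `max_gap` are made up for illustration and are not part of the benchmark.

```python
# Scores copied from the table above:
# model -> (reported "lineage" score, [lineage-8, lineage-64, lineage-128, lineage-192])
SCORES = {
    "openai/gpt-5.1 (high)":   (0.969, [1.000, 0.975, 0.975, 0.925]),
    "openai/gpt-5.2 (xhigh)":  (0.962, [1.000, 1.000, 0.925, 0.925]),
    "openai/gpt-5.1 (medium)": (0.888, [1.000, 0.950, 0.875, 0.725]),
    "openai/gpt-5.4 (xhigh)":  (0.881, [1.000, 1.000, 0.750, 0.775]),
    "openai/gpt-5.4 (high)":   (0.875, [1.000, 0.900, 0.900, 0.700]),
    "openai/gpt-5.2 (high)":   (0.494, [1.000, 0.700, 0.175, 0.100]),
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def max_gap(scores):
    """Largest absolute difference between a reported overall score
    and the mean of that model's four sub-scores."""
    return max(abs(overall - mean(parts)) for overall, parts in scores.values())


if __name__ == "__main__":
    # Assumption being tested: "lineage" is the mean of the four lineage-N columns.
    for name, (overall, parts) in SCORES.items():
        print(f"{name}: reported {overall:.3f}, mean of sub-scores {mean(parts):.5f}")
    print(f"largest gap: {max_gap(SCORES):.5f}")
```

For every row the reported value is within 0.001 of that mean, so the overall ranking is most likely just the unweighted average across the four difficulty levels.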