
Low logical reasoning performance of GPT-5.2 at medium and high reasoning effort levels

OpenAI Developer Community March 6, 2026

I tested GPT-5.4. It looks like whatever caused the steep drop in scores as benchmark difficulty increased is now fixed. But it's not all sunshine and rainbows: GPT-5.4 xhigh (0.881 overall) scores lower than GPT-5.1 high (0.969), GPT-5.2 xhigh (0.962), and even GPT-5.1 medium (0.888). Oh well.

Nr	model_name	lineage	lineage-8	lineage-64	lineage-128	lineage-192
1	openai/gpt-5.1 (high)	0.969	1.000	0.975	0.975	0.925
2	openai/gpt-5.2 (xhigh)	0.962	1.000	1.000	0.925	0.925
3	openai/gpt-5.1 (medium)	0.888	1.000	0.950	0.875	0.725
4	openai/gpt-5.4 (xhigh)	0.881	1.000	1.000	0.750	0.775
5	openai/gpt-5.4 (high)	0.875	1.000	0.900	0.900	0.700
6	openai/gpt-5.2 (high)	0.494	1.000	0.700	0.175	0.100
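
For anyone who wants to sanity-check the numbers, here is a minimal sketch that recomputes the overall `lineage` column from the four per-difficulty columns. It assumes the overall score is the unweighted mean of lineage-8, lineage-64, lineage-128 and lineage-192; the post does not state this, but the table values agree with that assumption to within rounding. The helper names (`mean_of_levels`, `max_abs_gap`, `drop`) are made up for illustration and are not part of the benchmark.

```python
# Scores copied from the table above:
# (overall "lineage", lineage-8, lineage-64, lineage-128, lineage-192)
rows = {
    "openai/gpt-5.1 (high)":   (0.969, 1.000, 0.975, 0.975, 0.925),
    "openai/gpt-5.2 (xhigh)":  (0.962, 1.000, 1.000, 0.925, 0.925),
    "openai/gpt-5.1 (medium)": (0.888, 1.000, 0.950, 0.875, 0.725),
    "openai/gpt-5.4 (xhigh)":  (0.881, 1.000, 1.000, 0.750, 0.775),
    "openai/gpt-5.4 (high)":   (0.875, 1.000, 0.900, 0.900, 0.700),
    "openai/gpt-5.2 (high)":   (0.494, 1.000, 0.700, 0.175, 0.100),
}


def mean_of_levels(scores):
    """Unweighted mean of the four per-difficulty scores.

    ASSUMPTION: this is how the overall column is aggregated; it is
    inferred from the table, not confirmed by the post.
    """
    levels = scores[1:]
    return sum(levels) / len(levels)


def max_abs_gap(table):
    """Largest difference between the recomputed mean and the reported overall score."""
    return max(abs(mean_of_levels(s) - s[0]) for s in table.values())


def drop(scores):
    """Score lost between the easiest (lineage-8) and hardest (lineage-192) level."""
    return scores[1] - scores[-1]


if __name__ == "__main__":
    for name, s in rows.items():
        print(f"{name:26s} mean={mean_of_levels(s):.4f} "
              f"reported={s[0]:.3f} drop={drop(s):.3f}")
```

The `drop` value makes the difficulty-related fall easy to compare across models: it is the lineage-8 score minus the lineage-192 score for each row.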
