External Publication

Reasoning About the Reasoning of Reasoning Models

Spyglass June 9, 2025

'Twas the night before WWDC and all through the house, not a creature was stirring, but a group of researchers studying AI reasoning models were!

Look, it is a bit curious that on the cusp of this year's WWDC – an event where everyone is expecting Apple to underwhelm when it comes to AI announcements – a team of Apple researchers put out a paper suggesting that AI itself is underwhelming. More specifically, that perhaps the hottest sector of AI advances this past year, so-called "reasoning", breaks down at a certain scale. And, more damning, what such systems are doing isn't really reasoning at all. I'm not suggesting their work isn't legitimate or even correct – I'll leave such assessment to others, that's above my pay grade – I'm simply pointing out that it's fairly convenient for Apple to have any element of the AI revolution called into question just as they're about to be dragged through the mud on the topic once again.

How does the saying go? Timing of your research papers is everything...

Anyway, regardless of the 'when', I think the 'what' is a more interesting question here. Or maybe it's the 'why'. Or even the 'who'. As in, what are we even talking about here? And why does it matter? And who is even talking about it?

That is to say, to me, on the surface, this all feels a bit straw man-y. I know we call them "reasoning" systems, but is anyone actually arguing that reasoning – at least as we know it as human beings – is what is actually going on here? That might suggest that AI is operating exactly as a human brain might. Which in turn might suggest that we've achieved AGI or actually, perhaps something far greater.

But as far as I know, outside of a few fringes, no one actually believes that. And so if there's a revelation here, it may simply be that using a simplified marketing term which directly equates a technology to a human capability is a mistake. Since again, they're not actually the same thing. And that's a part of what this paper is taking down.

And that's probably fair on some level because again, it's what is being implied even if only for marketing purposes. And the AI companies seem happy to ride that wave because it does make the whole thing seem more magical and impressive, or, at the very least, more understandable to most people. But again, it's not the same thing. Because an LLM or "LRM" (Large Reasoning Model) is not a human brain.

Steven Sinofsky does a nice job taking this aspect down.

I know that Gary Marcus seems excited – "A knockout blow for LLMs?" is his title, which naturally adheres to Betteridge's law. But actually, his post is also far more nuanced than the title suggests. And his higher-level takeaway I think is the right one to have: just because LLMs (or LRMs) might not be the path to AGI (itself a quagmire of definitions, as everyone is well aware by now), it doesn't mean this technology isn't interesting or useful. And that's also in line with basically everything everyone says at this point from Demis Hassabis on down: that it's going to take a few more breakthroughs of an unspecified variety before we truly achieve what at least a sizable subset of people would consider to be AGI.

And to me, that implies technology outside of the bounds of LLMs, which were step one. LRMs may be step two, or they may be step 1.5. It doesn't really matter. The point is that they're not the direct path to AGI. But again, as far as I know, they never really were. Just a piece. So taking them down here seems more like a simple semantics argument.

Yes, these systems are not "reasoning" as you or I might. Instead they're doing insanely large-scale deduction and pattern-matching.

Far more interesting is the takeaway that such systems may fall victim to near-total collapse at a certain threshold. That's obviously a problem for scaling. And presumably it's one that the model makers were already aware of – I'd be pretty surprised if they weren't. We'll see what at least a few of their responses are at some point soon, I suspect. Perhaps during the WWDC Keynote?

This also feels like one of those problems which will be resolved by something else coming along to augment such a system – something which the paper itself suggests at points. Because again, none of this is a turnkey solution to AGI.

So is this a "knockout blow for LLMs"? No. Is it a knockout blow for AI? No. Is it anything other than a reminder that what we're calling "reasoning" here is not actually reasoning like a human being does? Again, it may just be some nice research showing the exact failure points, right now, of such systems.

Does it mean Apple might actually not be behind in AI? I mean, that's always a possibility just given how early we are in the movement and how fast it's all moving. But my general thought is that in this particular race, they are behind. Not because of the potential outcomes here, but because they're not learning and adapting at the pace that everyone else is, and that will make said outcomes less likely. But learning and adapting at that pace is not a very Apple way to go about things, historically. Which is the real problem here: an old company needing to learn new tricks – and overcoming the natural urge to believe such tricks are cheap and only worth paying attention to when fully solidified.

If that was indeed their reasoning, it's as flawed as reasoning that reasoning models were reasoning like human beings.
