It is not reasoning...
I started this micro-post on Wednesday and it stayed in my drafts.
Now I’ve come across this paper from Apple, The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, via Discover AI’s The Collapse of Reasoning (by Apple)
my brain (original draft)
LLMs’ output is not reasoning in the way we reason; they simply have a better-trained Stream of Consciousness.
It is still useful (better trained), but the output from an LLM is not reasoned.
Some AI systems have “test time” reasoning; that’s another thing.
Brave has a nice comparison.

what the paper shows
I like that they tried prompting the model with the general solution (the explicit algorithm) for the Tower of Hanoi problem. The failure of the models even with that help is really interesting, as it strongly implies that “reasoning” LLMs (or Large Reasoning Models, LRMs) are not really a thing…
LLMs, LRMs, and LRMs given the Tower of Hanoi algorithm all fail at roughly the same point.
The difference is how quickly they degrade from success on low-complexity problems to complete failure at high-complexity problems.
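For context, that general solution is tiny. A minimal recursive sketch in Python (my own illustration, not the exact procedure the paper gave the models):

```python
def hanoi(n, source, target, spare, moves):
    """Recursively list the moves that solve an n-disc Tower of Hanoi."""
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # park the smaller stack out of the way
    moves.append((source, target))              # move the largest remaining disc
    hanoi(n - 1, spare, target, source, moves)  # rebuild the smaller stack on top of it
    return moves

# 3 discs take 2**3 - 1 = 7 moves; the move count roughly doubles with each extra disc.
print(hanoi(3, "A", "C", "B", []))
```

The striking bit is that having this procedure spelled out doesn’t move the point where the models collapse.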
I still don’t like the term pattern matching… 🤷🏼‍♂️
I’m working through this, ideally proving it right or wrong. I like to think of an LLM as a series of superimposed probability distributions. Each layer, depending on the input, creates an output that can be thought of as a subjective belief (in the Bayesian probability sense) based on that input.
Line these distributions up in series and you get a coherent output…
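A toy version of how I picture that, just to make it concrete (my own sketch, nothing to do with any real architecture): each “layer” is a conditional distribution, and stacking them chains those subjective beliefs together.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_layer(n_in, n_out):
    """A toy "layer": a row-stochastic matrix, i.e. P(output | input)."""
    m = rng.random((n_in, n_out))
    return m / m.sum(axis=1, keepdims=True)

# Start with a uniform belief over 5 possible inputs and push it through 4 layers.
belief = np.full(5, 1 / 5)
layers = [random_layer(5, 5) for _ in range(4)]

for layer in layers:
    belief = belief @ layer  # the updated subjective belief after this layer

print(belief, belief.sum())  # still a valid probability distribution (sums to 1)
```

The chain produces a perfectly well-formed distribution at the end, which you can sample fluent output from, but nothing in it is doing deduction.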
it’s not pattern matching, it’s learning by rote
What I think this shows is that it is just memorising things.
It’s the equivalent of a student who isn’t good at maths but has memorised the times tables up to 12. They come unstuck when you ask for 2 x 13; it’s like saying “I never passed 12 and don’t know the answer”.
If it were pattern matching, it might have noticed that the answers increase by 2 each time and broken the task down into two steps:
- Access my memory for 2 x 12
- Add 2 to that answer
So pattern matching implicitly includes some reasoning, which these models are not doing.
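A crude way to show the difference in toy Python (the lookup table and both functions are hypothetical, purely for illustration): rote memorisation is a lookup that dies past 12, while pattern matching with a small amount of reasoning decomposes the question.

```python
# Rote memorisation: the 2-times table up to 12, and nothing else.
TIMES_TABLE = {n: 2 * n for n in range(1, 13)}

def rote(n):
    """Fails exactly like the memorising student once n > 12."""
    return TIMES_TABLE.get(n)  # None means "I never passed 12 and don't know the answer"

def pattern_match(n):
    """Decompose 2 x n into something memorised plus a small extra step."""
    if n in TIMES_TABLE:
        return TIMES_TABLE[n]
    return TIMES_TABLE[12] + 2 * (n - 12)  # recall 2 x 12, then add 2 for each step past 12

print(rote(13), pattern_match(13))  # None 26
```

The second function only works because it notices the structure and reuses what it has memorised, and that small decomposition step is the bit of reasoning these models don’t seem to be doing.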
how does it solve things outside its source data and training?
Awesome question! It needs an answer! 🤓