
Recently, there has been a lot of fuss about the claim that large reasoning models (LRMs) are incapable of thinking. This is largely due to a research article published by Apple, "The Illusion of Thinking." Apple argues that LRMs cannot actually think; instead, they merely perform pattern matching. The evidence they provide is that LRMs with chain-of-thought (CoT) reasoning are unable to carry on the calculation using a predefined algorithm as the problem grows.
This is a fundamentally flawed argument. If you ask a human who already knows the algorithm for solving the Tower of Hanoi puzzle to solve an instance with a large number of disks, say twenty, he or she will almost certainly fail to write out every move. By that logic, we would have to conclude that humans cannot think either. However, this argument only shows that there is no evidence that LRMs cannot think. That alone does not mean LRMs can think – only that we cannot be sure they don't.
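To get a sense of the scale involved, here is a small sketch (mine, not the paper's) of the standard recursive Tower of Hanoi algorithm. The point is simply that the number of moves grows as 2^n - 1, so writing out the full solution for twenty disks already means listing over a million moves.

```python
# A minimal sketch: the standard recursive Tower of Hanoi solution.
# The only point is how fast the required output grows: n disks need
# 2**n - 1 moves, so 20 disks already need 1,048,575 moves.

def hanoi(n, source, target, spare, moves):
    """Append the moves that transfer n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)
    moves.append((source, target))
    hanoi(n - 1, spare, target, source, moves)

moves = []
hanoi(20, "A", "C", "B", moves)
print(len(moves))  # 1048575 -- far more than anyone would write out by hand
```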
In this article, I will make a bold claim: LRMs can almost certainly think. I say ‘almost’ because there is always a chance that further research will surprise us. But I think my argument is pretty conclusive.
What is thinking?
Before we try to settle whether LRMs can think, we need to define what we mean by thinking. But first, we must make sure that humans count as thinkers under that definition. We will consider thinking only in the context of problem solving, since that is the matter of contention.
1. Problem representation (frontal and parietal lobes)
When you think about a problem, the process engages your prefrontal cortex. This region is responsible for working memory, attention, and executive functions—abilities that allow you to keep a problem in mind, break it down into subcomponents, and set goals. Your parietal cortex helps encode symbolic structure for math or puzzle problems.
2. Mental simulation (memory and internal speech)
It has two components: one is an auditory loop that lets you talk to yourself, an inner verbalization of thought. The other is visual imagery, which lets you picture and manipulate objects mentally; geometry was so important for navigating the world that we developed dedicated abilities for it. The verbal component relies on Broca's area and the auditory cortex, language centres repurposed for inner speech. The visual component is handled primarily by the visual cortex and parietal areas.
3. Pattern matching and retrieval (hippocampus and temporal lobes)
These actions depend on past experiences and stored knowledge from long-term memory:
The hippocampus helps to retrieve relevant memories and facts.
The temporal lobe brings semantic knowledge – meaning, rules, categories.
This is analogous to how neural networks rely on their training to handle a task.
4. Monitoring and evaluation (anterior cingulate cortex)
Our anterior cingulate cortex (ACC) monitors for errors, conflicts or problems – this is where you sense contradictions or dead ends. This process is essentially based on pattern matching from prior experience.
5. Insight or reframing (default mode network and right hemisphere)
When you’re stuck, your mind can shift into the default mode network – a more relaxed, internally directed mode. This is when you step back, let go of the current thread, and sometimes ‘suddenly’ see a new angle (the classic “Aha!” moment).
This is reminiscent of how DeepSeek-R1 was trained for CoT reasoning without any CoT examples in its training data. Remember, the brain keeps learning even as it processes data and solves problems.
By contrast, LRMs are not allowed to update their weights based on real-world feedback during inference or generation. But the CoT training of DeepSeek-R1 did involve learning from its own attempts at solving problems – effectively, updating while it reasoned.
Similarities Between LRM CoT Reasoning and Biological Thinking
Not all of the faculties mentioned above are present in LRMs. For example, an LRM is very unlikely to do much visual reasoning in its circuits, although a little might occur. But it certainly does not produce intermediate images during CoT generation.
Most humans can build spatial models in their heads when solving problems. Does that mean we can conclude that LRMs cannot think? I would disagree. Some humans also find it difficult to form mental images of the concepts they think about – a condition called aphantasia. People with this condition think perfectly well; in fact, many go about life without the ability and barely notice its absence. Many of them are very good at symbolic reasoning and quite good at math – often enough to compensate for their lack of visual imagery. We can expect our neural network models to work around this limitation as well.
If we take a more abstract view of the human thought process described above, we essentially see the following components:
1. Pattern matching draws on learned experience for problem representation, monitoring, and advancing the chain of thought.
2. Working memory stores all the intermediate steps.
3. Backtracking search concludes that the current chain of thought is going nowhere and backtracks to a reasonable earlier point (a toy illustration follows below).
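To make these three ingredients concrete, here is a toy illustration (mine, and a loose analogy at best): a classic backtracking search for the 4-queens puzzle, with comments marking which part plays the role of pattern matching, working memory, and backtracking.

```python
# A toy illustration (not an LRM): backtracking search for the 4-queens
# puzzle, annotated with the three ingredients listed above.

def solve_queens(n=4):
    placement = []  # "working memory": the intermediate steps kept so far

    def candidates(row):
        # "pattern matching": propose columns that don't conflict with
        # anything already held in working memory
        return [
            col for col in range(n)
            if all(col != c and abs(col - c) != row - r
                   for r, c in enumerate(placement))
        ]

    def search(row):
        if row == n:
            return True                # every row filled: a solution
        for col in candidates(row):
            placement.append(col)      # extend the current line of reasoning
            if search(row + 1):
                return True
            placement.pop()            # "backtracking": this line went nowhere
        return False

    return placement if search(0) else None

print(solve_queens())  # e.g. [1, 3, 0, 2]
```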
Pattern matching in an LRM comes from its training. The whole point of training is to learn both world knowledge and the patterns needed to apply that knowledge effectively. Since an LRM is a layered network, the entire working memory needs to fit within a single layer's activations. World knowledge and the learned patterns live in the weights, while processing happens between layers using those learned patterns stored as model parameters.
Note that even during CoT, the entire text – the input, the CoT generated so far, and the part of the output already produced – must fit into the working memory at each layer. Working memory is just one layer's activations (in the case of the attention mechanism, it also includes the KV cache).
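As a rough sketch of that last point (simplified to a single attention head with no learned projections, so the details are assumed rather than how any particular LRM implements it), the KV cache grows with every token processed, and each new token attends over that entire cached context:

```python
# A minimal sketch of the KV cache acting as "working memory" during
# generation: every new token's query attends over the keys/values of
# the entire context processed so far.

import numpy as np

d = 8                      # hypothetical embedding size
k_cache, v_cache = [], []  # grows by one entry per processed token

def attend(x):
    """Process one token embedding, attending over everything cached so far."""
    k_cache.append(x)      # in a real model: learned key/value projections
    v_cache.append(x)
    K = np.stack(k_cache)          # (t, d)
    V = np.stack(v_cache)          # (t, d)
    scores = K @ x / np.sqrt(d)    # similarity of the new token to the context
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V             # context-aware representation of the token

rng = np.random.default_rng(0)
for t in range(5):                 # prompt tokens and generated tokens alike
    attend(rng.normal(size=d))
    print(f"token {t}: attends over {len(k_cache)} cached entries")
```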
In fact, CoT is exactly what we do when we think by talking to ourselves (which is nearly always the case). We almost always verbalize our thoughts – and so does a CoT reasoner.
There is also good evidence that a CoT reasoner can take backtracking steps when a particular line of reasoning seems futile. In fact, this is what the Apple researchers observed when they had LRMs solve large instances of simple puzzles. The LRMs correctly recognized that spelling out the solution directly would not fit in their working memory, so they tried to find shortcuts instead, just like a human would. This is further evidence that LRMs think rather than blindly follow predefined patterns.
But why should a next-token predictor learn to think?
A neural network of sufficient size can learn any computation, including thinking. And a next-word prediction system, in particular, can learn to think. Let me elaborate.
There is a common perception that LRMs cannot think because, at the end of the day, they are only predicting the next token – "just a glorified auto-complete." This view is fundamentally mistaken: not the claim that they are an 'auto-complete', but the assumption that an 'auto-complete' requires no thinking. In fact, next-word prediction is far from a limited representation of thought. On the contrary, it is the most general form of knowledge representation one can hope for. Let me explain.
Whenever we want to represent some knowledge, we need a language or symbol system to do so. There are various formal languages that are quite precise in terms of their expression. However, such languages are fundamentally limited in the kind of knowledge they can represent.
For example, first-order predicate logic cannot express properties of all predicates that satisfy a given property, because it does not allow quantification over predicates.
Of course, there are higher-order predicate calculi that can express predicates over predicates to arbitrary depth. But even they cannot express ideas that lack precision or are abstract in nature.
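As a concrete illustration (my example, not one taken from the article's sources): the induction principle for the natural numbers quantifies over all predicates P, which no single first-order sentence can do, whereas second-order logic states it directly:

```latex
% Second-order induction: quantification over an arbitrary predicate P.
% First-order logic can only approximate this with an infinite axiom schema.
\forall P \, \bigl( \bigl( P(0) \land \forall n \, ( P(n) \rightarrow P(n+1) ) \bigr) \rightarrow \forall n \, P(n) \bigr)
```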
Natural language, however, is complete in its expressive power. In fact, you can even describe concepts about natural language using natural language itself. That makes it a strong candidate for knowledge representation.
The challenge, of course, is that this expressive richness makes it hard to process information encoded in natural language. But we don't need to figure out how to do that manually – we can simply program the machine with data, through a process called training.
Given the preceding context, a next-token prediction machine essentially computes a probability distribution over the next token. Any machine that aims to compute this probability accurately must, in some form, represent world knowledge.
A simple example: given the incomplete sentence "The highest mountain peak in the world is Mt.", predicting the next word as "Everest" requires the model to have that knowledge stored somewhere. If the task requires the model to compute an answer or solve a puzzle, the next-token predictor needs to emit CoT tokens that carry the logic forward.
This implies that, even though it is predicting one token at a time, the model must internally represent at least the next few tokens well enough to ensure that it stays on a coherent logical path.
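To make the Everest example concrete, here is a toy sketch of what "a probability distribution over the next token" looks like. The four-word vocabulary and the logit values are invented for illustration; a real LRM produces logits over tens of thousands of tokens from the full context.

```python
# Toy next-token distribution for the context
# "The highest mountain peak in the world is Mt."
# Vocabulary and logits are made up purely for illustration.

import numpy as np

vocab = ["Everest", "Fuji", "Kilimanjaro", "the"]
logits = np.array([9.1, 3.0, 2.5, 1.0])   # hypothetical scores from the model

probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax: scores -> probabilities

for tok, p in zip(vocab, probs):
    print(f"{tok:12s} {p:.3f}")
print("prediction:", vocab[int(np.argmax(probs))])   # -> Everest
```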
If you think about it, humans also predict the next token – whether during speech or when using the inner voice. A perfect auto-complete system, one that always produced the right tokens and thereby the right answers, would have to be able to solve any problem. Of course, we will never reach that point, because not every answer is even computable.
However, a parameterized model that can represent knowledge by tuning its parameters, and that can learn through data and reinforcement, can certainly learn to think.
Does it produce the effects of thinking?
At the end of the day, the ultimate test of thinking is a system's ability to solve problems that require thought. If a system can answer previously unseen questions that demand a certain level of reasoning, it must have learned to think – or at least to reason – its way to the answer.
We know that proprietary LRMs perform very well on certain reasoning benchmarks. However, since there is a possibility that some of these models were fine-tuned on the benchmark test sets through the back door, we will focus only on open-source models, for the sake of fairness and transparency.
We evaluate them on the following benchmarks:
As one can see, in some benchmarks, LRMs are capable of solving a significant number of logic-based questions. While it’s true that they still lag behind human performance in many respects, it’s important to note that the human baseline often comes from people specifically trained on these benchmarks. In fact, in some cases, LRMs outperform the average untrained human.
Conclusion
We have the benchmark results, the striking similarity between CoT reasoning and biological thinking, and the theoretical understanding that any system with sufficient representational capacity, sufficient training data, and adequate computational power can perform any computable task – and LRMs meet these criteria to a great extent.
It is therefore reasonable to conclude that LRMs are almost certainly capable of thinking.
Debesh Ray Chowdhury is a Senior Principal Engineer at Talentica Software and a PhD candidate in cryptography at IIT Bombay.
Read more from our guest authors, or consider submitting a post of your own! See our guidelines here.