Thinking Out Loud: Language Models' Reasoning Superpowers
Language models are, at their core, powerful sentence completion engines that, with clever stage management, can give the impression of being helpful assistants. These models have no agency—they cannot initiate actions on their own. But can they be nudged beyond informative conversation toward reasoning and tool use in problem-solving?
Surprisingly, the answer is yes. With the right prompting, a language model can structure its responses in ways that simulate reasoning. In fact, we can tap into the model’s latent “common sense” to extract a coherent reasoning trajectory in response to a problem. By interleaving thought and action, models can tackle tasks that require multi-step reasoning.
This is the essence of the ReAct pattern. In their paper “ReAct: Synergizing Reasoning and Acting in Language Models,” Shunyu Yao and colleagues show how to unlock this capability.
Helpful Assistant: Thoughtful and Handy

The interaction between humans and language models is typically structured as a “User–Assistant” pattern, which involves three distinct roles:
- System: Configures and guides the behavior of the language model.
- User: The human who provides input.
- Assistant: The model, which responds to the user’s input.
This setup creates a back-and-forth exchange of messages, alternating between the User and Assistant roles. Often, the User’s input is enriched with contextual information retrieved from a vector database. This process, known as Retrieval-Augmented Generation (RAG), supplements the model’s general knowledge with recent or domain-specific data, leading to more accurate and relevant responses.
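In code, this role structure maps directly onto the message list that most chat APIs accept. The sketch below follows the widely used OpenAI-style message format; the retrieved snippet and the question are purely illustrative stand-ins:

```python
# Minimal sketch of the System / User / Assistant message structure.
# retrieved_context stands in for whatever a vector database would return
# in a RAG setup; it is illustrative, not the result of real retrieval.
retrieved_context = "Hawkins is a fictional town featured in Stranger Things."

messages = [
    {"role": "system", "content": "You are a helpful, factual assistant."},
    {
        "role": "user",
        "content": (
            "Context:\n" + retrieved_context +
            "\n\nQuestion: Where is Stranger Things set?"
        ),
    },
]
# The model's reply would come back as a message with role "assistant"
# and be appended to this list for the next turn.
```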
The ReAct pattern introduces another twist into the plot. Instead of producing a single-shot response, the model alternates between reasoning steps and external actions, forming a loop of Think → Act → Observe:
- Think: The model reasons about the current problem and determines what action might be helpful.
- Act: It formulates an action (e.g., calling a tool, querying a database) from a predefined set of allowed actions.
- Observe: After the action is performed—by a tool or the environment—the model receives the result as input, which it uses to inform the next reasoning step.
During the “Act” phase, the model essentially pauses, awaiting external input. Once the observation is received, it integrates this new information and continues the cycle. This interleaving of thought, action, and observation enables the model to incrementally gather and apply information—ideal for tasks that require multi-step reasoning and tool use.
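Here is a minimal sketch of that loop in Python. Both `llm` and `run_tool` are hypothetical stand-ins: `llm` is assumed to return the model's next “Thought: … / Action: …” block given the context so far, and `run_tool` is assumed to execute a named action and return its result as text:

```python
# Sketch of a Think -> Act -> Observe loop. `llm` and `run_tool` are
# hypothetical callables supplied by the caller: llm(context) returns the
# model's next "Thought: ... / Action: ..." text, and run_tool(name, arg)
# executes the named action and returns an observation string.

def react_loop(llm, run_tool, question, max_steps=8):
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(context)                 # Think: model emits thought + action
        context += step + "\n"
        name, arg = parse_action(step)      # e.g. ("search", "Stranger Things")
        if name == "finish":
            return arg                      # final answer ends the loop
        observation = run_tool(name, arg)   # Act: pause and call the tool
        context += f"Observation: {observation}\n"   # Observe: feed result back
    return None                             # gave up after max_steps


def parse_action(step):
    # Naive parser for a trailing line like "Action: Search[Stranger Things]".
    line = step.strip().splitlines()[-1]
    name, _, arg = line.removeprefix("Action:").strip().partition("[")
    return name.strip().lower(), arg.rstrip("]")
```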
Roomba Race: Reasoning Guides Action
Think about a Roomba. When tasked with cleaning, it proceeds through a series of actions guided by internal reasoning. After each action, it makes an observation—such as detecting an obstacle or identifying a new area—and evaluates the next step accordingly. This continuous sense-think-act loop enables the Roomba to adapt to obstacles, irregular room dimensions, or varying cleanliness conditions.
In a similar way, when a language model is used for multi-step problem-solving, it must determine the appropriate action at any point t based on the context so far. This context consists of the entire sequence of actions and observations up to the most recent one, right before the next action is to be chosen. The challenge is to map this evolving context to an appropriate next action, a process governed by a policy, typically denoted as π (pi). Learning such a policy is hard, however, when the mapping from context to action is highly implicit and demands extensive computation.
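In the notation used by the ReAct paper, stated roughly: the context at step t collects everything observed and done so far, and the policy maps that context to the next action:

```latex
% Context at step t: the interleaved history of observations and actions so far.
c_t = (o_1, a_1, \ldots, o_{t-1}, a_{t-1}, o_t)
% The next action is drawn from a policy conditioned on that context.
a_t \sim \pi(a_t \mid c_t)
```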
But here’s the twist: in language models, actions can also take the form of thoughts—textual reasoning steps that don’t directly trigger external tools or cause side effects. A “thought” doesn’t produce an observation but instead updates the internal context with useful inferences or structured reasoning. These thoughts might:
- Decompose a high-level goal into smaller tasks,
- Inject common-sense knowledge,
- Integrate previous observations into planning,
- Or summarize and refine a working hypothesis.
This is the essence of thinking out loud—the model simulates reasoning as it progresses through a task.
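In the same spirit, the ReAct paper formalizes thoughts by enlarging the action space with the space of language strings; a thought changes nothing in the environment and only extends the context for the next step:

```latex
% The tool actions A are augmented with the space of language strings L.
\hat{A} = A \cup L
% A thought \hat{a}_t \in L triggers no tool call and yields no observation;
% it simply becomes part of the context used at the next step.
c_{t+1} = (c_t, \hat{a}_t)
```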
Thought → Action → Observation: A Reasoning Trajectory
Through repeated cycles of thinking, acting, and observing, a structured reasoning trajectory emerges. Importantly, the model itself decides when to think and when to act. It orchestrates its own problem-solving strategy, dynamically navigating between abstract reasoning and concrete tool use.
This self-directed cycle—core to the ReAct pattern—gives large language models their reasoning superpowers.
Stranger Things: Fact or Fiction
FEVER (Fact Extraction and VERification) is a dataset consisting of 185,445 claims, created by modifying sentences extracted from Wikipedia. These claims were then verified by annotators who did not have access to the original sentences. Each claim is labeled as Supported, Refuted, or Not Enough Info. For claims labeled Supported or Refuted, annotators also recorded the specific sentence(s) that served as the evidence for their judgment.
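Concretely, a single record looks roughly like the sketch below; the field names and the sentence index are illustrative rather than the dataset's exact schema:

```python
# Illustrative shape of a FEVER-style record; the field names and the
# evidence sentence index are approximations, not the exact schema.
fever_record = {
    "claim": "Stranger Things is set in Bloomington, Indiana.",
    "label": "Refuted",  # one of: Supported, Refuted, Not Enough Info
    "evidence": [("Stranger Things", 3)],  # (Wikipedia page, sentence index)
}
```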
These FEVER claims make excellent test cases for ReAct-style reasoning. Can a language model reason its way to the correct classification using a sequence of thoughts and actions?
Let’s explore this with the claim:
“Stranger Things is set in Bloomington, Indiana.”
We allow the model to use a small, predefined set of three actions, in the spirit of the ReAct paper: Search[entity], which retrieves a short Wikipedia summary of an entity; Lookup[string], which finds a string on the retrieved page; and Finish[answer], which returns the final verdict.
Three trajectory styles are possible here: thought only, action only, and interleaved thought and action.
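To make the interleaved style concrete, a ReAct-style trajectory for this claim might look roughly like the following; the observation is paraphrased for illustration rather than quoted from an actual Wikipedia page:

```text
Thought: I need to search Stranger Things and find where the series is set.
Action: Search[Stranger Things]
Observation: Stranger Things is an American science-fiction horror series
set in the fictional town of Hawkins, Indiana, during the 1980s.
Thought: The series is set in Hawkins, Indiana, not Bloomington, Indiana,
so the claim is false.
Action: Finish[REFUTED]
```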
Now, consider the model’s behavior under different constraints:
- When actions are disallowed (thought only), the model must rely solely on its common-sense or pre-trained knowledge. It cannot interact with tools or external sources to gather new information, limiting its ability to refine or revise its thinking.
- When only actions are permitted (action only), without reasoning steps, the model can gather facts, but it may struggle to integrate the new information or plan its use effectively.
- When reasoning and action are interleaved, the model can think, decide on a fact-finding step, and then update its internal context based on what it observes. This thought → action → observation loop is where the full power of ReAct emerges: the model continuously adapts its reasoning as it uncovers new evidence.
This interplay—between what the model knows, what it does, and how it integrates observations—forms the foundation of dynamic, tool-augmented reasoning.
Agentic AI: Blueprints of an Agent
The ReAct pattern unlocks a pathway for language models to exhibit structured reasoning behavior in service of problem-solving. Through few-shot prompting, models can be guided with curated examples to follow multi-step thought-action-observation sequences. Even more powerfully, the model’s capabilities can be systematically enhanced through fine-tuning on synthetically generated datasets designed to reflect complex reasoning tasks.
This training approach enables models to formulate dynamic plans, make decisions based on a finite set of available tools, and continuously adapt their strategies as new observations emerge from tool use. It transforms static question-answering into a fluid, adaptive process.
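A few-shot ReAct prompt can be as simple as prepending one or two worked trajectories to the new claim. The sketch below uses a single, deliberately short exemplar; real prompts typically contain several longer, carefully curated ones:

```python
# Sketch of a few-shot ReAct prompt for claim verification. The exemplar
# trajectory is abbreviated and illustrative, not taken from a real dataset.
FEW_SHOT_EXAMPLE = """\
Claim: Paris is the capital of Germany.
Thought: I should check which country Paris is the capital of.
Action: Search[Paris]
Observation: Paris is the capital and largest city of France.
Thought: Paris is the capital of France, not Germany, so the claim is false.
Action: Finish[REFUTED]
"""


def build_prompt(claim: str) -> str:
    return (
        "Verify the claim using interleaved Thought / Action / Observation "
        "steps. Allowed actions: Search[entity], Lookup[string], "
        "Finish[answer].\n\n"
        + FEW_SHOT_EXAMPLE
        + f"\nClaim: {claim}\nThought:"
    )


print(build_prompt("Stranger Things is set in Bloomington, Indiana."))
```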
As Andrej Karpathy aptly put it:
“The future is Agentic!”