Thinking Out Loud: Language Models' Reasoning Superpowers
Language models are, at their core, powerful sentence completion engines that, with clever stage management, can give the impression of being helpful assistants. These models have no agency—they cannot initiate actions on their own. But can they be nudged beyond informative conversation toward reasoning and tool use in problem-solving?
Surprisingly, the answer is yes. With the right prompting, a language model can structure its responses in ways that simulate reasoning. In fact, we can tap into the model’s latent “common sense” to extract a coherent reasoning trajectory in response to a problem. By interleaving thought and action, models can tackle tasks that require multi-step reasoning.
This is the essence of the ReAct pattern. In their paper “ReAct: Synergizing Reasoning and Acting in Language Models,” Shunyu Yao and colleagues show how to unlock this capability.
Helpful Assistant: Thoughtful and Handy

The interaction between humans and language models is typically structured as a “User–Assistant” pattern, which involves three distinct roles:
- System: Configures and guides the behavior of the language model.
- User: The human who provides input.
- Assistant: The model, which responds to the user’s input.
This setup creates a back-and-forth exchange of messages, alternating between the User and Assistant roles. Often, the User’s input is enriched with contextual information retrieved from a vector database. This process, known as Retrieval-Augmented Generation (RAG), supplements the model’s general knowledge with recent or domain-specific data, leading to more accurate and relevant responses.
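In code, this role structure maps directly onto the message list that most chat APIs accept. The sketch below follows the widely used OpenAI-style message format; the retrieved snippet and the question are purely illustrative stand-ins:

```python
# Minimal sketch of the System / User / Assistant message structure.
# retrieved_context stands in for whatever a vector database would return
# in a RAG setup; it is illustrative, not the result of real retrieval.
retrieved_context = "Hawkins is a fictional town featured in Stranger Things."

messages = [
    {"role": "system", "content": "You are a helpful, factual assistant."},
    {
        "role": "user",
        "content": (
            "Context:\n" + retrieved_context +
            "\n\nQuestion: Where is Stranger Things set?"
        ),
    },
]
# The model's reply would come back as a message with role "assistant"
# and be appended to this list for the next turn.
```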
The ReAct pattern introduces another twist into the plot. Instead of producing a single-shot response, the model alternates between reasoning steps and external actions, forming a loop of Think → Act → Observe:
- Think: The model reasons about the current problem and determines what action might be helpful.
- Act: It formulates an action (e.g., calling a tool, querying a database) from a predefined set of allowed actions.
- Observe: After the action is performed—by a tool or the environment—the model receives the result as input, which it uses to inform the next reasoning step.
During the “Act” phase, the model essentially pauses, awaiting external input. Once the observation is received, it integrates this new information and continues the cycle. This interleaving of thought, action, and observation enables the model to incrementally gather and apply information—ideal for tasks that require multi-step reasoning and tool use.
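Here is a minimal sketch of that loop in Python. Both `llm` and `run_tool` are hypothetical stand-ins: `llm` is assumed to return the model's next “Thought: … / Action: …” block given the context so far, and `run_tool` is assumed to execute a named action and return its result as text:

```python
# Sketch of a Think -> Act -> Observe loop. `llm` and `run_tool` are
# hypothetical callables supplied by the caller: llm(context) returns the
# model's next "Thought: ... / Action: ..." text, and run_tool(name, arg)
# executes the named action and returns an observation string.

def react_loop(llm, run_tool, question, max_steps=8):
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(context)                 # Think: model emits thought + action
        context += step + "\n"
        name, arg = parse_action(step)      # e.g. ("search", "Stranger Things")
        if name == "finish":
            return arg                      # final answer ends the loop
        observation = run_tool(name, arg)   # Act: pause and call the tool
        context += f"Observation: {observation}\n"   # Observe: feed result back
    return None                             # gave up after max_steps


def parse_action(step):
    # Naive parser for a trailing line like "Action: Search[Stranger Things]".
    line = step.strip().splitlines()[-1]
    name, _, arg = line.removeprefix("Action:").strip().partition("[")
    return name.strip().lower(), arg.rstrip("]")
```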
Roomba Race: Reasoning Guides Action
Think about a Roomba. When tasked with cleaning, it proceeds through a series of actions guided by internal reasoning. After each action, it makes an observation—such as detecting an obstacle or identifying a new area—and evaluates the next step accordingly. This continuous sense-think-act loop enables the Roomba to adapt to obstacles, irregular room dimensions, or varying cleanliness conditions.
In a similar way, when a language model is used for multi-step problem-solving, it must determine the appropriate action at any point t based on the context so far. This context consists of the entire sequence of actions and observations up to the most recent one, right before the next action is to be chosen. The challenge is to map this evolving context to an appropriate next action, a process governed by a policy, typically denoted as π (pi). Learning such a policy is hard, however, when the mapping from context to action is highly implicit and demands extensive computation.
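In the notation used by the ReAct paper, stated roughly: the context at step t collects everything observed and done so far, and the policy maps that context to the next action:

```latex
% Context at step t: the interleaved history of observations and actions so far.
c_t = (o_1, a_1, \ldots, o_{t-1}, a_{t-1}, o_t)
% The next action is drawn from a policy conditioned on that context.
a_t \sim \pi(a_t \mid c_t)
```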
But here’s the twist: in language models, actions can also take the form of thoughts—textual reasoning steps that don’t directly trigger external tools or cause side effects. A “thought” doesn’t produce an observation but instead updates the internal context with useful inferences or structured reasoning. These thoughts might:
- Decompose a high-level goal into smaller tasks,
- Inject common-sense knowledge,
- Integrate previous observations into planning,
- Or summarize and refine a working hypothesis.
This is the essence of thinking out loud—the model simulates reasoning as it progresses through a task.
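In the same spirit, the ReAct paper formalizes thoughts by enlarging the action space with the space of language strings; a thought changes nothing in the environment and only extends the context for the next step:

```latex
% The tool actions A are augmented with the space of language strings L.
\hat{A} = A \cup L
% A thought \hat{a}_t \in L triggers no tool call and yields no observation;
% it simply becomes part of the context used at the next step.
c_{t+1} = (c_t, \hat{a}_t)
```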
Thought → Action → Observation: A Reasoning Trajectory
Through repeated cycles of thinking, acting, and observing, a structured reasoning trajectory emerges. Importantly, the model itself decides when to think and when to act. It orchestrates its own problem-solving strategy, dynamically navigating between abstract reasoning and concrete tool use.
This self-directed cycle—core to the ReAct pattern—gives large language models their reasoning superpowers.
Stranger Things: Fact or Fiction
FEVER (Fact Extraction and VERification) is a dataset consisting of 185,445 claims, created by modifying sentences extracted from Wikipedia. These claims were then verified by annotators who did not have access to the original sentences. Each claim is labeled as Supported, Refuted, or Not Enough Info. For claims labeled Supported or Refuted, annotators also recorded the specific sentence(s) that served as the evidence for their judgment.
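Concretely, a single record looks roughly like the sketch below; the field names and the sentence index are illustrative rather than the dataset's exact schema:

```python
# Illustrative shape of a FEVER-style record; the field names and the
# evidence sentence index are approximations, not the exact schema.
fever_record = {
    "claim": "Stranger Things is set in Bloomington, Indiana.",
    "label": "Refuted",  # one of: Supported, Refuted, Not Enough Info
    "evidence": [("Stranger Things", 3)],  # (Wikipedia page, sentence index)
}
```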
These FEVER claims make excellent test cases for ReAct-style reasoning. Can a language model reason its way to the correct classification using a sequence of thoughts and actions?
Let’s explore this with the claim:
“Stranger Things is set in Bloomington, Indiana.”
We allow the model to use a small, predefined set of three actions, in the spirit of the ReAct paper: Search[entity], which retrieves a short Wikipedia summary of an entity; Lookup[string], which finds a string on the retrieved page; and Finish[answer], which returns the final verdict.
Three trajectory styles are possible here: thought only, action only, and interleaved thought and action.
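To make the interleaved style concrete, a ReAct-style trajectory for this claim might look roughly like the following; the observation is paraphrased for illustration rather than quoted from an actual Wikipedia page:

```text
Thought: I need to search Stranger Things and find where the series is set.
Action: Search[Stranger Things]
Observation: Stranger Things is an American science-fiction horror series
set in the fictional town of Hawkins, Indiana, during the 1980s.
Thought: The series is set in Hawkins, Indiana, not Bloomington, Indiana,
so the claim is false.
Action: Finish[REFUTED]
```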
Now, consider the model’s behavior under different constraints:
- When actions are disallowed (thought only), the model must rely solely on its common-sense or pre-trained knowledge. It cannot interact with tools or external sources to gather new information, limiting its ability to refine or revise its thinking.
- When only actions are permitted (action only), without reasoning steps, the model can gather facts, but it may struggle to integrate the new information or plan its use effectively.
- When reasoning and action are interleaved, the model can think, decide on a fact-finding step, and then update its internal context based on what it observes. This thought → action → observation loop is where the full power of ReAct emerges: the model continuously adapts its reasoning as it uncovers new evidence.
This interplay—between what the model knows, what it does, and how it integrates observations—forms the foundation of dynamic, tool-augmented reasoning.
Agentic AI: Blueprints of an Agent
The ReAct pattern unlocks a pathway for language models to exhibit structured reasoning behavior in service of problem-solving. Through few-shot prompting, models can be guided with curated examples to follow multi-step thought-action-observation sequences. Even more powerfully, the model’s capabilities can be systematically enhanced through fine-tuning on synthetically generated datasets designed to reflect complex reasoning tasks.
This training approach enables models to formulate dynamic plans, make decisions based on a finite set of available tools, and continuously adapt their strategies as new observations emerge from tool use. It transforms static question-answering into a fluid, adaptive process.
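A few-shot ReAct prompt can be as simple as prepending one or two worked trajectories to the new claim. The sketch below uses a single, deliberately short exemplar; real prompts typically contain several longer, carefully curated ones:

```python
# Sketch of a few-shot ReAct prompt for claim verification. The exemplar
# trajectory is abbreviated and illustrative, not taken from a real dataset.
FEW_SHOT_EXAMPLE = """\
Claim: Paris is the capital of Germany.
Thought: I should check which country Paris is the capital of.
Action: Search[Paris]
Observation: Paris is the capital and largest city of France.
Thought: Paris is the capital of France, not Germany, so the claim is false.
Action: Finish[REFUTED]
"""


def build_prompt(claim: str) -> str:
    return (
        "Verify the claim using interleaved Thought / Action / Observation "
        "steps. Allowed actions: Search[entity], Lookup[string], "
        "Finish[answer].\n\n"
        + FEW_SHOT_EXAMPLE
        + f"\nClaim: {claim}\nThought:"
    )


print(build_prompt("Stranger Things is set in Bloomington, Indiana."))
```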
As Andrej Karpathy aptly put it:
“The future is Agentic!”