How AI Agents Actually Think — ReAct, Planning and Reflection

Most conversations about AI agents focus on what they can do — which tools they call, which workflows they automate, which tasks they complete without human input. Far fewer focus on how they actually reason their way through a problem. That gap is where most agentic systems fail.

The difference between an agent that handles unexpected situations gracefully and one that loops endlessly, burns through API budget and returns nothing useful is almost always architectural. Not the model. Not the tools. The reasoning structure around them.

Four patterns define how agents reason and act. They are not framework choices or implementation details — they are decisions about how an agent thinks. Understanding them makes you a more informed builder, a sharper evaluator, and the person in the room who can explain why a production agent is misbehaving when everyone else is blaming the model.

🔗 Foundation Posts

AI Agents — What They Are and How They Work — covers what agents are at a high level: what they can do, what makes them different from a simple chatbot, and where they fit in the AI landscape. Read that first if you are new to the concept.
Autonomous AI — The Spectrum from Assistant to Agent — maps the full range from a prompted model to a fully autonomous agent. The patterns in this post are what sit inside that autonomy.

What a design pattern actually is

In the agent world, a design pattern is not a framework, a library, or a configuration. It is a decision about how an agent structures its reasoning and action. Two agents can use identical models, identical tools, and identical prompts — and behave completely differently based on the pattern applied to them.

Think of it like this: the model is the engine. The pattern is the gearbox. The same engine produces very different results depending on how you manage the power.

The four patterns below are the ones that appear in almost every serious agentic system in production as of 2026. They are not mutually exclusive — most real systems combine them. But each solves a distinct problem, and knowing which problem each one solves is the starting point for every design decision.

ReAct — Reason, then Act, then Observe

ReAct (Reasoning + Acting) was introduced in a paper by Yao et al. at Princeton and Google Research in October 2022, a few weeks before ChatGPT was released, and it became the intellectual foundation for much of what followed. The core idea is simple: instead of asking a model to reason in isolation and then act, you interleave the two.

The loop runs as three steps. Thought: the agent articulates what it knows and what it needs to do next. Action: it calls a tool — a search, a database query, a calculation, an API. Observation: the result comes back and becomes part of the context. Then the loop runs again.
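The loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `scripted_policy` stands in for the model call, and `TOOLS`, `lookup_order` and the order ID are hypothetical names invented for the example. A real agent would send the context to an LLM and parse its reply.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
}

def scripted_policy(context: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the model: returns (thought, action, argument)."""
    if not any(line.startswith("Observation:") for line in context):
        return ("I need the order status.", "lookup_order", "A17")
    return ("I have what I need.", "finish", "Your order A17 has shipped.")

def react(question: str, policy=scripted_policy, max_steps: int = 5) -> Tuple[str, List[str]]:
    context = [f"Question: {question}"]
    for _ in range(max_steps):  # hard step limit so the loop cannot run forever
        thought, action, arg = policy(context)
        context.append(f"Thought: {thought}")
        if action == "finish":
            return arg, context
        context.append(f"Action: {action}({arg})")
        tool = TOOLS.get(action)
        observation = tool(arg) if tool else f"unknown tool {action}"
        # The observation becomes part of the context for the next Thought.
        context.append(f"Observation: {observation}")
    return "stopped: step limit reached", context

answer, trace = react("Where is order A17?")
print("\n".join(trace))
```

The returned `trace` is what makes the pattern interpretable: every Thought, Action and Observation is recorded in order, so a wrong answer can be traced back to the step that caused it.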

[Figure: ReAct loop diagram showing three stages (Thought, Action and Observation) as a clockwise cycle, with a concrete support agent example]

ReAct is the default starting pattern for good reason. It is interpretable — you can read the Thought steps and understand exactly what the agent was doing when something went wrong. It grounds the agent in real results rather than letting it reason into a hallucination spiral. And a single ReAct agent with well-chosen tools handles the majority of real-world agentic tasks effectively.

Where it breaks: ReAct was not designed for long, multi-step tasks with many dependencies. A plain ReAct agent has no memory across episodes: everything it knows has to fit in the current context window. It also has no explicit self-evaluation step. It can adjust to a new observation, but if it takes a wrong turn early and nothing in the observations contradicts it, it will follow that wrong turn all the way to a wrong answer. It reasons about what to do next. It does not ask whether what it did was any good.

📌 Key Takeaway

ReAct is the right starting point for most agents. Add other patterns only when you hit a specific failure mode that ReAct alone cannot solve.

Planning — Break the problem before solving it

A ReAct agent navigates as it goes. A Planning agent separates thinking from doing into two distinct phases: first, produce a complete plan; then execute it step by step. The planner decomposes a goal into a structured sequence of subtasks. The executor works through each subtask in order, often using ReAct loops for individual steps.
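A rough sketch of the planner and executor split, under stated assumptions: `make_plan` returns a hard-coded plan where a real planner would ask a model to decompose the goal, and `Step`, `execute`, `fake_run` and the step names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Step:
    name: str
    depends_on: List[str] = field(default_factory=list)

def make_plan(goal: str) -> List[Step]:
    """Stand-in planner: a real one would ask a model to decompose `goal`."""
    return [
        Step("gather_sources"),
        Step("extract_figures", depends_on=["gather_sources"]),
        Step("write_summary", depends_on=["extract_figures"]),
    ]

def execute(plan: List[Step], run_step: Callable[[Step, Dict[str, str]], str]) -> Dict[str, str]:
    results: Dict[str, str] = {}
    for step in plan:
        missing = [d for d in step.depends_on if d not in results]
        if missing:
            raise RuntimeError(f"{step.name} is missing inputs: {missing}")
        # Each step receives only the results it depends on, not the full history.
        inputs = {d: results[d] for d in step.depends_on}
        results[step.name] = run_step(step, inputs)
    return results

def fake_run(step: Step, inputs: Dict[str, str]) -> str:
    """Stand-in executor; in practice this could be a ReAct loop per step."""
    return f"{step.name} done using {sorted(inputs)}"

results = execute(make_plan("summarise Q3 results"), fake_run)
```

Passing each step only its declared inputs is one way to keep the executor's context small, which is the point of planning upfront.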

The reason this matters is context. A ReAct agent deciding its next step has access to everything that has happened so far — which, in a long task, means its context fills quickly, earlier instructions get diluted, and the agent starts making decisions based on incomplete information. A planning agent locks in the full structure upfront, before a single action is taken.

| Aspect | ReAct alone | Planning + Execution |
| --- | --- | --- |
| Task structure | Navigates step by step, no upfront plan | Decomposes the full goal before execution begins |
| Context use | Builds context as it goes; can drift over long tasks | Plan is fixed; executor works within defined subtasks |
| Failure recovery | Adapts on the fly within the current loop | Can re-plan if execution fails a defined checkpoint |
| Best for | Single-goal tasks, real-time grounding, tool-heavy work | Complex multi-step tasks with clear dependencies |
| Risk | Wanders without a map on long tasks | Rigid plan fails when the environment changes unexpectedly |

The failure mode to watch for: a plan is a hypothesis about what needs to happen. If the environment changes mid-execution — a tool returns an unexpected result, a dependency is missing, a data source is unavailable — a rigid planner fails silently unless the execution layer is designed to detect and handle deviations. Planning is powerful. An inflexible plan in a dynamic environment is just a slower way to get the wrong answer.
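One way to make deviations visible is to check every step result against a checkpoint and allow a bounded number of re-plans. The sketch below assumes hypothetical `run_step`, `checkpoint` and `replan` callables and invented step names; it illustrates the idea rather than any particular framework's behaviour.

```python
from typing import Callable, List

class Deviation(Exception):
    """Raised when a step fails its checkpoint and no re-plans remain."""

def run_with_replanning(
    plan: List[str],
    run_step: Callable[[str], str],
    checkpoint: Callable[[str, str], bool],
    replan: Callable[[str, List[str]], List[str]],
    max_replans: int = 1,
) -> List[str]:
    done: List[str] = []
    queue = list(plan)
    replans = 0
    while queue:
        step = queue.pop(0)
        result = run_step(step)
        if checkpoint(step, result):
            done.append(step)
            continue
        if replans >= max_replans:
            # Fail loudly instead of carrying a bad result forward.
            raise Deviation(f"{step} failed its checkpoint: {result}")
        replans += 1
        queue = replan(step, queue)  # build a new plan for the remaining work
    return done

# Hypothetical scenario: the primary data source is unavailable.
def run_step(step: str) -> str:
    return "unavailable" if step == "query_primary_db" else "ok"

def checkpoint(step: str, result: str) -> bool:
    return result == "ok"

def replan(failed: str, remaining: List[str]) -> List[str]:
    return ["query_backup_db"] + remaining

done = run_with_replanning(["query_primary_db", "build_report"], run_step, checkpoint, replan)
print(done)
```

The `max_replans` limit matters as much as the checkpoint: without it, a planner that keeps proposing failing steps would loop indefinitely.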

💡 Practical Tip

Use Planning when a task has three or more clearly ordered dependencies, or when the total context required would exceed what ReAct handles cleanly in a single loop. For anything simpler, start with ReAct — planning adds latency and token cost that is only justified by the complexity it manages.

Reflection — The agent that checks its own work

ReAct and Planning both assume the agent’s output is either correct or obviously wrong. Reflection adds a third possibility: the output is plausible, but not good enough. The agent generates a response, then critiques it against a defined standard, then revises. This cycle can run once or multiple times before the final output is returned.

The intellectual foundation here is the Reflexion framework (Shinn et al., NeurIPS 2023). The key insight was that you do not need to retrain a model to improve its performance — you can use language itself as the improvement signal. The critic’s feedback becomes part of the context for the next attempt, acting as a form of verbal reinforcement learning.

[Figure: Reflection pattern diagram showing three stages (Generate, Critique and Revise) connected by arrows, with a loop-back when the quality threshold is not met]

Where Reflection pays off: high-stakes outputs where accuracy and completeness matter more than speed. Contract analysis, financial summaries, compliance-related outputs, medical document drafting. Even a single iteration of reflection can catch errors that a single-pass ReAct agent would return confidently; how many it catches depends on how well the critique standard is defined.

Where it does not: the cost is real. Each reflection cycle adds latency and token spend. For tasks where the output is time-sensitive, or where the quality bar is already met by a well-prompted single pass, Reflection adds overhead without proportional benefit. The worst version is a Reflection loop with no exit condition — the agent circles indefinitely because the critique is never fully satisfied.

⚠️ Warning

Always define an exit condition for Reflection loops — either a maximum number of iterations or a clear quality threshold the critic evaluates against. An open-ended loop is one of the most common causes of runaway token costs in production agents.
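A bounded Reflection loop might look like the sketch below. Both exit conditions from the warning are present: a quality threshold and a maximum iteration count. The `generate` and `critique` functions are hypothetical stand-ins for model calls, and the default threshold of 0.8 and limit of 3 are arbitrary illustration values, not recommendations.

```python
from typing import Callable, List, Tuple

def reflect(
    task: str,
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str], Tuple[float, str]],
    threshold: float = 0.8,
    max_iterations: int = 3,
) -> Tuple[str, int, bool]:
    """Return (final draft, critique passes used, whether the threshold was met)."""
    feedback: List[str] = []
    passed = False
    draft = ""
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        # Earlier critiques are passed back in as context for the next attempt.
        draft = generate(task, feedback)
        score, comment = critique(draft)
        if score >= threshold:
            passed = True
            break
        feedback.append(comment)
    return draft, iteration, passed

# Stand-in generator: adds the missing clause once it has received feedback.
def generate(task: str, feedback: List[str]) -> str:
    draft = "Summary of parties and payment terms."
    if feedback:
        draft += " Termination clause: 30 days notice."
    return draft

# Stand-in critic: scores the draft against one fixed completeness rule.
def critique(draft: str) -> Tuple[float, str]:
    if "Termination" in draft:
        return 1.0, "complete"
    return 0.4, "missing the termination clause"

draft, iterations, passed = reflect("summarise contract", generate, critique)
```

Returning the `passed` flag alongside the draft lets the caller tell a draft that met the bar apart from one that was returned only because the iteration limit was reached.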

Multi-Agent — When one agent is not enough

A single agent — however well-designed — operates with a single context window, a single set of instructions, and a single point of failure. Multi-agent systems distribute work across a team of specialised agents coordinated by an orchestrator. The orchestrator owns the goal; the worker agents own specific tasks.

The justification for this pattern is usually one of three things: specialisation (one agent is better at code generation, another at document retrieval, another at structured data analysis), parallelism (subtasks can run simultaneously rather than sequentially), or scope control (each agent operates with a tightly constrained prompt, reducing the risk of scope drift that affects large, general prompts).
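The parallelism argument can be illustrated with Python's standard `concurrent.futures` module. In this sketch the worker functions are hypothetical stubs that return strings; in a real system each would wrap its own model call with its own narrow prompt. Threads are used on the assumption that worker calls are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

def research_worker(task: str) -> str:
    """Stand-in for a retrieval-focused agent."""
    return f"[research] notes on {task}"

def analysis_worker(task: str) -> str:
    """Stand-in for a data-analysis agent."""
    return f"[analysis] figures for {task}"

WORKERS: Dict[str, Callable[[str], str]] = {
    "research": research_worker,
    "analysis": analysis_worker,
}

def run_parallel(assignments: Dict[str, str]) -> Dict[str, str]:
    """Run each worker -> task assignment concurrently; collect results by worker name."""
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = {
            name: pool.submit(WORKERS[name], task)
            for name, task in assignments.items()
        }
        # result() blocks until that worker finishes and re-raises its exceptions.
        return {name: future.result() for name, future in futures.items()}

results = run_parallel({"research": "supplier risk", "analysis": "supplier risk"})
```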

[Figure: Multi-agent architecture diagram showing an orchestrator agent routing tasks to three worker agents (Research, Analysis and Drafting), with parallel execution and results flowing back to the orchestrator]

The failure modes here are specific and worth knowing. Routing errors: the orchestrator sends the wrong task to the wrong agent, and nothing flags it as wrong until the final output lands. Context fragmentation: each agent sees only its slice of the problem; if a later agent needs something an earlier one discovered, that information must be explicitly passed — it does not transfer automatically. Cost explosion: every agent call is a separate LLM invocation, and complex orchestration in production can multiply costs faster than expected.
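Some of these failure modes can be guarded against in the orchestrator itself. The sketch below is illustrative and every name in it is hypothetical: it rejects unknown worker names, passes earlier findings to later workers explicitly, and enforces a call budget. Note that the routing check only catches a missing worker; a task sent to a valid but unsuitable worker still needs validation of the worker's output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Worker = Callable[[str, Dict[str, str]], str]

@dataclass
class Orchestrator:
    workers: Dict[str, Worker]
    max_calls: int = 10
    calls: int = 0
    shared: Dict[str, str] = field(default_factory=dict)

    def dispatch(self, worker: str, task: str) -> str:
        if worker not in self.workers:
            # Surface the routing error now rather than in the final output.
            raise KeyError(f"no worker named {worker!r}")
        if self.calls >= self.max_calls:
            raise RuntimeError("call budget exhausted")
        self.calls += 1
        # Explicit handoff: the worker receives a copy of earlier findings.
        result = self.workers[worker](task, dict(self.shared))
        self.shared[worker] = result
        return result

def research(task: str, shared: Dict[str, str]) -> str:
    return "finding: invoice totals do not match"

def drafting(task: str, shared: Dict[str, str]) -> str:
    return f"Report based on {shared.get('research', 'nothing')}"

orch = Orchestrator({"research": research, "drafting": drafting}, max_calls=2)
orch.dispatch("research", "check the invoices")
report = orch.dispatch("drafting", "write it up")
```

A third `dispatch` call on this orchestrator would raise `RuntimeError`, which is the intended behaviour: a hard budget turns a silent cost explosion into a visible failure.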

The most common mistake is reaching for multi-agent too early. A single ReAct agent with well-chosen tools handles most tasks that people assume need a multi-agent system. Multi-agent makes sense when the task genuinely cannot fit in one context window, when parallel execution provides a meaningful latency benefit, or when the specialisation of dedicated agents measurably improves quality.

Best Practice

Start with a single ReAct agent. Only move to multi-agent when you hit a concrete limitation: context window overflow, unacceptable latency on sequential tasks, or measurably better output from specialised agents. Building multi-agent complexity before you need it is one of the most expensive architectural mistakes in agentic AI.

How the patterns combine in practice

Production systems rarely use one pattern in isolation. The patterns compose naturally — each one adds a layer that addresses a specific failure mode the previous layer cannot handle alone.

Two examples show how this works in practice:

| System type | Pattern combination and what each layer does |
| --- | --- |
| Enterprise research assistant | Planning decomposes the research goal into subtasks. Multi-Agent runs a retrieval agent and an analysis agent in parallel. ReAct handles each agent’s tool calls. Reflection checks the final synthesised output before delivery. |
| Document processing pipeline | ReAct handles document retrieval and extraction. Reflection validates the extracted data against a quality standard and flags gaps. Multi-Agent is not needed: the task is sequential and fits in one context window. |

The second example is as important as the first. Not every system needs every pattern. Adding patterns that the task does not require adds cost, latency and debugging complexity. The right architecture is the simplest one that handles the failure modes your specific task actually produces.
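The selection rules scattered through this post can be collected into one small function. This is a simplification for illustration only: `TaskProfile` and its fields are hypothetical, the three-dependency threshold comes from the practical tip on Planning, and real decisions involve measurement rather than boolean flags.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TaskProfile:
    ordered_dependencies: int          # clearly ordered steps that depend on each other
    high_stakes: bool                  # accuracy matters more than speed
    fits_one_context: bool             # whole task fits in a single context window
    benefits_from_parallelism: bool = False

def choose_patterns(task: TaskProfile) -> List[str]:
    patterns = ["ReAct"]  # the default starting point
    if task.ordered_dependencies >= 3:
        patterns.append("Planning")
    if task.high_stakes:
        patterns.append("Reflection")
    if not task.fits_one_context or task.benefits_from_parallelism:
        patterns.append("Multi-Agent")
    return patterns

# Roughly the document processing pipeline: sequential, fits in one context.
pipeline = choose_patterns(TaskProfile(1, high_stakes=True, fits_one_context=True))

# Roughly the research assistant: many dependencies, parallel retrieval and analysis.
assistant = choose_patterns(
    TaskProfile(4, high_stakes=True, fits_one_context=False, benefits_from_parallelism=True)
)
```

The function only ever adds a pattern when a specific condition is met, which mirrors the argument above: start from ReAct and justify each extra layer.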

📝 Note

The AI Agents post on this site covers tool use and memory as complementary agent capabilities. Tool use is what makes ReAct work — without tools, the agent can only reason, not act. Memory is what allows information to persist across episodes, which Reflection and multi-agent systems often require.

At a glance — AI agent design patterns

| Pattern / Concept | One-line summary |
| --- | --- |
| ReAct | Interleaves Thought, Action and Observation in a loop; the default pattern for most agents |
| Planning | Separates goal decomposition from execution; use for complex multi-step tasks with dependencies |
| Reflection | Generate, critique and revise; adds a self-correction layer for high-stakes outputs |
| Multi-Agent | Orchestrator routes work to specialised workers; use for parallelism, specialisation or context overflow |
| Thought (ReAct) | The agent articulates its current understanding and decides what to do next |
| Action (ReAct) | The agent calls an external tool: search, API, database, calculation |
| Observation (ReAct) | The tool result is fed back into context to inform the next Thought step |
| Reflexion (2023) | Framework by Shinn et al. that formalised verbal reinforcement for agent self-improvement |
| Orchestrator | The coordinating agent in a multi-agent system; owns the goal and routes tasks to workers |
| Pattern composition | Production systems typically combine 2-3 patterns; each layer addresses a specific failure mode |

What to take away

These four patterns are not engineering choices reserved for developers building agents from scratch. They are reasoning choices — and understanding them changes how you evaluate every AI product that claims to be agentic. When a vendor says their agent is autonomous, the right question is not whether it uses an LLM. It is which reasoning pattern it applies, and whether that pattern matches the failure modes of your use case.

A ReAct agent without a reflection layer will deliver wrong answers confidently when it takes a wrong turn. A planning agent with no deviation handling will fail silently when the environment does not behave as the plan assumed. A multi-agent system with poorly designed routing will produce impressive-looking outputs that nobody can trace or audit. These are not edge cases. They are the normal failure modes of each pattern, and they are predictable once you know what to look for.

The teams making agentic AI work in 2026 are not using more sophisticated models than everyone else. They are being precise about which reasoning structure they apply and why. That precision is the skill — and it starts with these four patterns.

🔗 Related Posts

AI Agents — What They Are and How They Work — the foundation post: what agents are, what tools and memory they use, and why they are different from a prompted model.
Autonomous AI — The Spectrum from Assistant to Agent — maps where agents sit on the autonomy spectrum and what each level means in practice.
MCP — Model Context Protocol Explained — how agents connect to external tools and services, which is what makes ReAct and multi-agent patterns work at scale.
AI in the Enterprise — A Practical Map — where these patterns appear in real enterprise AI deployments, including SAP Joule and Microsoft Copilot.

Published on rakeshnarayan.com — Articles

URL: https://rakeshnarayan.com/articles/how-ai-agents-actually-think-react-planning-and-reflection/