Context Engineering — Beyond Prompts to Production

March 12, 2026 · Updated April 14, 2026 · 8 min read

You write a careful prompt. You test it. It works. Then you put it into a real system — one with conversation history, retrieved documents, tool outputs, a system prompt someone else wrote — and the quality falls apart. The prompt did not change. The context around it did.

That gap — between a prompt that works in isolation and a system that works in production — is exactly what context engineering addresses. It is not a replacement for prompt engineering. It is the layer above it.

Most AI teams discover this the hard way: prompt quality gets you to a working demo. Context quality gets you to a working product.

🔗 Foundation Posts

This post builds directly on Prompt Engineering — How to Get Reliable Output from Any LLM. If you have not read that post, start there — it covers system prompts, few-shot prompting and chain-of-thought, all of which are components of context engineering.
Also relevant: AI Agents — What They Are and How They Work, since agents are where the need for context engineering becomes impossible to ignore.

What context engineering actually is

Prompt engineering is the practice of crafting the instruction you give the model. Context engineering is the practice of designing everything the model sees — not just the instruction, but all the information surrounding it.

The context window is the model’s working memory. It is the complete input the model processes before producing any output: your system prompt, the conversation history, any documents you have retrieved, any tool results, and the user’s actual message. Context engineering is the discipline of deciding what goes into that window, in what order, and in what format.

Think of it this way. A prompt is a request. Context is the briefing that makes the request solvable. A good analyst given a jumbled stack of conflicting documents and a vague brief will still produce weak output. Give that same analyst a well-organised brief with the right background, clear constraints and relevant data — and the quality of the request matters much less.

📌 Key takeaway

Prompt engineering answers: what should I tell the model to do?
Context engineering answers: what does the model need to see to do it reliably?

Why prompt engineering alone stops working

For a single-turn task — summarise this document, classify this ticket, rewrite this paragraph — prompt engineering is usually enough. You control the input. The model sees only what you give it.

Production systems are not single-turn. They are multi-step, stateful, and often multi-agent. At that point, three failure modes appear that no amount of prompt refinement can fix.

Failure mode	What happens	Why the prompt is not the fix
Context drift	The model forgets earlier constraints as the conversation grows. Tone shifts, format breaks, earlier instructions get ignored.	The system prompt is still there — but it is now competing with thousands of tokens of conversation history. Earlier tokens receive less attention.
Context pollution	Irrelevant retrieved documents or verbose tool outputs crowd out the information the model actually needs. Quality drops even though the prompt is unchanged.	The model does not choose what to attend to — you do. Too much noise in the window degrades the signal.
Lost-in-the-middle	Information buried in the middle of a long context gets processed less reliably than content near the start or end.	This is a known limitation of attention mechanisms. Position in the context window affects how much weight the model gives a piece of information.

These are not prompt problems. They are context architecture problems. The solution is not a better instruction — it is a more carefully designed information environment.

The five layers of context

A well-engineered context window is not a single block of text — it is a structured set of layers, each with a distinct purpose. Understanding the layers is what makes context engineering a design discipline rather than a guessing game.

Layer	What it contains	Context engineering decisions
System prompt	Persona, constraints, output format, rules. The persistent instruction layer.	Write it once, get it right. It competes with everything else in the window — keep it focused and avoid redundancy with other layers.
Retrieved documents (RAG)	Relevant documents, data or knowledge fetched from a vector database based on the user’s query.	Choose the right chunk size. Rank by relevance, not recency. Prune aggressively — 3 highly relevant documents beats 10 loosely related ones.
Conversation history	Prior turns in the current session.	Do not keep everything. Summarise or truncate older turns. The most recent 3–5 turns are almost always more valuable than a full history.
Tool outputs	Results returned by tools the agent called — API responses, code execution results, search results.	Format tool outputs tightly. Raw JSON from an API call is information-dense but attention-costly. Summarise or restructure before injecting.
Structured state	Task progress, user preferences, active constraints, memory from previous sessions.	Be explicit about what state is still active. Stale state is worse than no state — it misleads the model about where the task currently stands.

💡 Practical tip

Order matters as much as content. Place the system prompt first, your retrieved documents near the top, and the user’s actual request last.
This puts the instruction and the most relevant evidence at opposite ends of the window — where attention is strongest.
Bury critical information in the middle at your own risk.

Context engineering in practice — what you actually do

The concepts are clear. The question is what context engineering looks like as an actual practice — what decisions you make, and what habits you build.

Prune before you inject

The instinct is to give the model more. More context, more history, more retrieved documents — just in case it needs them. This instinct is wrong. Every token you inject that does not directly help the current task is a token that competes with the tokens that do.

Before adding anything to the context window, ask: does the model need this to complete the current task? If the answer is not a clear yes, leave it out. Ruthless pruning is the most impactful context engineering habit.

Summarise history, do not append it

Long conversations accumulate fast. By turn 15 of an agentic workflow, the conversation history alone can dwarf the system prompt. The standard approach — keep appending every turn — is the default, not the best practice.

The better approach: rolling summarisation. After every 5–7 turns, compress earlier history into a compact summary and inject that instead. You lose granularity; you gain coherence. For most production use cases, the model does not need to reread turn 3 — it needs to know what was decided in turns 1 through 7.

Set explicit context boundaries for agents

Multi-agent systems introduce a new failure mode: agents that share too much context with each other, or that inherit context that is no longer relevant to their current sub-task.

Each agent in a pipeline should receive only the context it needs for its specific task — not the full accumulated context of the parent agent. This is context segmentation, and it is one of the clearest design decisions in a well-engineered agentic system.

⚠️ Warning

The lost-in-the-middle problem is real and reproducible.
Research published in 2023 and replicated across multiple models — including Claude, GPT-4 and Gemini variants — consistently shows that models fail on simple retrieval tasks when the answer is buried in the middle of a long context.
This is not a model bug. It is a structural property of attention. If your most important information is not near the start or end of the context window, you are relying on luck, not engineering.

Where this shows up in SAP and enterprise AI

Context engineering is not an abstract concept for AI researchers. It is a practical discipline that determines whether enterprise AI systems are useful or frustrating. SAP customers are already hitting the exact failure modes described above.

Scenario	Context engineering decisions that matter
SAP Joule answering process questions	Joule retrieves relevant SAP Help content via RAG. The quality of those retrieved chunks — chunk size, relevance ranking, how much is injected — determines whether Joule gives a precise answer or a vague one. This is a context engineering problem, not a model problem.
AI agents on SAP BTP	A BTP agent orchestrating a multi-step procurement workflow accumulates tool outputs from multiple OData calls. If those outputs are injected raw, the context window fills fast. Summarising and structuring tool results before injection is the difference between an agent that completes the task and one that loses track by step 4.
RAG pipelines on HANA Cloud	SAP HANA Cloud stores embeddings alongside structured business data. Retrieving the right documents is only half the job. Deciding how many to retrieve, how to rank them, and how to format them for injection into the prompt is the other half — and it is entirely a context engineering decision.
Custom AI assistants on BTP with MCP	MCP-connected agents receive tool outputs from external systems in real time. Each tool call adds to the context. Without explicit pruning and formatting decisions, a long MCP session becomes an example of context pollution in action.

✅ Best practice

For SAP BTP AI scenarios, treat context design as part of your solution architecture — not as a prompt-writing afterthought.
Define what each agent or assistant needs to see, in what format, and what it should never see.
Document this the same way you would document an API contract. Context that is well-defined upfront is far easier to debug and maintain than context that accumulated organically.

At a glance — context engineering essentials

Concept	One-line summary
Context engineering	The discipline of designing what the model sees — not just the instruction, but the full information environment around it
Context window	The model’s working memory — everything it processes at once: system prompt, history, retrieved docs, tool outputs and user message combined
Context pollution	Too much irrelevant content in the context window — it drowns the signal the model needs and degrades output quality
Lost-in-the-middle	A known attention limitation — models process information near the start and end of the context more reliably than content in the middle
Context drift	When a model loses track of earlier constraints as conversation history grows and competes with the system prompt for attention
Pruning	Deliberately removing low-value context before injection — the single most impactful context engineering habit
Rolling summarisation	Compressing older conversation turns into a compact summary rather than keeping the full history — preserves coherence as sessions grow
Context segmentation	Giving each agent in a multi-agent pipeline only the context it needs — not the full accumulated context of the orchestrator
RAG chunk design	Choosing the right chunk size, ranking strategy and injection volume for retrieved documents — a core context engineering decision
Information ordering	Placing critical information at the start or end of the context window — where attention is strongest — not in the middle

What to take away

Context engineering did not replace prompt engineering. It revealed what prompt engineering was always missing. A well-crafted prompt in a poorly designed context will still fail. A mediocre prompt in a well-engineered context will often succeed. That asymmetry is the most important thing to understand about building reliable AI systems in 2026.

The practical shift is not dramatic. You are not learning a new technology — you are learning to think about the full information environment instead of just the instruction. What goes in the window? In what order? How much of it? What gets pruned? Those are design decisions, and they deserve the same rigour you apply to schema design or API contracts.

If you are building anything more complex than a single-turn assistant — an agent, a RAG pipeline, a multi-step workflow — context engineering is not optional. It is the discipline that determines whether your system works once or works reliably at scale.

🔗 Related Posts

Prompt Engineering — How to Get Reliable Output from Any LLM The foundation this post builds on: system prompts, few-shot examples, chain-of-thought and output specification.
Fine-Tuning vs Prompt Engineering vs RAG — Which to Use Where context engineering fits in the broader landscape of AI customisation approaches.
RAG — Retrieval Augmented Generation Explained RAG is one of the most important context engineering inputs; this post covers how it works end to end.
AI Agents — What They Are and How They Work Agents are where context engineering becomes non-negotiable; this post explains the agent architecture that makes it so.
Temperature and Top-p — Controlling LLM Output Context engineering controls what the model sees. Temperature and top-p control how it generates from what it sees. Together they cover most of what determines output quality.

Published on rakeshnarayan.com — Articles

URL: https://rakeshnarayan.com/articles/context-engineering-beyond-prompts-to-production/

Context EngineeringContext Engineering vs Prompt EngineeringWhat is Context EngineeringAI AgentsLLM Context WindowRAGPrompt EngineeringEnterprise AIContext Window ManagementInformation Architecture AISAP JouleBTP AIAI Production SystemsContext Design 2026

Context Engineering — Beyond Prompts to Production

What context engineering actually is

Why prompt engineering alone stops working

The five layers of context

Context engineering in practice — what you actually do

Prune before you inject

Summarise history, do not append it

Set explicit context boundaries for agents

Where this shows up in SAP and enterprise AI

At a glance — context engineering essentials

What to take away

0 Comments

Leave a Comment

What context engineering actually is

Why prompt engineering alone stops working

The five layers of context

Context engineering in practice — what you actually do

Prune before you inject

Summarise history, do not append it

Set explicit context boundaries for agents

Where this shows up in SAP and enterprise AI

At a glance — context engineering essentials

What to take away

0 Comments

Leave a Comment

Related Articles

Open Source vs Closed Source AI Models — The Real Trade-offs

How LLMs Are Trained — Pretraining, Fine-Tuning and RLHF