Context Engineering — Beyond Prompts to Production
You write a careful prompt. You test it. It works. Then you put it into a real system — one with conversation history, retrieved documents, tool outputs, a system prompt someone else wrote — and the quality falls apart. The prompt did not change. The context around it did.
That gap — between a prompt that works in isolation and a system that works in production — is exactly what context engineering addresses. It is not a replacement for prompt engineering. It is the layer above it.
Most AI teams discover this the hard way: prompt quality gets you to a working demo. Context quality gets you to a working product.
🔗 Foundation Posts
This post builds directly on Prompt Engineering — How to Get Reliable Output from Any LLM. If you have not read that post, start there — it covers system prompts, few-shot prompting and chain-of-thought, all of which are components of context engineering.
Also relevant: AI Agents — What They Are and How They Work, since agents are where the need for context engineering becomes impossible to ignore.
What context engineering actually is
Prompt engineering is the practice of crafting the instruction you give the model. Context engineering is the practice of designing everything the model sees — not just the instruction, but all the information surrounding it.
The context window is the model’s working memory. It is the complete input the model processes before producing any output: your system prompt, the conversation history, any documents you have retrieved, any tool results, and the user’s actual message. Context engineering is the discipline of deciding what goes into that window, in what order, and in what format.
Think of it this way. A prompt is a request. Context is the briefing that makes the request solvable. A good analyst given a jumbled stack of conflicting documents and a vague brief will still produce weak output. Give that same analyst a well-organised brief with the right background, clear constraints and relevant data — and the quality of the request matters much less.
📌 Key takeaway
Prompt engineering answers: what should I tell the model to do?
Context engineering answers: what does the model need to see to do it reliably?
Why prompt engineering alone stops working
For a single-turn task — summarise this document, classify this ticket, rewrite this paragraph — prompt engineering is usually enough. You control the input. The model sees only what you give it.
Production systems are not single-turn. They are multi-step, stateful, and often multi-agent. At that point, three failure modes appear that no amount of prompt refinement can fix.
| Failure mode | What happens | Why the prompt is not the fix |
|---|---|---|
| Context drift | The model forgets earlier constraints as the conversation grows. Tone shifts, format breaks, earlier instructions get ignored. | The system prompt is still there — but it is now competing with thousands of tokens of conversation history. Earlier tokens receive less attention. |
| Context pollution | Irrelevant retrieved documents or verbose tool outputs crowd out the information the model actually needs. Quality drops even though the prompt is unchanged. | The model does not choose what to attend to — you do. Too much noise in the window degrades the signal. |
| Lost-in-the-middle | Information buried in the middle of a long context gets processed less reliably than content near the start or end. | This is a known limitation of attention mechanisms. Position in the context window affects how much weight the model gives a piece of information. |
These are not prompt problems. They are context architecture problems. The solution is not a better instruction — it is a more carefully designed information environment.
The five layers of context
A well-engineered context window is not a single block of text — it is a structured set of layers, each with a distinct purpose. Understanding the layers is what makes context engineering a design discipline rather than a guessing game.
| Layer | What it contains | Context engineering decisions |
|---|---|---|
| System prompt | Persona, constraints, output format, rules. The persistent instruction layer. | Write it once, get it right. It competes with everything else in the window — keep it focused and avoid redundancy with other layers. |
| Retrieved documents (RAG) | Relevant documents, data or knowledge fetched from a vector database based on the user’s query. | Choose the right chunk size. Rank by relevance, not recency. Prune aggressively: 3 highly relevant documents beat 10 loosely related ones. |
| Conversation history | Prior turns in the current session. | Do not keep everything. Summarise or truncate older turns. The most recent 3–5 turns are almost always more valuable than a full history. |
| Tool outputs | Results returned by tools the agent called — API responses, code execution results, search results. | Format tool outputs tightly. Raw JSON from an API call is information-dense but attention-costly. Summarise or restructure before injecting. |
| Structured state | Task progress, user preferences, active constraints, memory from previous sessions. | Be explicit about what state is still active. Stale state is worse than no state — it misleads the model about where the task currently stands. |
💡 Practical tip
Order matters as much as content. Place the system prompt first, your retrieved documents near the top, and the user’s actual request last.
This keeps the standing instructions and the strongest evidence at the start of the window and the request itself at the end, the two positions where attention is strongest.
Bury critical information in the middle at your own risk.
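The ordering advice above can be sketched as a small assembly function. This is a minimal illustration, not a prescribed implementation: the `ContextLayers` container, the section labels and the placement of state, history and tool outputs in the middle are all assumptions made for the example. The source only fixes the system prompt first, retrieved documents near the top, and the user's request last.

```python
from dataclasses import dataclass, field


@dataclass
class ContextLayers:
    """Hypothetical container for the five layers plus the user message."""
    system_prompt: str
    retrieved_docs: list[str] = field(default_factory=list)
    state: str = ""
    history: list[str] = field(default_factory=list)
    tool_outputs: list[str] = field(default_factory=list)
    user_message: str = ""


def assemble_context(layers: ContextLayers) -> str:
    """Join the layers so instructions and evidence lead and the request closes."""
    sections = [
        ("SYSTEM", layers.system_prompt),
        ("DOCUMENTS", "\n".join(layers.retrieved_docs)),
        ("STATE", layers.state),                       # middle order is an assumption
        ("HISTORY", "\n".join(layers.history)),
        ("TOOL OUTPUTS", "\n".join(layers.tool_outputs)),
        ("USER REQUEST", layers.user_message),
    ]
    # Skip empty layers so they do not add noise to the window.
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections if body)
```

Keeping the assembly in one function makes the ordering an explicit, reviewable decision rather than a side effect of whichever component appended its text last.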
Context engineering in practice — what you actually do
The concepts are clear. The question is what context engineering looks like as an actual practice — what decisions you make, and what habits you build.
Prune before you inject
The instinct is to give the model more. More context, more history, more retrieved documents — just in case it needs them. This instinct is wrong. Every token you inject that does not directly help the current task is a token that competes with the tokens that do.
Before adding anything to the context window, ask: does the model need this to complete the current task? If the answer is not a clear yes, leave it out. Ruthless pruning is the most impactful context engineering habit.
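As a rough sketch of that habit, the function below keeps only the highest-scoring chunks that clear a relevance threshold and fit a token budget. Everything here is illustrative: the `Chunk` type, the threshold of 0.75, the limit of three chunks and the four-characters-per-token estimate are assumptions, and the relevance score is presumed to come from whatever retriever you already use.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    relevance: float  # assumed to come from your retriever, in [0, 1]


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); swap in a real tokenizer."""
    return max(1, len(text) // 4)


def prune_chunks(chunks, min_relevance=0.75, max_chunks=3, token_budget=1000):
    """Keep the most relevant chunks that clear the threshold and fit the budget."""
    kept, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        if chunk.relevance < min_relevance or len(kept) == max_chunks:
            break  # everything after this point is less relevant still
        cost = estimate_tokens(chunk.text)
        if used + cost > token_budget:
            continue  # too large for the remaining budget; a smaller chunk may fit
        kept.append(chunk)
        used += cost
    return kept
```

The defaults are deliberately strict. Loosening them should be a decision you can justify with evaluation results, not the starting point.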
Summarise history, do not append it
Long conversations accumulate fast. By turn 15 of an agentic workflow, the conversation history alone can dwarf the system prompt. Appending every turn is the default behaviour, not a best practice.
The better approach: rolling summarisation. After every 5–7 turns, compress earlier history into a compact summary and inject that instead. You lose granularity; you gain coherence. For most production use cases, the model does not need to reread turn 3 — it needs to know what was decided in turns 1 through 7.
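A minimal sketch of rolling summarisation follows. The `RollingHistory` class and its parameters are hypothetical, and `naive_summarise` is only a placeholder: in a real system the summariser would most likely be a separate LLM call. The trigger used here (compress once seven uncompressed turns have accumulated, keep the last three verbatim) is one reasonable reading of the guidance above, not the only one.

```python
from typing import Callable


def naive_summarise(turns: list[str]) -> str:
    """Placeholder summariser; in practice this would be an LLM call."""
    return "Summary of earlier turns: " + " | ".join(t[:40] for t in turns)


class RollingHistory:
    def __init__(self, summarise: Callable[[list[str]], str] = naive_summarise,
                 keep_recent: int = 3, compress_after: int = 7):
        self.summarise = summarise
        self.keep_recent = keep_recent
        self.compress_after = compress_after
        self.summary = ""
        self.turns: list[str] = []

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        if len(self.turns) >= self.compress_after:
            cut = len(self.turns) - self.keep_recent
            old, self.turns = self.turns[:cut], self.turns[cut:]
            # Fold the previous summary in so earlier decisions survive.
            prior = [self.summary] if self.summary else []
            self.summary = self.summarise(prior + old)

    def render(self) -> str:
        """History as it would be injected: summary first, recent turns verbatim."""
        parts = ([self.summary] if self.summary else []) + self.turns
        return "\n".join(parts)
```

Because each new summary is built from the previous summary plus the turns being retired, decisions made early in the session are carried forward instead of silently dropped.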
Set explicit context boundaries for agents
Multi-agent systems introduce a new failure mode: agents that share too much context with each other, or that inherit context that is no longer relevant to their current sub-task.
Each agent in a pipeline should receive only the context it needs for its specific task — not the full accumulated context of the parent agent. This is context segmentation, and it is one of the clearest design decisions in a well-engineered agentic system.
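One way to make segmentation concrete is an explicit allow-list per agent. The sketch below is an assumption-laden illustration: the agent names, context keys and sample values are invented for the example, and a real pipeline would hold richer objects than plain strings.

```python
# Hypothetical scopes: which context keys each sub-agent is allowed to see.
AGENT_SCOPES: dict[str, set[str]] = {
    "supplier_lookup": {"task", "supplier_id"},
    "approval_drafter": {"task", "order_summary", "approval_policy"},
}


def context_for(agent: str, full_context: dict[str, str],
                scopes: dict[str, set[str]] = AGENT_SCOPES) -> dict[str, str]:
    """Return only the context entries the named agent is scoped to see."""
    if agent not in scopes:
        # Fail closed: an unregistered agent gets an error, not everything.
        raise KeyError(f"no context scope defined for agent {agent!r}")
    allowed = scopes[agent]
    return {key: value for key, value in full_context.items() if key in allowed}


# Illustrative parent context accumulated by an orchestrator.
parent_context = {
    "task": "Create a purchase order for 50 laptops",
    "supplier_id": "S-100",
    "order_summary": "50 laptops, net 45,000 EUR",
    "approval_policy": "Orders above 10,000 EUR need manager approval",
    "raw_tool_log": "...thousands of tokens of earlier tool output...",
}
```

With these scopes, `context_for("supplier_lookup", parent_context)` returns only the task and the supplier ID. The raw tool log never reaches either sub-agent because no scope lists it.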
⚠️ Warning
The lost-in-the-middle problem is real and reproducible.
Research published in 2023 and replicated across multiple models, including Claude, GPT-4 and Gemini variants, consistently shows that accuracy on simple retrieval tasks drops when the answer is buried in the middle of a long context.
This is not a model bug. It is a structural property of attention. If your most important information is not near the start or end of the context window, you are relying on luck, not engineering.
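If you want to check how your own model and context length behave, a simple probe is to place one known fact at different depths in filler text and ask for it back. The helpers below only build the prompts; they do not call any model, and the function names, the depth values and the prompt layout are assumptions made for this sketch.

```python
def build_probe(needle: str, fillers: list[str], depth: float, question: str) -> str:
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) among fillers."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = int(depth * len(fillers))
    passages = fillers[:index] + [needle] + fillers[index:]
    return "\n".join(passages) + f"\n\nQuestion: {question}"


def probe_suite(needle: str, fillers: list[str], question: str,
                depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict[float, str]:
    """One prompt per depth; send each to your model and compare the answers."""
    return {d: build_probe(needle, fillers, d, question) for d in depths}
```

Run each prompt several times against your actual model and record how often the answer comes back correctly at each depth. If accuracy dips at the middle depths, that tells you where not to place critical information.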
Where this shows up in SAP and enterprise AI
Context engineering is not an abstract concept for AI researchers. It is a practical discipline that determines whether enterprise AI systems are useful or frustrating. SAP customers are already hitting the exact failure modes described above.
| Scenario | Context engineering decisions that matter |
|---|---|
| SAP Joule answering process questions | Joule retrieves relevant SAP Help content via RAG. The quality of those retrieved chunks — chunk size, relevance ranking, how much is injected — determines whether Joule gives a precise answer or a vague one. This is a context engineering problem, not a model problem. |
| AI agents on SAP BTP | A BTP agent orchestrating a multi-step procurement workflow accumulates tool outputs from multiple OData calls. If those outputs are injected raw, the context window fills fast. Summarising and structuring tool results before injection is the difference between an agent that completes the task and one that loses track by step 4. |
| RAG pipelines on HANA Cloud | SAP HANA Cloud stores embeddings alongside structured business data. Retrieving the right documents is only half the job. Deciding how many to retrieve, how to rank them, and how to format them for injection into the prompt is the other half — and it is entirely a context engineering decision. |
| Custom AI assistants on BTP with MCP | MCP-connected agents receive tool outputs from external systems in real time. Each tool call adds to the context. Without explicit pruning and formatting decisions, a long MCP session becomes an example of context pollution in action. |
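The tool-output rows in the table above come down to the same move: restructure a response before it enters the window. A minimal sketch, assuming an OData v4-style JSON payload with the result rows under a `value` key; the function name, the `key=value` line format and the row limit are illustrative choices, and the field names you pass in are whatever your service actually exposes.

```python
import json


def compact_records(raw_json: str, fields: list[str], max_rows: int = 5) -> str:
    """Reduce a JSON collection response to the named fields, one line per row."""
    rows = json.loads(raw_json).get("value", [])  # assumes rows live under "value"
    lines = [
        "; ".join(f"{name}={row.get(name)}" for name in fields)
        for row in rows[:max_rows]
    ]
    omitted = len(rows) - max_rows
    if omitted > 0:
        # Tell the model that data was cut so it does not assume completeness.
        lines.append(f"(+{omitted} more rows omitted)")
    return "\n".join(lines)
```

Metadata, annotations and unused fields are dropped, and the explicit "rows omitted" marker keeps the model from treating a truncated list as the full result.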
✅ Best practice
For SAP BTP AI scenarios, treat context design as part of your solution architecture — not as a prompt-writing afterthought.
Define what each agent or assistant needs to see, in what format, and what it should never see.
Document this the same way you would document an API contract. Context that is well-defined upfront is far easier to debug and maintain than context that accumulated organically.
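A context contract can be as small as a declaration of what an agent must see, what it must never see, and how large its context may grow, plus a check that runs before each model call. The sketch below is one hypothetical shape for that: the `ContextContract` fields, the agent and key names, and the four-characters-per-token estimate are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextContract:
    """Hypothetical contract describing what one agent's context may contain."""
    agent: str
    required: frozenset[str]
    forbidden: frozenset[str]
    max_tokens: int


def violations(contract: ContextContract, context: dict[str, str]) -> list[str]:
    """Return human-readable contract violations; an empty list means compliant."""
    problems = []
    present = set(context)
    for key in sorted(contract.required - present):
        problems.append(f"missing required context: {key}")
    for key in sorted(contract.forbidden & present):
        problems.append(f"forbidden context present: {key}")
    # Crude size estimate (about 4 characters per token); use a real tokenizer.
    tokens = sum(len(text) for text in context.values()) // 4
    if tokens > contract.max_tokens:
        problems.append(f"context is ~{tokens} tokens, limit is {contract.max_tokens}")
    return problems
```

Checking the contract at the point of injection turns a context problem into a visible error or log line instead of a quietly degraded answer.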
At a glance — context engineering essentials
| Concept | One-line summary |
|---|---|
| Context engineering | The discipline of designing what the model sees — not just the instruction, but the full information environment around it |
| Context window | The model’s working memory — everything it processes at once: system prompt, history, retrieved docs, tool outputs and user message combined |
| Context pollution | Too much irrelevant content in the context window — it drowns the signal the model needs and degrades output quality |
| Lost-in-the-middle | A known attention limitation — models process information near the start and end of the context more reliably than content in the middle |
| Context drift | When a model loses track of earlier constraints as conversation history grows and competes with the system prompt for attention |
| Pruning | Deliberately removing low-value context before injection — the single most impactful context engineering habit |
| Rolling summarisation | Compressing older conversation turns into a compact summary rather than keeping the full history — preserves coherence as sessions grow |
| Context segmentation | Giving each agent in a multi-agent pipeline only the context it needs — not the full accumulated context of the orchestrator |
| RAG chunk design | Choosing the right chunk size, ranking strategy and injection volume for retrieved documents — a core context engineering decision |
| Information ordering | Placing critical information at the start or end of the context window — where attention is strongest — not in the middle |
What to take away
Context engineering did not replace prompt engineering. It revealed what prompt engineering was always missing. A well-crafted prompt in a poorly designed context will still fail. A mediocre prompt in a well-engineered context will often succeed. That asymmetry is the most important thing to understand about building reliable AI systems in 2026.
The practical shift is not dramatic. You are not learning a new technology — you are learning to think about the full information environment instead of just the instruction. What goes in the window? In what order? How much of it? What gets pruned? Those are design decisions, and they deserve the same rigour you apply to schema design or API contracts.
If you are building anything more complex than a single-turn assistant — an agent, a RAG pipeline, a multi-step workflow — context engineering is not optional. It is the discipline that determines whether your system works once or works reliably at scale.
🔗 Related Posts
Prompt Engineering — How to Get Reliable Output from Any LLM: The foundation this post builds on: system prompts, few-shot examples, chain-of-thought and output specification.
Fine-Tuning vs Prompt Engineering vs RAG — Which to Use: Where context engineering fits in the broader landscape of AI customisation approaches.
RAG — Retrieval Augmented Generation Explained: RAG is one of the most important context engineering inputs; this post covers how it works end to end.
AI Agents — What They Are and How They Work: Agents are where context engineering becomes non-negotiable; this post explains the agent architecture that makes it so.
Temperature and Top-p — Controlling LLM Output: Context engineering controls what the model sees. Temperature and top-p control how it generates from what it sees. Together they cover most of what determines output quality.
Published on rakeshnarayan.com — Articles
URL: https://rakeshnarayan.com/articles/context-engineering-beyond-prompts-to-production/


