Fine-Tuning vs Prompt Engineering vs RAG — Which to Use
Every team that gets serious about AI eventually faces the same question: how do we make this model work for our specific context? How do we get it to know our products, use our terminology, follow our policies and answer questions about our domain?
There are three answers — prompt engineering, RAG and fine-tuning. They are not competing options. They solve different problems. And the most common mistake is reaching for the expensive one when the cheap one would have worked.
🔗 Foundation posts
This post builds on What is a Large Language Model? , AI Hallucinations — Why They Happen and RAG — Retrieval Augmented Generation . Read those first if you are new to this area — this post assumes you know what an LLM is and what RAG does.
The problem each one solves
Before choosing an approach, be precise about what problem you actually have. They are different problems with different solutions.
| Problem | Right approach | Why |
|---|---|---|
| The model does not follow my instructions or tone | Prompt Engineering | Instructions live in the system prompt — no training needed |
| The model does not know my company’s documents, products or current data | RAG | Retrieve relevant documents at query time — no training needed |
| The model’s general behaviour or style needs to change across all responses | Fine-Tuning | Bake the behaviour into the model’s weights through training |
| The model does not know domain-specific terminology well | Fine-Tuning or RAG | Fine-tuning if it needs to reason in the domain. RAG if it needs to look up domain facts. |
| The model makes up facts about our specific systems | RAG | Ground it in your actual documentation — hallucination is a retrieval problem, not a training problem |
Prompt Engineering — start here, always
Prompt engineering is the practice of designing the instructions, context and examples you give the model to shape its behaviour. It costs nothing to retrain. It works immediately. It is where every AI project should start.
What you can control through prompting
- Persona and tone — ‘You are a helpful SAP integration consultant. Respond in plain English without jargon.’
- Output format — ‘Always respond in bullet points. End with a summary in one sentence.’
- Constraints — ‘Only answer questions about SAP BTP. If the question is outside this scope, say so.’
- Few-shot examples — showing the model examples of the exact input-output pattern you want
- Chain-of-thought — asking the model to reason step by step before giving an answer
| Prompt element | What it does | Example |
|---|---|---|
| System prompt | Sets the model’s persona, rules and context for the entire conversation | ’You are a senior SAP consultant. Be direct and practical. Do not use marketing language.‘ |
| Few-shot examples | Shows the model exactly the format and style you want | Include 2-3 examples of ideal question-answer pairs before the actual query |
| Context injection | Provides relevant information for this specific query | Paste the relevant document section before asking the question |
| Output constraints | Specifies format, length or structure | ’Respond in exactly three bullet points, each under 20 words.’ |
💡 Most problems can be solved with prompting alone
Before spending time and money on RAG or fine-tuning, write a better system prompt. At least 60% of AI customisation needs in enterprise projects can be addressed through improved prompting — better instructions, clearer constraints, good examples. It is the fastest iteration cycle and has no compute cost.
RAG — when the model needs to know things it cannot know
Prompt engineering works for behaviour and style. It does not work for knowledge that the model was never trained on — your internal documents, your product catalogue, your current policies, anything that changed after the model’s training cutoff.
RAG — Retrieval Augmented Generation — solves this by retrieving the relevant documents at query time and including them in the model’s context. The model reads your documents and answers from them, rather than from its training data.
| Use RAG when… | Do not use RAG when… |
|---|---|
| The model needs to answer questions about your specific documents | The model just needs to follow different instructions |
| The knowledge changes frequently — policies, product specs, current data | The knowledge is stable and could be baked into the model |
| Answers must cite sources for verification | Traceability is not a requirement |
| You need to keep your data private — RAG does not expose data to model training | You need to fundamentally change how the model reasons or communicates |
Fine-Tuning — when behaviour needs to change at the model level
Fine-tuning means additional training of the model on your specific data. The model’s weights — the numerical values that encode everything it knows and how it behaves — are updated based on your training examples.
This is expensive, time-consuming and requires careful data preparation. It is also the only option when you need to fundamentally change the model’s behaviour across all interactions — not just this query, not just with the right prompt, but always.
When fine-tuning genuinely makes sense
- You need the model to consistently produce output in a very specific format that prompting alone cannot reliably enforce
- You are adapting a smaller model (7B-13B parameters) for a specific narrow task where a large general model is too expensive at scale
- You need the model to reason with deep domain knowledge — not just look it up, but think with it
- Your use case involves tens of millions of inferences where a smaller fine-tuned model has significant cost impact
When fine-tuning is the wrong choice
- Your knowledge changes frequently — fine-tuning bakes a snapshot in; RAG keeps it current
- You are trying to reduce hallucination — fine-tuning does not solve hallucination the way RAG does
- You do not have high-quality training data — fine-tuning on bad data produces a model that reliably does the wrong thing
- You want to try something quickly — fine-tuning is not fast
📌 The cost reality
Fine-tuning a large model (70B+ parameters) requires significant GPU compute — typically thousands of dollars for a training run, plus storage and serving costs. For most enterprise use cases in 2026, prompt engineering plus RAG is more cost-effective and produces better results than fine-tuning a large general model.
Combining approaches — what production systems look like
The best AI systems in 2026 combine all three. Each solves a different layer of the problem.
| Layer | Approach | What it handles |
|---|---|---|
| Behaviour and persona | Prompt Engineering | System prompt defines tone, constraints, output format, persona |
| Current knowledge | RAG | Retrieves relevant documents, policies and data at query time |
| Domain specialisation (optional) | Fine-Tuning | Adjusts a smaller model for the specific task at significant cost |
A practical example: an internal HR assistant that answers employee questions about company benefits. The system prompt sets the persona and confidentiality rules (prompt engineering). The benefits documents are indexed in a vector store and retrieved when relevant (RAG). The base model is a frontier LLM accessed via API — no fine-tuning needed because the prompting and retrieval are doing the work.
The decision framework
| Question | If yes, consider |
|---|---|
| Does the model need different instructions or style? | Prompt engineering first — always start here |
| Does the model need to answer from your specific documents or data? | RAG — add a knowledge base and retrieval layer |
| Is the problem hallucination of specific facts? | RAG — not fine-tuning |
| Does the model need to consistently produce a very specific format or reasoning style? | Fine-tuning a smaller model, after ruling out prompting |
| Is the knowledge static and does behaviour need to change at model level? | Fine-tuning — the only scenario where it clearly wins |
| Is your use case at enormous scale with specific narrow output? | Fine-tuning a smaller model for cost reduction |
At a glance
| Approach | Solves | Cost | Speed to implement | Updates |
|---|---|---|---|---|
| Prompt Engineering | Behaviour, tone, format, instructions | Minimal | Immediate | Instant — change the prompt |
| RAG | Knowledge, facts, current data, citations | Low to medium | Days to weeks | Easy — update the knowledge base |
| Fine-Tuning | Deep behaviour change, domain specialisation, small model efficiency | High | Weeks to months | Expensive — requires retraining |
What to take away
Most organisations reach for fine-tuning too quickly. It feels like the most thorough solution — you are actually changing the model, not just guiding it. But in most cases, the simpler approaches do the job better, faster and cheaper.
Start with prompt engineering. If the model needs your specific knowledge, add RAG. Only reach for fine-tuning when you have a specific need that the first two genuinely cannot meet — and when you have the data quality and compute budget to do it properly.
🔗 Related posts on this site
RAG — Retrieval Augmented Generation — the full RAG explanation: how retrieval works, vector databases, indexing and querying. AI Hallucinations — Why They Happen — why RAG reduces hallucination and what the unmitigated rates look like. What is a Large Language Model (LLM)? — the base model that all three customisation approaches work with. Vector Databases Explained — the storage layer that makes RAG possible at scale.
Published on rakeshnarayan.com — Articles
URL: https://rakeshnarayan.com/articles/fine-tuning-vs-prompt-engineering-vs-rag-which-to-use/


