Artificial Intelligence

Fine-Tuning vs Prompt Engineering vs RAG — Which to Use

Every team that gets serious about AI eventually faces the same question: how do we make this model work for our specific context? How do we get it to know our products, use our terminology, follow our policies and answer questions about our domain?

There are three answers — prompt engineering, RAG and fine-tuning. They are not competing options. They solve different problems. And the most common mistake is reaching for the expensive one when the cheap one would have worked.

🔗 Foundation posts

This post builds on What is a Large Language Model? , AI Hallucinations — Why They Happen and RAG — Retrieval Augmented Generation . Read those first if you are new to this area — this post assumes you know what an LLM is and what RAG does.

The problem each one solves

Before choosing an approach, be precise about what problem you actually have. They are different problems with different solutions.

ProblemRight approachWhy
The model does not follow my instructions or tonePrompt EngineeringInstructions live in the system prompt — no training needed
The model does not know my company’s documents, products or current dataRAGRetrieve relevant documents at query time — no training needed
The model’s general behaviour or style needs to change across all responsesFine-TuningBake the behaviour into the model’s weights through training
The model does not know domain-specific terminology wellFine-Tuning or RAGFine-tuning if it needs to reason in the domain. RAG if it needs to look up domain facts.
The model makes up facts about our specific systemsRAGGround it in your actual documentation — hallucination is a retrieval problem, not a training problem

Prompt Engineering — start here, always

Prompt engineering is the practice of designing the instructions, context and examples you give the model to shape its behaviour. It costs nothing to retrain. It works immediately. It is where every AI project should start.

What you can control through prompting

  • Persona and tone — ‘You are a helpful SAP integration consultant. Respond in plain English without jargon.’
  • Output format — ‘Always respond in bullet points. End with a summary in one sentence.’
  • Constraints — ‘Only answer questions about SAP BTP. If the question is outside this scope, say so.’
  • Few-shot examples — showing the model examples of the exact input-output pattern you want
  • Chain-of-thought — asking the model to reason step by step before giving an answer
Prompt elementWhat it doesExample
System promptSets the model’s persona, rules and context for the entire conversation’You are a senior SAP consultant. Be direct and practical. Do not use marketing language.‘
Few-shot examplesShows the model exactly the format and style you wantInclude 2-3 examples of ideal question-answer pairs before the actual query
Context injectionProvides relevant information for this specific queryPaste the relevant document section before asking the question
Output constraintsSpecifies format, length or structure’Respond in exactly three bullet points, each under 20 words.’

💡 Most problems can be solved with prompting alone

Before spending time and money on RAG or fine-tuning, write a better system prompt. At least 60% of AI customisation needs in enterprise projects can be addressed through improved prompting — better instructions, clearer constraints, good examples. It is the fastest iteration cycle and has no compute cost.

Prompt engineering four elements diagram on white background showing System Prompt, Few-Shot Examples, Context Injection and Output Constraints as four quadrants with icons

RAG — when the model needs to know things it cannot know

Prompt engineering works for behaviour and style. It does not work for knowledge that the model was never trained on — your internal documents, your product catalogue, your current policies, anything that changed after the model’s training cutoff.

RAG — Retrieval Augmented Generation — solves this by retrieving the relevant documents at query time and including them in the model’s context. The model reads your documents and answers from them, rather than from its training data.

Use RAG when…Do not use RAG when…
The model needs to answer questions about your specific documentsThe model just needs to follow different instructions
The knowledge changes frequently — policies, product specs, current dataThe knowledge is stable and could be baked into the model
Answers must cite sources for verificationTraceability is not a requirement
You need to keep your data private — RAG does not expose data to model trainingYou need to fundamentally change how the model reasons or communicates

RAG before and after comparison on white background showing ungrounded LLM response without RAG on the left versus source-grounded cited response with RAG on the right

Fine-Tuning — when behaviour needs to change at the model level

Fine-tuning means additional training of the model on your specific data. The model’s weights — the numerical values that encode everything it knows and how it behaves — are updated based on your training examples.

This is expensive, time-consuming and requires careful data preparation. It is also the only option when you need to fundamentally change the model’s behaviour across all interactions — not just this query, not just with the right prompt, but always.

When fine-tuning genuinely makes sense

  • You need the model to consistently produce output in a very specific format that prompting alone cannot reliably enforce
  • You are adapting a smaller model (7B-13B parameters) for a specific narrow task where a large general model is too expensive at scale
  • You need the model to reason with deep domain knowledge — not just look it up, but think with it
  • Your use case involves tens of millions of inferences where a smaller fine-tuned model has significant cost impact

When fine-tuning is the wrong choice

  • Your knowledge changes frequently — fine-tuning bakes a snapshot in; RAG keeps it current
  • You are trying to reduce hallucination — fine-tuning does not solve hallucination the way RAG does
  • You do not have high-quality training data — fine-tuning on bad data produces a model that reliably does the wrong thing
  • You want to try something quickly — fine-tuning is not fast

📌 The cost reality

Fine-tuning a large model (70B+ parameters) requires significant GPU compute — typically thousands of dollars for a training run, plus storage and serving costs. For most enterprise use cases in 2026, prompt engineering plus RAG is more cost-effective and produces better results than fine-tuning a large general model.

Combining approaches — what production systems look like

The best AI systems in 2026 combine all three. Each solves a different layer of the problem.

LayerApproachWhat it handles
Behaviour and personaPrompt EngineeringSystem prompt defines tone, constraints, output format, persona
Current knowledgeRAGRetrieves relevant documents, policies and data at query time
Domain specialisation (optional)Fine-TuningAdjusts a smaller model for the specific task at significant cost

A practical example: an internal HR assistant that answers employee questions about company benefits. The system prompt sets the persona and confidentiality rules (prompt engineering). The benefits documents are indexed in a vector store and retrieved when relevant (RAG). The base model is a frontier LLM accessed via API — no fine-tuning needed because the prompting and retrieval are doing the work.

The decision framework

QuestionIf yes, consider
Does the model need different instructions or style?Prompt engineering first — always start here
Does the model need to answer from your specific documents or data?RAG — add a knowledge base and retrieval layer
Is the problem hallucination of specific facts?RAG — not fine-tuning
Does the model need to consistently produce a very specific format or reasoning style?Fine-tuning a smaller model, after ruling out prompting
Is the knowledge static and does behaviour need to change at model level?Fine-tuning — the only scenario where it clearly wins
Is your use case at enormous scale with specific narrow output?Fine-tuning a smaller model for cost reduction

AI customisation decision flowchart on white background showing three paths from a central question — prompt engineering for tone, RAG for knowledge and fine-tuning for model behaviour

At a glance

ApproachSolvesCostSpeed to implementUpdates
Prompt EngineeringBehaviour, tone, format, instructionsMinimalImmediateInstant — change the prompt
RAGKnowledge, facts, current data, citationsLow to mediumDays to weeksEasy — update the knowledge base
Fine-TuningDeep behaviour change, domain specialisation, small model efficiencyHighWeeks to monthsExpensive — requires retraining

What to take away

Most organisations reach for fine-tuning too quickly. It feels like the most thorough solution — you are actually changing the model, not just guiding it. But in most cases, the simpler approaches do the job better, faster and cheaper.

Start with prompt engineering. If the model needs your specific knowledge, add RAG. Only reach for fine-tuning when you have a specific need that the first two genuinely cannot meet — and when you have the data quality and compute budget to do it properly.

🔗 Related posts on this site

RAG — Retrieval Augmented Generation — the full RAG explanation: how retrieval works, vector databases, indexing and querying. AI Hallucinations — Why They Happen — why RAG reduces hallucination and what the unmitigated rates look like. What is a Large Language Model (LLM)? — the base model that all three customisation approaches work with. Vector Databases Explained — the storage layer that makes RAG possible at scale.

Published on rakeshnarayan.com — Articles

URL: https://rakeshnarayan.com/articles/fine-tuning-vs-prompt-engineering-vs-rag-which-to-use/