The one-sentence definition
Fine-tuning is the process of taking a pre-trained AI model and continuing to train it on a smaller, domain-specific dataset — so the model learns the patterns, vocabulary, format, and knowledge specific to your use case, while retaining everything it learned during its original large-scale training.
The analogy that works: A general-practice doctor knows a little about everything in medicine. A cardiac surgeon knows one thing extremely well. Fine-tuning turns a general AI into the cardiac surgeon AI for your specific problem.
When is fine-tuning the right approach?
Fine-tuning is not always the answer. Before you train anything, weigh it against two alternatives that are faster and cheaper:
- Prompting — a better system prompt or few-shot examples often solves the problem without any training. Try this first. Most "I need fine-tuning" problems are actually prompting problems.
- RAG (Retrieval-Augmented Generation) — if the goal is giving the model access to specific knowledge (your documents, your product data, your policies), RAG retrieves relevant content at query time. No training required, and the knowledge stays current automatically.
- Fine-tuning — the right choice when: you need a specific output format or style consistently, the task requires knowledge that cannot be retrieved (e.g. an implicit writing style), you need faster/cheaper inference and can train a smaller model, or you are running thousands of similar requests where a specialised model is more efficient.
What fine-tuning actually changes
A pre-trained model like GPT-4 has billions of numerical parameters — weights — that encode everything it learned from its training data. Fine-tuning continues training on your examples, updating those weights: all of them in full fine-tuning, or a small added set in parameter-efficient methods like LoRA (covered below). Either way, the model learns which patterns in your data are important and adjusts its behaviour accordingly.
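The mechanics can be sketched in miniature: start from weights that were already learned, then keep nudging them with gradient steps on new examples. A toy illustration in plain Python — one scalar weight and a squared-error loss, just the shape of a training loop, not a real one:

```python
# Toy illustration: fine-tuning = continuing gradient descent from
# pre-trained weights on a new, domain-specific dataset.

# Pretend this weight was learned during large-scale pre-training.
pretrained_w = 2.0

# Small "domain" dataset: inputs x mapped to targets y = 3 * x.
finetune_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

def finetune(w, data, lr=0.01, epochs=200):
    """Continue training w on new examples with plain gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            pred = w * x
            grad = 2 * (pred - y) * x   # d/dw of the loss (w*x - y)^2
            w -= lr * grad              # the weight update
    return w

w = finetune(pretrained_w, finetune_data)
print(round(w, 2))  # → 3.0 — the weight has moved to fit the new data
```

The same idea scales up: a real model has billions of such weights, and the risk of moving them too far (forgetting the original training) is why fine-tuning uses small learning rates and few epochs.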
What fine-tuning can do:
- Teach the model a specific writing style (your brand voice, a character's voice, a document format)
- Teach it domain-specific vocabulary and concepts (medical terminology, legal language, your company's internal jargon)
- Teach it to follow a specific output format consistently (JSON with particular fields, structured reports, specific templates)
- Teach it to refuse or redirect specific types of requests
- Improve accuracy on a narrow task with a smaller, faster model
What fine-tuning cannot do:
- Reliably add knowledge from after the training cutoff — for current or frequently changing information, use RAG
- Make a fundamentally incapable model capable — fine-tuning sharpens what is already there; it does not add new reasoning abilities
- Guarantee factual accuracy — fine-tuned models still hallucinate
How to fine-tune — practical options
OpenAI fine-tuning API
The simplest starting point. Upload a JSONL training file (for chat models, each line is a JSON object containing a training conversation in the messages format; older completion models used prompt-completion pairs), start the training job, and receive a fine-tuned model ID you can use in API calls. Supports GPT-4o mini and GPT-3.5-turbo. Pricing: training cost (per 1k tokens) + inference cost (per 1k tokens, slightly higher than base models). Start with 50-100 good examples as a minimum, though 500-1,000 produce significantly better results. Documentation at platform.openai.com/docs/guides/fine-tuning.
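A minimal sketch of what such a training file looks like, using the chat-style `messages` format. The support-bot content here is invented for illustration; check the linked documentation for current limits and model names:

```python
import json

# Each JSONL line is one training example: a conversation ending with
# the assistant reply you want the model to learn to produce.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a support bot for Acme Corp."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security > Reset Password."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a support bot for Acme Corp."},
        {"role": "user", "content": "Do you offer refunds?"},
        {"role": "assistant", "content": "Yes — within 30 days of purchase."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check before uploading: every line parses and ends with an
# assistant turn (the behaviour being trained).
for line in open("train.jsonl"):
    ex = json.loads(line)
    assert ex["messages"][-1]["role"] == "assistant"
```

The file is then uploaded and a job created through the API (the official Python SDK exposes this under `client.fine_tuning.jobs.create`); the job returns the fine-tuned model ID once training completes.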
LoRA / QLoRA on open models
LoRA (Low-Rank Adaptation) is a technique for fine-tuning large models by training only a small set of additional parameters rather than updating all model weights. This reduces the memory and compute required by 10-100x. QLoRA extends LoRA with quantisation for even lower memory requirements. Using LoRA/QLoRA, a consumer GPU (24GB VRAM) can fine-tune models up to 13B parameters. Popular tools: Hugging Face PEFT library, Unsloth (optimised for speed), Axolotl (flexible training framework). Open-source models that can be fine-tuned this way: Llama 3, Mistral, Phi-3, Gemma.
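The parameter savings come from simple arithmetic: for a d×d weight matrix, LoRA freezes the original weights and trains only two small matrices, A (r×d) and B (d×r), whose product B·A forms a low-rank update. A quick pure-Python count (the dimensions below are illustrative, not tied to any particular model):

```python
def lora_savings(d: int, r: int) -> float:
    """Ratio of full fine-tuning params to LoRA params for one d x d layer."""
    full_params = d * d       # full fine-tuning updates every weight
    lora_params = 2 * d * r   # LoRA trains only A (r x d) and B (d x r)
    return full_params / lora_params

# A 4096-wide layer with LoRA rank 8:
print(lora_savings(4096, 8))  # → 256.0
```

The per-layer ratio is even larger than the overall 10-100x figure because in practice LoRA is applied only to selected matrices (often the attention projections) and optimiser state for the frozen weights still has to be handled; the whole-model savings land in that 10-100x range.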
Vertex AI / Azure fine-tuning
Google Cloud Vertex AI and Azure OpenAI both offer managed fine-tuning services for their respective models. Higher cost than doing it yourself, but handles the infrastructure, monitoring, and model serving. Appropriate for enterprise teams that need SLAs and support rather than infrastructure management.
Training data quality — the only thing that matters
Fine-tuning quality is determined almost entirely by training data quality. A small dataset of excellent examples outperforms a large dataset of mediocre examples every time. Principles for training data:
- Consistency — every example should demonstrate exactly the behaviour you want. Contradictory examples confuse the model.
- Coverage — examples should cover the range of inputs the model will receive in production, not just the easy cases.
- Quality over quantity — 100 carefully crafted examples outperform 1,000 automatically generated ones in most cases.
- Correct format — training examples must be in the exact format the model will be queried in production.
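These principles are worth enforcing mechanically before every training run. A small validation pass, assuming the chat-style `messages` JSONL format described earlier (the specific checks are a sketch of the idea, not an official validator):

```python
import json

def validate_examples(path: str) -> list[str]:
    """Return a list of problems found in a JSONL training file."""
    problems = []
    for i, line in enumerate(open(path), start=1):
        try:
            ex = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {i}: not valid JSON")
            continue
        msgs = ex.get("messages")
        if not msgs:
            problems.append(f"line {i}: missing 'messages'")
            continue
        # Correct format: every example must end with the assistant turn
        # that demonstrates the target behaviour.
        if msgs[-1]["role"] != "assistant":
            problems.append(f"line {i}: must end with an assistant turn")
        # Consistency: only expected roles should appear.
        roles = {m["role"] for m in msgs}
        if not roles <= {"system", "user", "assistant"}:
            problems.append(f"line {i}: unexpected roles {sorted(roles)}")
    return problems
```

Extending this with project-specific checks — a required system prompt, expected output fields, maximum lengths — is usually a better investment than adding more examples.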