Large language models like GPT-4, Claude, and Llama are remarkably capable out of the box. But for specialized use cases — legal document analysis, medical coding, customer support with proprietary products — a general model's knowledge is insufficient.
Fine-tuning adapts a pre-trained model to your specific domain. This guide covers when it's worth doing and how to approach it.
What is fine-tuning?
Fine-tuning takes a pre-trained LLM and continues training on a curated dataset specific to your use case. Unlike prompt engineering, which guides the model through instructions, fine-tuning modifies the model's weights to internalize domain knowledge and behavior patterns.
Fine-tuning can:
- Teach the model domain-specific terminology and concepts
- Enforce output formatting (JSON, specific schemas, brand voice)
- Improve accuracy on specialized tasks
- Reduce hallucination on domain-specific queries
- Compress multi-step instructions into model behavior
Fine-tuning cannot:
- Inject new factual knowledge that wasn't in the training data (use RAG for that)
- Fix fundamental model capability limitations (reasoning, math)
- Eliminate all hallucinations
When to fine-tune
Fine-tuning is the right choice when:
You have a specialized domain with unique terminology. Legal contracts, medical records, financial regulations — these domains use language differently from general web text. Fine-tuning helps the model learn context-specific meanings.
You need consistent output formatting. If your application requires the model to output a specific JSON schema or follow strict formatting rules, fine-tuning embeds that behavior rather than relying on prompt instructions.
You have high-volume, well-defined tasks. Customer support classification, content moderation, intent detection — tasks with clear ground truth and abundant labeled data benefit most from fine-tuning.
When NOT to fine-tune:
- You need the model to answer questions about your specific documents (use RAG)
- You have fewer than a few hundred high-quality examples (prompt engineering is more effective)
- Your use case changes frequently (fine-tuning is expensive to update)
The fine-tuning process
1. Dataset preparation: Curate 500-5000 examples of input-output pairs. Quality matters far more than quantity. A thousand carefully reviewed, correctly labeled examples outperform ten thousand noisy ones.
2. Formatting: Structure data as conversation turns or instruction-response pairs matching your target use case. Most platforms expect a specific JSONL format.
3. Training: Choose your base model (GPT-4o-mini for simpler tasks, Llama 3 for on-premise). Training takes hours to days depending on dataset size and compute resources.
4. Evaluation: Hold out 10-20% of your data for evaluation. Compare fine-tuned model outputs against both baselines (base model with prompt engineering) and ground truth.
5. Deployment: Serve the fine-tuned model through the provider's API or self-host using tools like vLLM or Ollama.
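The formatting step above can be sketched in a few lines. This is a minimal example assuming the OpenAI-style chat JSONL convention (one JSON object per line with a `messages` array); check your platform's documentation for its exact expected schema. The ticket texts and intent labels are hypothetical.

```python
import json

# Hypothetical labeled examples: (customer message, intent label)
examples = [
    ("My card was charged twice this month.", "billing_dispute"),
    ("How do I reset my password?", "account_access"),
]

SYSTEM = "Classify the customer message into a support intent."

def to_chat_jsonl(pairs, path):
    """Write instruction-response pairs as chat-style JSONL,
    one training record per line."""
    with open(path, "w") as f:
        for user_msg, label in pairs:
            record = {
                "messages": [
                    {"role": "system", "content": SYSTEM},
                    {"role": "user", "content": user_msg},
                    {"role": "assistant", "content": label},
                ]
            }
            f.write(json.dumps(record) + "\n")

to_chat_jsonl(examples, "train.jsonl")
```

Each record pairs the exact system prompt you will use in production with a ground-truth assistant response, so the model learns the mapping you actually want to deploy.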
Parameter-efficient fine-tuning
Full fine-tuning updates every model weight, which is expensive and produces a full-size copy of the model. Parameter-efficient methods instead train only a small fraction of the parameters:
- LoRA (Low-Rank Adaptation): trains rank-decomposition matrices inserted into the model. The base weights stay frozen. Produces small adapter files (a few MB) that are composable and swappable.
- QLoRA: combines LoRA with 4-bit quantization, enabling fine-tuning on consumer GPUs.
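The core idea behind LoRA can be shown with plain NumPy. This is an illustrative sketch of the math, not a training recipe: the frozen weight matrix W is augmented with a low-rank update B @ A, and only A and B would be trained. Dimensions and rank here are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained weight matrix (d_out x d_in); never updated.
d_out, d_in, r = 64, 64, 4          # r is the LoRA rank, r << d
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors. B starts at zero, so the adapter
# initially contributes nothing (standard LoRA initialization).
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))

def forward(x, scale=1.0):
    """Adapted layer: base output W @ x plus the low-rank update."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0 the adapted layer matches the frozen base layer exactly.
assert np.allclose(forward(x), W @ x)

# The adapter trains r * (d_in + d_out) values instead of
# d_out * d_in: here 512 instead of 4096.
print(r * (d_in + d_out), d_out * d_in)
```

Because the base weights never change, the trained A and B matrices can be saved as a small standalone adapter file and swapped per task, which is why LoRA adapters are composable.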
Production considerations
- Model drift: base model updates from the provider may affect fine-tuned versions. Pin model versions.
- Evaluation pipelines: automated evaluation against held-out test sets catches regressions.
- Cost: fine-tuning costs vary widely — OpenAI charges per training token, while open-source models require GPU compute.
- Data privacy: sending data to third-party APIs may violate compliance requirements. Self-hosting an open-source model may be necessary.
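An automated evaluation pipeline can be as simple as an exact-match accuracy check against the held-out test set, gating deployment on a threshold. This is a minimal sketch: the labels, predictions, and threshold value are hypothetical, and a real pipeline would pull predictions from your inference endpoint.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the ground truth."""
    assert len(predictions) == len(references)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

# Hypothetical held-out labels and model outputs.
references  = ["billing_dispute", "account_access", "refund_request"]
predictions = ["billing_dispute", "account_access", "cancellation"]

acc = exact_match_accuracy(predictions, references)
print(f"accuracy: {acc:.2f}")

# Fail the pipeline if the fine-tuned model regresses below a bar.
THRESHOLD = 0.60
assert acc >= THRESHOLD, "regression: accuracy below threshold"
```

Running this check on every new fine-tune (and after any base-model version change) catches regressions before they reach production.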
Fine-tuning transforms general-purpose LLMs into domain-specific tools that deliver reliable, consistent results for your specific use case.
Exploring AI for your business? At Vynta, we design and implement fine-tuning pipelines that adapt LLMs to your specific domain and use case.