Installation
Install TRL and required dependencies:
- trl: Core training library
- peft: LoRA/QLoRA support
- accelerate: Multi-GPU and distributed training
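For example, with pip (the examples below also use the datasets library, which TRL normally pulls in as a dependency; install it explicitly if your environment does not):

```bash
pip install trl peft accelerate
```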
Supervised Fine-Tuning (SFT)
The SFTTrainer makes it easy to fine-tune LFM models on instruction-following or conversational datasets. It handles chat templates, packing, and dataset formatting automatically. SFT training requires Instruction datasets.
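For reference, a conversational (instruction) example only needs a messages column of role/content turns; the content below is purely illustrative:

```python
# A single conversational example in the format SFTTrainer understands:
# a "messages" column holding a list of role/content turns.
example = {
    "messages": [
        {"role": "user", "content": "Summarize the water cycle in one sentence."},
        {"role": "assistant", "content": "Water evaporates, condenses into clouds, and returns as precipitation."},
    ]
}
```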
LoRA Fine-Tuning (Recommended)
LoRA (Low-Rank Adaptation) is the recommended approach for fine-tuning LFM2 models with TRL. It offers several key advantages:
- Memory efficient: Trains only small adapter weights (~1-2% of model size) instead of full model parameters
- Data efficient: Achieves strong task performance improvements with less training data than full fine-tuning
- Fast training: Reduced parameter count enables faster iteration and larger effective batch sizes
- Flexible: Easy to switch between different task adapters without retraining the base model
Pass a LoRA configuration to the SFTTrainer:
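Below is a minimal sketch, assuming a recent TRL release; the checkpoint id, dataset name, and hyperparameters are illustrative placeholders rather than recommendations:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Any conversational/instruction dataset with a "messages" column works here.
dataset = load_dataset("trl-lib/Capybara", split="train")

# LoRA adapter: r=16 with lora_alpha = 2 * r, applied to all linear layers.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="lfm2-sft-lora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = SFTTrainer(
    model="LiquidAI/LFM2-1.2B",  # example checkpoint; swap in the LFM model you want to tune
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```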
Full Fine-Tuning
Full fine-tuning updates all model parameters. Use this only when you have sufficient GPU memory and need maximum adaptation for your task.
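For comparison, a full fine-tuning sketch under the same assumptions as above; there is no peft_config, every parameter is updated, and the memory-saving flags are illustrative:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

training_args = SFTConfig(
    output_dir="lfm2-sft-full",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,            # full fine-tuning usually tolerates lower LRs than LoRA
    num_train_epochs=1,
    bf16=True,                     # mixed precision to reduce memory
    gradient_checkpointing=True,   # trades compute for activation memory
)

trainer = SFTTrainer(
    model="LiquidAI/LFM2-1.2B",    # example checkpoint
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```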
Vision Language Model Fine-Tuning (VLM-SFT)
The SFTTrainer also supports fine-tuning Vision Language Models like LFM2.5-VL-1.6B on image-text datasets. VLM fine-tuning requires Vision datasets and differs from text-only SFT in a few key ways:
- Uses AutoModelForImageTextToText instead of AutoModelForCausalLM
- Uses AutoProcessor instead of just a tokenizer
- Requires dataset formatting with image content types (see the example below)
- Needs a custom collate_fn for multimodal batching (shown in the LoRA sketch below)
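To illustrate the dataset formatting point, a single vision example might look like the following; the image and text are placeholders:

```python
from PIL import Image

placeholder_image = Image.new("RGB", (64, 64))  # stands in for a real photo

# Message "content" is a list of typed parts; the PIL images referenced by
# {"type": "image"} entries live in a separate "images" column.
example = {
    "images": [placeholder_image],
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": "What is shown in this image?"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "A street sign at a rainy intersection."}],
        },
    ],
}
```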
VLM LoRA Fine-Tuning (Recommended)
LoRA is recommended for VLM fine-tuning due to the larger model size and multimodal complexity:
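A minimal sketch, assuming a recent TRL and Transformers release; the checkpoint id follows the model name mentioned above (verify the exact Hugging Face id on the model card), and the dataset name, hyperparameters, and collator details are illustrative placeholders:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForImageTextToText, AutoProcessor
from trl import SFTConfig, SFTTrainer

model_id = "LiquidAI/LFM2.5-VL-1.6B"  # verify the exact Hugging Face id on the model card
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train")  # placeholder VLM dataset

def collate_fn(examples):
    # Render chat templates, batch text + images together, and build masked labels.
    texts = [processor.apply_chat_template(ex["messages"], tokenize=False) for ex in examples]
    images = [ex["images"] for ex in examples]
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    # Depending on the processor, image placeholder tokens usually need masking here as well.
    batch["labels"] = labels
    return batch

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="lfm2-vl-sft-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    remove_unused_columns=False,                    # keep raw columns for collate_fn
    dataset_kwargs={"skip_prepare_dataset": True},  # all preprocessing happens in collate_fn
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=collate_fn,
    processing_class=processor.tokenizer,
    peft_config=peft_config,
)
trainer.train()
```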
Full VLM Fine-Tuning
Full VLM fine-tuning updates all model parameters. Use this only when you have sufficient GPU memory.
Direct Preference Optimization (DPO)
The DPOTrainer implements Direct Preference Optimization, a method to align models with human preferences without requiring a separate reward model. DPO training requires Preference datasets with chosen and rejected response pairs.
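For reference, a single preference pair in the standard prompt/chosen/rejected format (conversational variants with message lists also work); the text is purely illustrative:

```python
# One preference pair: DPO pushes the policy toward "chosen" and away from "rejected".
example = {
    "prompt": "Explain gradient accumulation in one sentence.",
    "chosen": "Gradient accumulation sums gradients over several small batches before "
              "updating the weights, emulating a larger batch size.",
    "rejected": "It makes training faster.",
}
```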
DPO with LoRA (Recommended)
LoRA is highly recommended for DPO training, as it significantly reduces memory requirements while maintaining strong alignment performance.
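A minimal sketch, assuming a recent TRL release; the checkpoint, dataset, and hyperparameters are placeholders. With a peft_config, no separate reference model is needed: TRL computes the reference log-probabilities by temporarily disabling the adapter.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "LiquidAI/LFM2-1.2B"  # example checkpoint; typically start from an SFT-tuned model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A preference dataset with prompt/chosen/rejected columns (placeholder name).
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = DPOConfig(
    output_dir="lfm2-dpo-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # DPO benefits from large effective batch sizes
    learning_rate=5e-7,              # DPO uses much lower learning rates than SFT
    beta=0.1,                        # controls deviation from the reference model
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,         # with LoRA, the frozen base model acts as the reference
)
trainer.train()
```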
Full DPO Training
Full DPO training updates all model parameters. Use this only when you have sufficient GPU memory.
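Full DPO keeps a frozen copy of the starting model as the reference; in the sketch below (placeholder names again, assuming a recent TRL release), ref_model=None asks TRL to create that copy:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "LiquidAI/LFM2-1.2B"  # example checkpoint; typically an SFT-tuned model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")  # placeholder

training_args = DPOConfig(
    output_dir="lfm2-dpo-full",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-7,
    beta=0.1,
    bf16=True,
    gradient_checkpointing=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,      # TRL builds the frozen reference model when none is given
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```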
Tips
- Learning Rates: SFT typically uses higher learning rates (1e-5 to 5e-5) than DPO (1e-7 to 1e-6)
- Batch Size: DPO requires larger effective batch sizes; increase gradient_accumulation_steps if GPU memory is limited
- LoRA Ranks: Start with r=16. Higher ranks increase adapter memory and parameter count. Set lora_alpha (α) to 2 * r
- DPO Beta: The beta parameter controls the deviation from the reference model. Start with 0.1
For more end-to-end examples, visit the Liquid AI Cookbook.