TRL (RLHF/DPO Training)
What is TRL?
Server Requirements
Component
Minimum
Recommended
VRAM by Task
Task
Model
Method
VRAM
Ports
Port
Service
Notes
Installation on Clore.ai
Step 1 — Rent a Server
Step 2 — Connect via SSH
Step 3 — Install TRL
Step 4 — HuggingFace Authentication
Step 5 — Optional: Weights & Biases Tracking
Supervised Fine-Tuning (SFT)
Prepare Your Dataset
SFT Training Script
DPO (Direct Preference Optimization)
Prepare DPO Dataset
DPO Training Script
PPO (Proximal Policy Optimization)
GRPO (Group Relative Policy Optimization)
Multi-GPU Training
Using the TRL CLI
Monitoring Training
Clore.ai GPU Recommendations
Task
GPU
Notes
Troubleshooting
CUDA Out of Memory
Loss is NaN
DPO: chosen_rewards > rejected_rewards is False
chosen_rewards > rejected_rewards is FalseTraining is very slow
tokenizer.pad_token warning
tokenizer.pad_token warningPermission denied / HuggingFace 401
Saving and Sharing Your Model
Useful Links
Last updated
Was this helpful?