LLaMA-Factory

Fine-tune 100+ LLMs with LoRA/QLoRA and a web UI on Clore.ai GPUs using LLaMA-Factory

LLaMA-Factory is one of the most comprehensive open-source fine-tuning frameworks, supporting 100+ language models including the LLaMA family, Qwen, Mistral, Phi, Falcon, ChatGLM, and more. It offers LoRA, QLoRA, full fine-tuning, and preference-optimization methods such as RLHF (PPO) and DPO, all through an intuitive web interface (LLaMA Board) or a CLI. CLORE.AI's on-demand GPU servers make it a practical platform for launching fine-tuning jobs at a fraction of the cost of major cloud providers.

Server Requirements

| Parameter | Minimum | Recommended |
| --- | --- | --- |
| RAM | 16 GB | 32 GB+ |
| VRAM | 8 GB (QLoRA) | 24 GB+ |
| Disk | 50 GB | 200 GB+ |
| GPU | NVIDIA RTX 2080+ | A100, RTX 4090 |

Training method determines GPU requirements:

  • QLoRA (4-bit): 8 GB VRAM for 7B models, 16 GB for 13B

  • LoRA (float16): 16 GB VRAM for 7B models, 40 GB for 13B

  • Full fine-tuning: ~14 GB VRAM just for a 7B model's fp16 weights, plus gradients and optimizer states on top

  • Multi-GPU (DeepSpeed/FSDP) scales across any number of GPUs

Quick Deploy on CLORE.AI

Docker Image: hiyouga/llamafactory:latest

Ports: 22/tcp, 7860/http

Environment Variables:

| Variable | Example | Description |
| --- | --- | --- |
| HF_TOKEN | hf_xxx... | HuggingFace token for gated models |
| WANDB_API_KEY | xxx... | Weights & Biases key for experiment tracking |
| CUDA_VISIBLE_DEVICES | 0,1 | GPUs to use |

Step-by-Step Setup

1. Rent a GPU Server on CLORE.AI

Visit the CLORE.AI Marketplace and select based on your task:

| Task | VRAM | Recommended GPU |
| --- | --- | --- |
| QLoRA 7B | 8 GB | RTX 3070/2080 |
| QLoRA 13B | 16 GB | RTX 3090/A4000 |
| LoRA 7B | 16 GB | RTX 3090/A4000 |
| LoRA 13B | 40 GB | A6000/A100 40GB |
| Full FT 7B | 80 GB | A100 80GB |
| Multi-GPU | Varies | 2–8× any GPU |

2. SSH into Your Server
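The connection details (host, port, user) are shown on your CLORE.AI order page; the values below are placeholders:

```shell
# Replace the host and port with the values from your CLORE.AI order page
ssh -p 10001 root@your-server.clore.ai
```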

3. Create Working Directories
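For example, assuming a home-directory layout (the paths are arbitrary; they will be bind-mounted into the container in the next steps):

```shell
# Host directories for datasets, checkpoints, and downloaded models
mkdir -p ~/llama-factory/data ~/llama-factory/output ~/llama-factory/models
```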

4. Pull the Docker Image
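The image bundles CUDA and PyTorch, so the pull can take a while:

```shell
docker pull hiyouga/llamafactory:latest
```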

5. Launch LLaMA-Factory

Launch with Web UI (LLaMA Board):
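A minimal sketch, assuming the default Gradio port 7860 and the host directories created earlier; container-side mount paths may differ between image versions:

```shell
docker run -d --gpus all --name llamafactory \
  -p 7860:7860 \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  -e HF_TOKEN="$HF_TOKEN" \
  -v "$HOME/llama-factory/data:/app/data" \
  -v "$HOME/llama-factory/output:/app/output" \
  hiyouga/llamafactory:latest \
  llamafactory-cli webui
```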

With Weights & Biases tracking:
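The same launch with the W&B key passed through; set report_to: wandb in your training config to actually log runs:

```shell
docker run -d --gpus all --name llamafactory \
  -p 7860:7860 \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  -e WANDB_API_KEY="$WANDB_API_KEY" \
  hiyouga/llamafactory:latest \
  llamafactory-cli webui
```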

Multi-GPU with DeepSpeed (4 GPUs):
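One possible shape, using LLaMA-Factory's FORCE_TORCHRUN launcher and one of the repo's DeepSpeed example configs (the exact config path depends on the repo version):

```shell
docker run --gpus all --rm \
  -e CUDA_VISIBLE_DEVICES=0,1,2,3 \
  -e FORCE_TORCHRUN=1 \
  -v "$HOME/llama-factory/output:/app/output" \
  hiyouga/llamafactory:latest \
  llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
```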

6. Access the Web Interface

Check logs and get the URL:
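Assuming the container was named llamafactory at launch:

```shell
docker logs -f llamafactory
# a line like "Running on local URL: http://0.0.0.0:7860" confirms the UI is up
```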

Then open your CLORE.AI http_pub URL for port 7860 in a browser.


Usage Examples

Example 1: LoRA Fine-Tuning via Web UI (LLaMA Board)

  1. Open LLaMA Board at your CLORE.AI URL

  2. Go to the Train tab

  3. Configure:

    • Model Name: Meta-Llama-3-8B-Instruct

    • Training Stage: Supervised Fine-Tuning

    • Dataset: Select your dataset (or upload custom)

    • Fine-tuning method: lora

    • LoRA rank: 8 (higher = more parameters trained)

    • Learning rate: 1e-4

    • Epochs: 3

    • Output dir: llama3-finetuned

  4. Click Start to begin training

  5. Monitor loss curves in the Loss chart

Example 2: CLI-Based QLoRA Fine-Tuning

Prepare a training config YAML:
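A minimal sketch of a QLoRA SFT config; the model, dataset, and output names are examples (alpaca_en_demo is a demo dataset that ships with LLaMA-Factory):

```yaml
### qlora_sft.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
quantization_bit: 4            # 4-bit base weights (QLoRA)

stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

dataset: alpaca_en_demo
template: llama3
cutoff_len: 1024

output_dir: output/llama3-qlora
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
```

Start training inside the container with `llamafactory-cli train qlora_sft.yaml`.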

Example 3: Upload Custom Dataset

Create a custom dataset in Alpaca format:
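For example (filenames are arbitrary), a tiny Alpaca-format file with instruction/input/output fields, plus a registration entry. Note that in the container data/dataset_info.json already exists, so add the entry to it rather than overwriting the file:

```shell
mkdir -p data

# A minimal Alpaca-format dataset
cat > data/my_dataset.json <<'EOF'
[
  {
    "instruction": "Summarize the following text.",
    "input": "CLORE.AI is a marketplace for renting GPU servers by the hour.",
    "output": "CLORE.AI lets you rent GPU servers hourly with no commitments."
  }
]
EOF

# Register the file so it shows up in the Dataset dropdown
cat > data/dataset_info.json <<'EOF'
{
  "my_dataset": {
    "file_name": "my_dataset.json"
  }
}
EOF
```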

Then select my_dataset in the LLaMA Board Dataset dropdown.

Example 4: DPO (Direct Preference Optimization)
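DPO trains on preference pairs (a chosen and a rejected response per prompt). A sketch of a LoRA-based DPO config; dpo_en_demo is a demo preference dataset bundled with LLaMA-Factory, and pref_beta is the DPO temperature:

```yaml
### dpo.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1

dataset: dpo_en_demo
template: llama3

output_dir: output/llama3-dpo
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 1.0
```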

Example 5: Inference with Fine-Tuned Model

After training, test your model:
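For example, loading the base model together with the trained LoRA adapter (paths follow the earlier examples):

```shell
llamafactory-cli chat \
  --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
  --adapter_name_or_path output/llama3-qlora \
  --template llama3
```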

Or export the merged model:
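A sketch that merges the adapter into the base weights and writes a standalone model directory:

```shell
llamafactory-cli export \
  --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
  --adapter_name_or_path output/llama3-qlora \
  --template llama3 \
  --export_dir output/llama3-merged
```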


Configuration

Key Training Parameters

| Parameter | Typical Value | Description |
| --- | --- | --- |
| lora_rank | 8–64 | LoRA rank (higher = more expressive) |
| lora_alpha | 2× rank | LoRA alpha scaling |
| lora_dropout | 0.0–0.1 | Dropout for LoRA layers |
| lora_target | all | Which layers to apply LoRA to |
| learning_rate | 1e-4 | Starting learning rate |
| num_train_epochs | 1–5 | Training epochs |
| per_device_train_batch_size | 1–4 | Batch size per GPU |
| gradient_accumulation_steps | 4–16 | Effective batch multiplier |
| cutoff_len | 1024–4096 | Max sequence length |
| quantization_bit | 4 or 8 | QLoRA quantization bits |
| warmup_ratio | 0.05–0.1 | LR warmup fraction |
| lr_scheduler_type | cosine | LR schedule |

Supported Fine-tuning Methods

| Method | Memory Use | Quality | When to Use |
| --- | --- | --- | --- |
| full | Very High | Best | Unlimited VRAM |
| freeze | Medium | Good | Freeze base layers |
| lora | Low | Very Good | Default choice |
| qlora (lora + quant) | Lowest | Good | Limited VRAM |

Multi-GPU DeepSpeed Training

For training on multiple GPUs, launch with torchrun:
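LLaMA-Factory wraps torchrun itself when FORCE_TORCHRUN is set; my_sft.yaml is a placeholder for your own training config, with a DeepSpeed ZeRO JSON referenced via the deepspeed key:

```shell
# Launches one process per visible GPU via torchrun
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0,1,2,3 \
  llamafactory-cli train my_sft.yaml
```

In my_sft.yaml, add e.g. `deepspeed: examples/deepspeed/ds_z3_config.json` (the example path may differ by repo version).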


Performance Tips

1. Optimal QLoRA Settings by GPU

8 GB VRAM (RTX 3070):
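One plausible set of values for a 7B model (not tuned for any particular dataset):

```yaml
quantization_bit: 4
lora_rank: 8
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
cutoff_len: 512
gradient_checkpointing: true
```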

24 GB VRAM (RTX 3090/4090):
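With more headroom, larger batches and contexts become feasible (values are illustrative):

```yaml
quantization_bit: 4
lora_rank: 16
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
cutoff_len: 2048
flash_attn: fa2
```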

80 GB VRAM (A100):
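At this size quantization is usually unnecessary for 7B–13B models; a sketch of plain LoRA in bf16:

```yaml
lora_rank: 64
per_device_train_batch_size: 8
gradient_accumulation_steps: 2
cutoff_len: 4096
bf16: true
```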

2. Flash Attention 2 for Longer Contexts
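Enable it in the training config (or the extra-arguments field in LLaMA Board):

```yaml
flash_attn: fa2
```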

This enables training with 2× longer sequences on the same VRAM.

3. Gradient Checkpointing

Saves VRAM at the cost of ~20% slower training:
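Enable it in the training config:

```yaml
gradient_checkpointing: true
```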

4. Choose the Right LoRA Target
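lora_target: all adapts every linear layer and is the usual default; restricting it to the attention projections trains fewer parameters and runs faster, at some quality cost:

```yaml
lora_target: all
# or, narrower and faster:
# lora_target: q_proj,v_proj
```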

5. Freeze Top Layers for Fast Adaptation
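A sketch using LLaMA-Factory's freeze tuning, which trains only the last few transformer layers (the parameter name may vary by version):

```yaml
finetuning_type: freeze
freeze_trainable_layers: 2
```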

Much faster than full LoRA for simple task adaptation.

6. Monitor with TensorBoard
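In the training config (output paths follow the earlier examples):

```yaml
report_to: tensorboard
logging_dir: output/llama3-qlora/runs
```

Then run `tensorboard --logdir output/llama3-qlora/runs --host 0.0.0.0 --port 6006` on the server.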

Add port 6006 to your CLORE.AI order to access TensorBoard.


Troubleshooting

Problem: "CUDA out of memory" during training

  1. Reduce batch size: per_device_train_batch_size: 1

  2. Enable gradient checkpointing: gradient_checkpointing: true

  3. Reduce context length: cutoff_len: 512

  4. Use QLoRA (4-bit): quantization_bit: 4

  5. Reduce LoRA rank: lora_rank: 4

Problem: Training loss not decreasing

  • Check learning rate — try 5e-5 or 2e-4

  • Verify dataset format matches template

  • Increase lora_rank (8→16→32)

  • Check that lora_target: all is set

Problem: Slow training speed

If GPU is < 80% utilized:

  • Increase batch size

  • Use Flash Attention: flash_attn: fa2

  • Remove gradient_checkpointing if VRAM allows

Problem: Model not found in web UI
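If a gated model has not been downloaded yet, pre-fetch it with your HuggingFace token (the model name here is an example):

```shell
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --token "$HF_TOKEN"
```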

Then refresh the model list in LLaMA Board.

Problem: Dataset format errors

All dataset formats must match dataset_info.json specification:
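A minimal entry; formatting and columns describe how your JSON fields map onto LLaMA-Factory's Alpaca schema (in the container, add this entry to the existing data/dataset_info.json rather than replacing the file):

```shell
mkdir -p data
cat > data/dataset_info.json <<'EOF'
{
  "my_dataset": {
    "file_name": "my_dataset.json",
    "formatting": "alpaca",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
EOF
```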

Problem: WebUI port not accessible

Ensure LLaMA-Factory started the Gradio server:
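Assuming the container was named llamafactory:

```shell
docker logs llamafactory 2>&1 | grep -i "running on"
# look for a line like: Running on local URL: http://0.0.0.0:7860
```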

Alternatively, add the --share flag to get a public Gradio URL.



Clore.ai GPU Recommendations

| Use Case | Recommended GPU | Est. Cost on Clore.ai |
| --- | --- | --- |
| Development/Testing | RTX 3090 (24GB) | ~$0.12/gpu/hr |
| Fine-tuning (7B–13B) | RTX 4090 (24GB) | ~$0.70/gpu/hr |
| Large Models (70B+) | A100 80GB | ~$1.20/gpu/hr |
| Multi-GPU Training | 2–4× A100 80GB | ~$2.40–$4.80/hr |

💡 All examples in this guide can be deployed on Clore.ai GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
