# LLaMA-Factory

LLaMA-Factory is a comprehensive open-source fine-tuning framework supporting 100+ language models, including the LLaMA family, Qwen, Mistral, Phi, Falcon, ChatGLM, and more. It offers LoRA, QLoRA, freeze-tuning, full fine-tuning, and preference optimization (DPO and PPO-based RLHF), all through an intuitive web interface (LLaMA Board) or the CLI. CLORE.AI's on-demand GPU servers let you launch fine-tuning jobs at a fraction of the cost of major cloud providers.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Server Requirements

| Parameter | Minimum          | Recommended    |
| --------- | ---------------- | -------------- |
| RAM       | 16 GB            | 32 GB+         |
| VRAM      | 8 GB (QLoRA)     | 24 GB+         |
| Disk      | 50 GB            | 200 GB+        |
| GPU       | NVIDIA RTX 2080+ | A100, RTX 4090 |

{% hint style="info" %}
**Training method determines GPU requirements:**

* **QLoRA (4-bit)**: 8 GB VRAM for 7B models, 16 GB for 13B
* **LoRA (float16)**: 16 GB VRAM for 7B models, 40 GB for 13B
* **Full fine-tuning**: \~14 GB VRAM for the fp16 weights of a 7B model alone (\~2 GB per billion parameters), plus gradients and optimizer states
* Multi-GPU (DeepSpeed/FSDP) scales training across multiple GPUs

{% endhint %}
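These VRAM figures can be sanity-checked with back-of-the-envelope arithmetic (a rough sketch; real usage adds activations, KV cache, and framework overhead):

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                     optimizer_overhead: float = 0.0) -> float:
    """Rough VRAM estimate in GB: weight memory plus optional optimizer states.

    fp16/bf16 weights take 2 bytes per parameter; full fine-tuning with Adam
    adds roughly 12 more bytes per parameter (fp32 master copy + two moments).
    """
    return params_billion * (bytes_per_param + optimizer_overhead)

print(estimate_vram_gb(7))             # 14.0 (fp16 weights of a 7B model)
print(estimate_vram_gb(7, 2.0, 12.0))  # 98.0 (7B full fine-tuning with Adam)
```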

## Quick Deploy on CLORE.AI

**Docker Image:** `hiyouga/llamafactory:latest`

**Ports:** `22/tcp`, `7860/http`

**Environment Variables:**

| Variable               | Example     | Description                              |
| ---------------------- | ----------- | ---------------------------------------- |
| `HF_TOKEN`             | `hf_xxx...` | HuggingFace token for gated models       |
| `WANDB_API_KEY`        | `xxx...`    | Weights & Biases for experiment tracking |
| `CUDA_VISIBLE_DEVICES` | `0,1`       | GPUs to use                              |

## Step-by-Step Setup

### 1. Rent a GPU Server on CLORE.AI

Visit [CLORE.AI Marketplace](https://clore.ai/marketplace) and select based on your task:

| Task       | VRAM   | Recommended GPU |
| ---------- | ------ | --------------- |
| QLoRA 7B   | 8 GB   | RTX 3070/2080   |
| QLoRA 13B  | 16 GB  | RTX 3090/A4000  |
| LoRA 7B    | 16 GB  | RTX 3090/A4000  |
| LoRA 13B   | 40 GB  | A6000/A100 40GB |
| Full FT 7B | 80 GB  | A100 80GB       |
| Multi-GPU  | Varies | 2-8× any GPU    |

### 2. SSH into Your Server

```bash
ssh -p <PORT> root@<SERVER_IP>
```

### 3. Create Working Directories

```bash
mkdir -p /root/llamafactory/{data,models,output,saves}
```

### 4. Pull the Docker Image

```bash
docker pull hiyouga/llamafactory:latest
```

### 5. Launch LLaMA-Factory

**Launch with Web UI (LLaMA Board):**

```bash
docker run -d \
  --name llamafactory \
  --gpus all \
  -p 7860:7860 \
  -v /root/llamafactory/data:/app/LLaMA-Factory/data \
  -v /root/llamafactory/models:/root/.cache/huggingface \
  -v /root/llamafactory/output:/app/LLaMA-Factory/output \
  -v /root/llamafactory/saves:/app/LLaMA-Factory/saves \
  -e HF_TOKEN=hf_your_token_here \
  hiyouga/llamafactory:latest \
  llamafactory-cli webui
```

**With Weights & Biases tracking:**

```bash
docker run -d \
  --name llamafactory \
  --gpus all \
  -p 7860:7860 \
  -v /root/llamafactory/data:/app/LLaMA-Factory/data \
  -v /root/llamafactory/models:/root/.cache/huggingface \
  -v /root/llamafactory/output:/app/LLaMA-Factory/output \
  -v /root/llamafactory/saves:/app/LLaMA-Factory/saves \
  -e HF_TOKEN=hf_your_token_here \
  -e WANDB_API_KEY=your_wandb_key \
  hiyouga/llamafactory:latest \
  llamafactory-cli webui
```

**Multi-GPU with DeepSpeed (4 GPUs):**

```bash
docker run -d \
  --name llamafactory \
  --gpus all \
  --shm-size 16g \
  --ipc host \
  -p 7860:7860 \
  -v /root/llamafactory/data:/app/LLaMA-Factory/data \
  -v /root/llamafactory/models:/root/.cache/huggingface \
  -v /root/llamafactory/output:/app/LLaMA-Factory/output \
  -e CUDA_VISIBLE_DEVICES=0,1,2,3 \
  hiyouga/llamafactory:latest \
  llamafactory-cli webui
```

### 6. Access the Web Interface

Check logs and get the URL:

```bash
docker logs -f llamafactory
```

Your CLORE.AI http\_pub URL for port 7860:

```
https://<order-id>-7860.clore.ai/
```

***

## Usage Examples

### Example 1: LoRA Fine-Tuning via Web UI (LLaMA Board)

1. Open LLaMA Board at your CLORE.AI URL
2. Go to the **Train** tab
3. Configure:
   * **Model Name**: `LLaMA-3` → `Meta-Llama-3-8B-Instruct`
   * **Training Stage**: `Supervised Fine-Tuning`
   * **Dataset**: Select your dataset (or upload custom)
   * **Fine-tuning method**: `lora`
   * **LoRA rank**: `8` (higher = more parameters trained)
   * **Learning rate**: `1e-4`
   * **Epochs**: `3`
   * **Output dir**: `llama3-finetuned`
4. Click **Start** to begin training
5. Monitor loss curves in the **Loss** chart
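
The same run can also be expressed as a CLI training config (a sketch mirroring the UI settings above; the dataset name is a placeholder):

```yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
dataset: your_dataset      # replace with your dataset name
template: llama3
learning_rate: 1.0e-4
num_train_epochs: 3.0
output_dir: saves/llama3-finetuned
```

Run it with `llamafactory-cli train <config>.yaml` as shown in the next example.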

### Example 2: CLI-Based QLoRA Fine-Tuning

Prepare a training config YAML:

```bash
docker exec -it llamafactory bash

mkdir -p /app/LLaMA-Factory/configs
cat > /app/LLaMA-Factory/configs/qlora_mistral.yaml << 'EOF'
### Model
model_name_or_path: mistralai/Mistral-7B-Instruct-v0.3

### Method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05

### Dataset
dataset: alpaca_en
template: mistral
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### Output
output_dir: saves/mistral-qlora
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### Train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 0.0001
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true

### Quantization
quantization_method: bitsandbytes
quantization_bit: 4
EOF

# Run training
llamafactory-cli train /app/LLaMA-Factory/configs/qlora_mistral.yaml
```

### Example 3: Upload Custom Dataset

Create a custom dataset in Alpaca format:

```bash
# Create dataset file
cat > /root/llamafactory/data/my_dataset.json << 'EOF'
[
  {
    "instruction": "You are a customer service agent for a tech company. Answer helpfully.",
    "input": "My laptop won't turn on after the update. What should I do?",
    "output": "I understand how frustrating that can be. Let's troubleshoot step by step:\n\n1. **Hold power button for 10 seconds** to force shutdown\n2. **Remove power cable** and battery if removable, wait 30 seconds\n3. **Reconnect power** and try turning on\n4. If still no response, press **F8/F11** during boot to access recovery mode\n\nWould you like me to walk you through any of these steps in detail?"
  },
  {
    "instruction": "You are a customer service agent for a tech company.",
    "input": "How do I reset my password?",
    "output": "Resetting your password is easy! Here's how:\n\n1. Go to the login page and click **'Forgot Password'**\n2. Enter your **registered email address**\n3. Check your email for a reset link (check spam folder too)\n4. Click the link and **create a new password**\n\nThe reset link expires in 24 hours. If you don't receive the email within 5 minutes, contact our support team."
  }
]
EOF

# Register dataset in dataset_info.json (edit the JSON with Python so it stays valid)
docker exec -it llamafactory python3 -c "
import json
path = '/app/LLaMA-Factory/data/dataset_info.json'
with open(path) as f:
    info = json.load(f)
# file_name is relative to the data directory (the volume mounted above)
info['my_dataset'] = {'file_name': 'my_dataset.json'}
with open(path, 'w') as f:
    json.dump(info, f, indent=2)
"
```

Then select `my_dataset` in the LLaMA Board Dataset dropdown.

### Example 4: DPO (Direct Preference Optimization)

Save the following as `/app/LLaMA-Factory/configs/dpo_llama.yaml` inside the container:

```yaml
### configs/dpo_llama.yaml

model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### Method - DPO
stage: dpo
do_train: true
finetuning_type: lora
lora_rank: 8

### DPO-specific
pref_beta: 0.1
pref_loss: sigmoid  # sigmoid, hinge, ipo

### Dataset (must be preference format)
dataset: dpo_en_demo
template: llama3
cutoff_len: 2048

### Output
output_dir: saves/llama3-dpo
logging_steps: 10
save_steps: 100

### Train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5e-5
num_train_epochs: 1.0
fp16: true
```

```bash
docker exec -it llamafactory bash -c "llamafactory-cli train /app/LLaMA-Factory/configs/dpo_llama.yaml"
```
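DPO needs preference data: each record pairs a prompt with a preferred (`chosen`) and a rejected response. A minimal illustrative record is sketched below (the field names are an assumption; verify the exact schema against the Dataset Format link in the Links section):

```python
import json

# Illustrative preference record for DPO-style training.
# Field names are assumptions; check LLaMA-Factory's data/README.md.
record = {
    "instruction": "Explain what a LoRA adapter is.",
    "input": "",
    "chosen": ("A LoRA adapter is a small set of low-rank matrices trained "
               "on top of a frozen base model, so only a tiny fraction of "
               "parameters is updated."),
    "rejected": "I don't know.",
}

# Preference datasets are JSON lists of such records
print(json.dumps([record], indent=2)[:60])
```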

### Example 5: Inference with Fine-Tuned Model

After training, test your model:

```bash
docker exec -it llamafactory bash

# Interactive chat
llamafactory-cli chat \
  --model_name_or_path mistralai/Mistral-7B-Instruct-v0.3 \
  --adapter_name_or_path /app/LLaMA-Factory/saves/mistral-qlora \
  --template mistral \
  --finetuning_type lora
```

Or export the merged model:

```bash
llamafactory-cli export \
  --model_name_or_path mistralai/Mistral-7B-Instruct-v0.3 \
  --adapter_name_or_path /app/LLaMA-Factory/saves/mistral-qlora \
  --template mistral \
  --finetuning_type lora \
  --export_dir /app/LLaMA-Factory/output/mistral-merged \
  --export_size 4 \
  --export_legacy_format false
```
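After exporting, it can be worth checking that the weight shards actually landed in the export directory. A small helper sketch (not part of LLaMA-Factory; assumes the default safetensors or legacy `.bin` naming):

```python
from pathlib import Path

def list_weight_shards(export_dir: str) -> list[str]:
    """Return sorted weight shard filenames (.safetensors or .bin) in export_dir."""
    return sorted(f.name for f in Path(export_dir).iterdir()
                  if f.suffix in {".safetensors", ".bin"})
```

For example, `list_weight_shards("/app/LLaMA-Factory/output/mistral-merged")` should list the exported shards, one per chunk of the size set by `--export_size`.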

***

## Configuration

### Key Training Parameters

| Parameter                     | Typical Value | Description                          |
| ----------------------------- | ------------- | ------------------------------------ |
| `lora_rank`                   | 8–64          | LoRA rank (higher = more expressive) |
| `lora_alpha`                  | 2× rank       | LoRA alpha scaling                   |
| `lora_dropout`                | 0.0–0.1       | Dropout for LoRA layers              |
| `lora_target`                 | `all`         | Which layers to apply LoRA           |
| `learning_rate`               | `1e-4`        | Starting learning rate               |
| `num_train_epochs`            | 1–5           | Training epochs                      |
| `per_device_train_batch_size` | 1–4           | Batch size per GPU                   |
| `gradient_accumulation_steps` | 4–16          | Effective batch multiplier           |
| `cutoff_len`                  | 1024–4096     | Max sequence length                  |
| `quantization_bit`            | 4 or 8        | QLoRA quantization bits              |
| `warmup_ratio`                | 0.05–0.1      | LR warmup fraction                   |
| `lr_scheduler_type`           | `cosine`      | LR schedule                          |
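The batch-size parameters above multiply: the effective global batch size is `per_device_train_batch_size × gradient_accumulation_steps × number of GPUs`, and that product is what matters for training stability:

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Effective global batch = per-device batch x accumulation steps x GPU count."""
    return per_device * grad_accum * num_gpus

print(effective_batch_size(2, 8))     # 16, the QLoRA example in this guide
print(effective_batch_size(2, 2, 4))  # 16, same effective batch on 4 GPUs
```

If VRAM forces a smaller `per_device_train_batch_size`, raise `gradient_accumulation_steps` to keep the product constant.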

### Supported Fine-tuning Methods

| Method               | Memory Use | Quality   | When to Use        |
| -------------------- | ---------- | --------- | ------------------ |
| `full`               | Very High  | Best      | Unlimited VRAM     |
| `freeze`             | Medium     | Good      | Freeze base layers |
| `lora`               | Low        | Very Good | Default choice     |
| `qlora` (lora+quant) | Lowest     | Good      | Limited VRAM       |

### Multi-GPU DeepSpeed Training

For training on multiple GPUs, launch with `torchrun`:

```bash
docker exec -it llamafactory bash -c "
FORCE_TORCHRUN=1 NNODES=1 RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=29500 \
llamafactory-cli train configs/qlora_mistral.yaml \
  --deepspeed examples/deepspeed/ds_z3_config.json
"
```

***

## Performance Tips

### 1. Optimal QLoRA Settings by GPU

**8 GB VRAM (RTX 3070):**

```yaml
quantization_bit: 4
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
cutoff_len: 1024
```

**24 GB VRAM (RTX 3090/4090):**

```yaml
quantization_bit: 4  # Still use QLoRA for larger batch size
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
cutoff_len: 2048
```

**80 GB VRAM (A100):**

```yaml
# No quantization needed — use LoRA directly
finetuning_type: lora
per_device_train_batch_size: 8
gradient_accumulation_steps: 2
cutoff_len: 4096
fp16: true
```

### 2. Flash Attention 2 for Longer Contexts

```yaml
flash_attn: fa2  # Requires Ampere+ GPU
```

In practice this can allow roughly 2× longer sequences in the same VRAM.

### 3. Gradient Checkpointing

Saves VRAM at the cost of \~20% slower training:

```yaml
gradient_checkpointing: true
```

### 4. Choose the Right LoRA Target

```yaml
lora_target: all  # All linear layers (default, best quality)
# or
lora_target: q_proj,v_proj  # Minimal and fastest, at some quality cost
```

### 5. Freeze Top Layers for Fast Adaptation

```yaml
finetuning_type: freeze
freeze_trainable_layers: 2   # Train only top 2 layers
freeze_trainable_modules: all
```

Faster than applying LoRA to all layers when you only need light task adaptation, though usually at some cost in quality.

### 6. Monitor with TensorBoard

```bash
# In a separate terminal
docker exec -it llamafactory bash -c "
tensorboard --logdir /app/LLaMA-Factory/saves --host 0.0.0.0 --port 6006
"
```

Add port 6006 to your CLORE.AI order to access TensorBoard.

***

## Troubleshooting

### Problem: "CUDA out of memory" during training

1. Reduce batch size: `per_device_train_batch_size: 1`
2. Enable gradient checkpointing: `gradient_checkpointing: true`
3. Reduce context length: `cutoff_len: 512`
4. Use QLoRA (4-bit): `quantization_bit: 4`
5. Reduce LoRA rank: `lora_rank: 4`

### Problem: Training loss not decreasing

* Check learning rate — try `5e-5` or `2e-4`
* Verify dataset format matches template
* Increase `lora_rank` (8→16→32)
* Check that `lora_target: all` is set

### Problem: Slow training speed

```bash
# Check GPU utilization inside the container (refresh every second)
docker exec -it llamafactory nvidia-smi -l 1
```

If GPU is < 80% utilized:

* Increase batch size
* Use Flash Attention: `flash_attn: fa2`
* Remove `gradient_checkpointing` if VRAM allows

### Problem: Model not found in web UI

```bash
# Pre-download to cache volume
docker exec -it llamafactory bash -c "
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3
"
```

Then refresh the model list in LLaMA Board.

### Problem: Dataset format errors

All dataset formats must match `dataset_info.json` specification:

```bash
# Validate dataset
docker exec -it llamafactory python3 -c "
import json
with open('/app/LLaMA-Factory/data/my_dataset.json') as f:
    data = json.load(f)
print(f'Dataset has {len(data)} samples')
print('First sample keys:', list(data[0].keys()))
"
```

### Problem: WebUI port not accessible

Ensure LLaMA-Factory started the Gradio server:

```bash
docker logs llamafactory 2>&1 | grep -E "Running on|Error|Traceback"
```

Alternatively, a public Gradio share link can be enabled (for example via the `GRADIO_SHARE=1` environment variable) if direct port access fails.

***

## Links

* [GitHub](https://github.com/hiyouga/LLaMA-Factory)
* [Documentation](https://llamafactory.readthedocs.io)
* [Docker Hub (hiyouga)](https://hub.docker.com/r/hiyouga/llamafactory)
* [Supported Models](https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#supported-models)
* [Dataset Format](https://github.com/hiyouga/LLaMA-Factory/blob/main/data/README.md)
* [CLORE.AI Marketplace](https://clore.ai/marketplace)

***

## CLORE.AI GPU Recommendations

| Use Case             | Recommended GPU | Est. Cost on CLORE.AI |
| -------------------- | --------------- | --------------------- |
| Development/Testing  | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Fine-tuning (7B–13B) | RTX 4090 (24GB) | \~$0.70/gpu/hr        |
| Large Models (70B+)  | A100 80GB       | \~$1.20/gpu/hr        |
| Multi-GPU Training   | 2–4× A100 80GB  | \~$2.40–$4.80/hr      |

> 💡 All examples in this guide can be deployed on [CLORE.AI](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour: no commitments, full root access.

