Fine-tune LLM

Train your own custom LLM using efficient fine-tuning techniques on CLORE.AI GPUs.


Renting on CLORE.AI

  1. Filter by GPU type, VRAM, and price

  2. Choose On-Demand (fixed rate) or Spot (bid price)

  3. Configure your order:

    • Select Docker image

    • Set ports (TCP for SSH, HTTP for web UIs)

    • Add environment variables if needed

    • Enter startup command

  4. Select payment: CLORE, BTC, or USDT/USDC

  5. Create order and wait for deployment

Access Your Server

  • Find connection details in My Orders

  • Web interfaces: Use the HTTP port URL

  • SSH: ssh -p <port> root@<proxy-address>

What is LoRA/QLoRA?

  • LoRA (Low-Rank Adaptation) - Trains small low-rank adapter matrices instead of updating the full model's weights

  • QLoRA - LoRA on top of a 4-bit quantized base model, cutting VRAM needs even further

  • Train a 7B model on a single RTX 3090

  • Train a 70B model on a single A100
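
For a concrete sense of the payoff, here is a minimal peft sketch (the small OPT model is just an example); only the adapter weights, well under 1% of the total, receive gradients:

```python
# Minimal sketch of the LoRA idea with the peft library: wrap a base model so
# only small low-rank adapter matrices are trainable, the rest stays frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # example model
model = get_peft_model(base, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()  # reports roughly 0.5% of parameters as trainable
```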

Requirements

| Model | Method    | Min VRAM | Recommended |
| ----- | --------- | -------- | ----------- |
| 7B    | QLoRA     | 12GB     | RTX 3090    |
| 13B   | QLoRA     | 20GB     | RTX 4090    |
| 70B   | QLoRA     | 48GB     | A100 80GB   |
| 7B    | Full LoRA | 24GB     | RTX 4090    |

Quick Deploy

The order form asks for three things: the Docker image to run, the ports to expose, and the startup command.
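
One workable configuration, sketched below; the image name, tag, and ports are assumptions rather than CLORE.AI defaults:

```text
Docker image: winglian/axolotl:main-latest        # any CUDA + PyTorch image works
Ports:        22 (TCP, for SSH), 8888 (HTTP, for JupyterLab)
Command:      jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser
```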

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

  1. Go to My Orders page

  2. Click on your order

  3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.

Dataset Preparation

Instruction Format

Alpaca Format
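
Each example is a JSON object with an instruction, an optional input, and the desired output, typically stored one object per line (JSONL). An illustrative record:

```json
{
  "instruction": "Summarize the following text in one sentence.",
  "input": "CLORE.AI is a marketplace where independent providers rent out their GPUs.",
  "output": "CLORE.AI lets users rent GPUs directly from independent providers."
}
```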

QLoRA Fine-tuning Script
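
A minimal single-GPU sketch using transformers, peft, and bitsandbytes. The base model (meta-llama/Llama-2-7b-hf) and dataset (tatsu-lab/alpaca) are placeholders, not CLORE.AI recommendations; swap in your own:

```python
# QLoRA fine-tuning sketch.
# Requires: pip install transformers peft bitsandbytes datasets accelerate
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "meta-llama/Llama-2-7b-hf"   # assumption: any ~7B causal LM works

# 4-bit NF4 quantization of the frozen base weights -- the "Q" in QLoRA
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_use_double_quant=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb,
                                             device_map="auto")
model = prepare_model_for_kbit_training(model)

# trainable LoRA adapters on the attention projections
model = get_peft_model(model, LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]))

def tokenize(ex):
    # render one Alpaca record into a single training string
    prompt = f"### Instruction:\n{ex['instruction']}\n\n"
    if ex.get("input"):
        prompt += f"### Input:\n{ex['input']}\n\n"
    return tokenizer(prompt + f"### Response:\n{ex['output']}",
                     truncation=True, max_length=2048)

dataset = load_dataset("tatsu-lab/alpaca", split="train")
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(output_dir="qlora-out",
                           per_device_train_batch_size=4,
                           gradient_accumulation_steps=4,
                           learning_rate=2e-4, num_train_epochs=3,
                           bf16=True, logging_steps=10),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("qlora-out/adapter")   # adapter only; the base stays frozen
```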

Using Axolotl (Easier)

Axolotl simplifies fine-tuning with YAML configs:
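
The usual flow, per the Axolotl README, is to install the package and point its CLI at a config file (qlora.yml is a hypothetical name; pin versions in real use):

```bash
# install Axolotl (packaged releases are on PyPI; a source install also works)
pip install axolotl

# launch training from a YAML config; accelerate handles device placement
accelerate launch -m axolotl.cli.train qlora.yml
```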

Axolotl Config Examples

Chat Model
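
A QLoRA chat config in the style of Axolotl's published examples. The base model, dataset path, and sharegpt type are assumptions; check the Axolotl docs for the full key reference:

```yaml
base_model: NousResearch/Llama-2-7b-hf     # assumption: any HF chat-capable base
load_in_4bit: true
adapter: qlora

datasets:
  - path: ./data/chat.jsonl                # assumption: local ShareGPT-style file
    type: sharegpt
val_set_size: 0.05

sequence_len: 2048
lora_r: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

micro_batch_size: 4
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 2e-4
optimizer: paged_adamw_32bit
lr_scheduler: cosine
bf16: auto
gradient_checkpointing: true
output_dir: ./qlora-chat-out
```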

Code Model
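
For a code model the config is largely the same; the sketch below shows only the fields that change (base model and dataset path are assumptions):

```yaml
base_model: codellama/CodeLlama-7b-hf      # assumption: a code-pretrained base
datasets:
  - path: ./data/code_instruct.jsonl       # assumption: Alpaca-style code pairs
    type: alpaca
sequence_len: 4096                         # longer context suits source files
output_dir: ./qlora-code-out
```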

Merging LoRA Weights

After training, merge the LoRA adapter back into the base model so it loads like a regular checkpoint:
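
A sketch using peft's merge_and_unload(), with paths carried over from the earlier script. Load the base model in full precision here, not 4-bit:

```python
# Merge sketch: apply the trained adapter to the base model, then fold it
# into the weights so the result is a standalone HF checkpoint.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"       # assumption: same base as training
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "qlora-out/adapter").merge_and_unload()

merged.save_pretrained("merged-model")
AutoTokenizer.from_pretrained(base_id).save_pretrained("merged-model")
```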

Convert to GGUF

For use with llama.cpp/Ollama:
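
A typical conversion flow with llama.cpp's bundled tools. Script and binary names below match recent llama.cpp checkouts and have changed before (older checkouts used convert.py and ./quantize), so verify against the repo:

```bash
git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# HF checkpoint -> 16-bit GGUF
python llama.cpp/convert_hf_to_gguf.py merged-model \
    --outfile model-f16.gguf --outtype f16

# optional: 4-bit quantize for smaller files and faster CPU inference
cmake -B llama.cpp/build llama.cpp
cmake --build llama.cpp/build --target llama-quantize
llama.cpp/build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```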

Monitoring Training

Weights & Biases
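
The Trainer logs to W&B when report_to is set; a sketch, assuming wandb is installed and wandb login has been run on the server:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qlora-out",
    report_to="wandb",      # Trainer streams loss/learning-rate curves to W&B
    run_name="qlora-7b",    # hypothetical run name
    logging_steps=10,
)
```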

TensorBoard
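
Same idea with TensorBoard as the backend; serve the log directory on your HTTP port and open it through the http_pub URL:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qlora-out",
    report_to="tensorboard",
    logging_dir="qlora-out/logs",   # event files land here
    logging_steps=10,
)
# then, on the server:
#   tensorboard --logdir qlora-out/logs --host 0.0.0.0 --port 8888
# and open the mapped http_pub URL from My Orders
```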

Best Practices

Hyperparameters

| Parameter  | 7B Model | 13B Model | 70B Model |
| ---------- | -------- | --------- | --------- |
| batch_size | 4        | 2         | 1         |
| grad_accum | 4        | 8         | 16        |
| lr         | 2e-4     | 1e-4      | 5e-5      |
| lora_r     | 64       | 32        | 16        |
| epochs     | 3        | 2-3       | 1-2       |

Dataset Size

  • Minimum: 1,000 examples

  • Good: 10,000+ examples

  • Quality > Quantity - a clean 1,000-example set usually beats a noisy 10,000-example one

Avoiding Overfitting
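
The telltale sign is training loss still falling while loss on held-out data climbs. Keep epochs low (see the table above), keep lora_dropout around 0.05-0.1, and evaluate on a held-out split. A sketch extending the earlier training script (dataset, model, and tokenizer refer to that sketch; eval_strategy is called evaluation_strategy in older transformers releases):

```python
from transformers import (DataCollatorForLanguageModeling, EarlyStoppingCallback,
                          Trainer, TrainingArguments)

split = dataset.train_test_split(test_size=0.05, seed=42)   # hold out 5% for eval

trainer = Trainer(
    model=model,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    args=TrainingArguments(output_dir="qlora-out",
                           eval_strategy="steps", eval_steps=100,
                           save_strategy="steps", save_steps=100,
                           load_best_model_at_end=True,
                           metric_for_best_model="eval_loss"),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    # stop if eval loss fails to improve for 3 consecutive evaluations
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
```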

Multi-GPU Training

DeepSpeed config:
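
For several GPUs on one server, DeepSpeed ZeRO shards optimizer state across devices. A minimal stage-2 config that works with the Hugging Face Trainer integration; the "auto" values defer to your TrainingArguments, and the file name ds_config.json is hypothetical:

```json
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": true },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

Point the Trainer at it with TrainingArguments(deepspeed="ds_config.json") and start the run with deepspeed train.py (script name hypothetical).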

Saving & Exporting
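
Two usual exits, sketched below: save locally and copy the files off the rented server before the order ends, or push to the Hugging Face Hub (the repo id is hypothetical):

```python
# save the trained adapter and tokenizer (small: typically tens to hundreds of MB)
model.save_pretrained("qlora-out/adapter")
tokenizer.save_pretrained("qlora-out/adapter")

# optional: push to the Hub (assumes `huggingface-cli login` was run)
model.push_to_hub("your-username/my-7b-qlora")       # hypothetical repo id
tokenizer.push_to_hub("your-username/my-7b-qlora")

# or pull the files to your own machine before the rental ends:
#   scp -P <port> -r root@<proxy-address>:qlora-out/adapter .
```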

Troubleshooting

OOM Errors

  • Reduce batch size

  • Increase gradient accumulation

  • Use gradient_checkpointing=True

  • Reduce lora_r

Training Loss Not Decreasing

  • Check data format

  • Increase learning rate

  • Check for data issues

NaN Loss

  • Reduce learning rate

  • Use bf16 (or fp32) instead of fp16

  • Check for corrupted data

Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | ~$0.03      | ~$0.70     | ~$0.12         |
| RTX 3090  | ~$0.06      | ~$1.50     | ~$0.25         |
| RTX 4090  | ~$0.10      | ~$2.30     | ~$0.40         |
| A100 40GB | ~$0.17      | ~$4.00     | ~$0.70         |
| A100 80GB | ~$0.25      | ~$6.00     | ~$1.00         |

Prices vary by provider and demand. Check the CLORE.AI Marketplace for current rates.

Save money:

  • Use Spot market for flexible workloads (often 30-50% cheaper)

  • Pay with CLORE tokens

  • Compare prices across different providers
