LitGPT

LitGPT is a high-performance library, built on PyTorch Lightning, for pretraining, finetuning, and deploying 20+ large language models. With 12K+ GitHub stars, it is a go-to toolkit for engineers who want clean, hackable LLM training code without the abstraction overhead of Hugging Face Transformers.

Each model in LitGPT is ~1,000 lines of clean PyTorch — no inheritance chains 10 levels deep, no magic. You can read the Llama 3 implementation end-to-end in an afternoon and modify it confidently.


What is LitGPT?

LitGPT provides production-ready implementations of state-of-the-art LLMs with a unified training interface:

  • 20+ supported models — Llama 3, Gemma 2, Mistral, Phi-3, Falcon, StableLM, and more

  • Pretrain from scratch — full pretraining with Flash Attention, FSDP, and gradient checkpointing

  • Finetune efficiently — full finetuning, LoRA, QLoRA, and Adapter methods

  • Serve with confidence — built-in inference server with quantization

  • Multi-GPU support — DDP, FSDP, tensor parallelism out of the box

  • Memory efficient — 4-bit quantization, gradient checkpointing, activation checkpointing


Server Requirements

| Component | Minimum | Recommended |
|---|---|---|
| GPU | RTX 3090 (24 GB) | A100 80 GB / H100 |
| VRAM | 16 GB (7B LoRA) | 80 GB+ (70B full) |
| RAM | 32 GB | 64 GB+ |
| CPU | 8 cores | 16+ cores |
| Storage | 100 GB | 500 GB+ |
| OS | Ubuntu 20.04+ | Ubuntu 22.04 |
| Python | 3.10+ | 3.11 |
| CUDA | 11.8+ | 12.1+ |

VRAM Requirements by Task

| Task | Model | VRAM |
|---|---|---|
| Inference (4-bit) | Llama 3 8B | ~6 GB |
| LoRA finetune | Llama 3 8B | ~16 GB |
| Full finetune | Llama 3 8B | ~80 GB |
| LoRA finetune | Llama 3 70B | ~48 GB (2×A100) |
| Full finetune | Llama 3 70B | ~640 GB (8×A100) |
| QLoRA finetune | Llama 3 8B | ~8 GB |


Ports

| Port | Service | Notes |
|---|---|---|
| 22 | SSH | Terminal access & file transfer |
| 8000 | LitGPT Inference Server | REST API for model serving |


Quick Start with Docker
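A minimal sketch using a stock PyTorch CUDA image (the image tag and the small test model are illustrative; any recent PyTorch + CUDA 12.x image works):

```shell
# Start a GPU container with port 8000 mapped for the inference server
docker run --gpus all -p 8000:8000 -it pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime bash

# Inside the container: install LitGPT and run a quick smoke test
pip install 'litgpt[all]'
litgpt download EleutherAI/pythia-160m   # small model, fast to fetch
litgpt chat EleutherAI/pythia-160m
```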


Installation on Clore.ai

Step 1 — Rent a Server

  1. Filter for VRAM ≥ 24 GB (RTX 3090 or better)

  2. Choose a PyTorch or CUDA 12.1 base image

  3. Open ports 22 and 8000 in your order settings

  4. Select storage ≥ 200 GB for model weights

Step 2 — Connect via SSH
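Clore.ai shows the exact SSH command on your order page; it typically looks like this (host IP and port below are placeholders):

```shell
# Connect using the IP and SSH port from your Clore.ai order page
ssh -p 22000 root@203.0.113.10

# Optional: copy a local dataset to the server over the same port
scp -P 22000 my_data.json root@203.0.113.10:/root/
```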

Step 3 — Install LitGPT
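Install from PyPI with all optional dependencies (quantization, serving, evaluation):

```shell
pip install 'litgpt[all]'
```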

Step 4 — Verify Installation

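A quick sanity check, assuming the install succeeded (the exact version string will vary):

```shell
# Confirm the package is installed and importable
pip show litgpt
python -c "import litgpt; print('ok')"

# List the available subcommands (download, finetune, pretrain, serve, ...)
litgpt --help
```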


Downloading Models

LitGPT downloads models from Hugging Face:
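For example (recent LitGPT releases accept the Hugging Face repo id positionally; older versions use a `--repo_id` flag):

```shell
# Browse the full list of supported checkpoints
litgpt download list

# Download Llama 3 8B Instruct (gated repo: requires an approved HF token)
litgpt download meta-llama/Meta-Llama-3-8B-Instruct

# Weights are stored under ./checkpoints/<org>/<model> by default
```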

Set HuggingFace Token
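Gated models (Llama, Gemma) require a Hugging Face token with approved access. The token value below is a placeholder:

```shell
# Create a token at https://huggingface.co/settings/tokens
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxx

# Some LitGPT versions also accept the token explicitly:
litgpt download meta-llama/Meta-Llama-3-8B-Instruct --access_token $HF_TOKEN
```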


Inference (Chat & Generate)
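Typical usage, assuming the model was downloaded as above:

```shell
# Interactive chat session
litgpt chat meta-llama/Meta-Llama-3-8B-Instruct

# One-shot generation
litgpt generate meta-llama/Meta-Llama-3-8B-Instruct \
  --prompt "Explain LoRA in one paragraph."

# 4-bit quantized inference (~6 GB VRAM for an 8B model)
litgpt generate meta-llama/Meta-Llama-3-8B-Instruct \
  --quantize bnb.nf4 --precision bf16-true
```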


Finetuning

LoRA trains a small set of adapter parameters (typically 0.1–1% of total weights) while the base model stays frozen. Llama 3 8B LoRA on 10K examples takes ~2 hours on an RTX 3090 with r=16.
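A sketch of the run described above (the dataset path and hyperparameters are illustrative; flag names follow recent LitGPT releases):

```shell
litgpt finetune_lora meta-llama/Meta-Llama-3-8B-Instruct \
  --data JSON --data.json_path my_dataset.json --data.val_split_fraction 0.1 \
  --lora_r 16 --lora_alpha 32 \
  --train.micro_batch_size 1 \
  --out_dir out/llama3-8b-lora
```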

QLoRA (4-bit + LoRA)

Use QLoRA to finetune large models on limited VRAM. Llama 3 8B fits on a single RTX 3090 at 24 GB:
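A minimal QLoRA invocation (dataset path illustrative): the base weights are loaded in 4-bit NF4 while the LoRA adapters train in bf16.

```shell
litgpt finetune_lora meta-llama/Meta-Llama-3-8B-Instruct \
  --quantize bnb.nf4 --precision bf16-true \
  --data JSON --data.json_path my_dataset.json \
  --out_dir out/llama3-8b-qlora
```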

Full Finetuning
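Full finetuning updates every weight and needs roughly 80 GB for an 8B model (see the VRAM table above). A sketch with an illustrative dataset path:

```shell
litgpt finetune_full meta-llama/Meta-Llama-3-8B-Instruct \
  --data JSON --data.json_path my_dataset.json \
  --out_dir out/llama3-8b-full
```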

Multi-GPU Training
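To shard model and optimizer state across multiple GPUs, pass `--devices` (a two-GPU example; dataset path illustrative):

```shell
litgpt finetune_full meta-llama/Meta-Llama-3-8B-Instruct \
  --devices 2 \
  --data JSON --data.json_path my_dataset.json \
  --out_dir out/llama3-8b-full
```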


Serving Models (REST API)
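A sketch of serving and querying the built-in REST API (the `/predict` endpoint and JSON shape follow recent LitGPT releases; `<server-ip>` is your Clore.ai server address):

```shell
# Start the inference server on port 8000
litgpt serve meta-llama/Meta-Llama-3-8B-Instruct --port 8000

# From another terminal (port 8000 must be open on your order):
curl -X POST http://<server-ip>:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is LitGPT?"}'
```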

Python Client
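A minimal stdlib-only client, assuming the default `litgpt serve` endpoint (`/predict`) and response field (`output`); adjust the server address and field names if your LitGPT version differs:

```python
import json
from urllib import request

SERVER = "http://127.0.0.1:8000"  # replace with your Clore.ai server's IP


def generate(prompt: str, server: str = SERVER) -> str:
    """POST a prompt to a running `litgpt serve` instance and return the text."""
    req = request.Request(
        f"{server}/predict",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["output"]


# Usage (with the server from the previous section running):
# print(generate("Write a haiku about GPUs."))
```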


Pretraining from Scratch

For training a custom LLM from scratch on your own data:
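A sketch following the LitGPT pretraining workflow (model name, tokenizer, and data directory are illustrative; `TextFiles` expects a directory of plain-text files):

```shell
litgpt pretrain pythia-160m \
  --data TextFiles --data.train_data_path custom_texts/ \
  --tokenizer_dir EleutherAI/pythia-160m \
  --out_dir out/custom-pretrain
```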


Converting and Exporting Models
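Typical post-training steps (output paths are illustrative; exact argument names vary slightly between LitGPT releases):

```shell
# Merge trained LoRA adapters into the base weights
litgpt merge_lora out/llama3-8b-lora/final

# Convert a LitGPT checkpoint back to Hugging Face format
litgpt convert_from_litgpt out/llama3-8b-full/final converted/
```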


Evaluating Models
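LitGPT integrates with lm-evaluation-harness; a sketch (task list illustrative):

```shell
litgpt evaluate meta-llama/Meta-Llama-3-8B-Instruct \
  --tasks "hellaswag,truthfulqa_mc2,mmlu" \
  --batch_size 4
```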


Clore.ai GPU Recommendations

LitGPT covers three distinct workloads — inference, LoRA finetuning, and full pretraining — each with different GPU requirements.

| Workload | GPU | VRAM | Notes |
|---|---|---|---|
| Inference / chat (7–8B models) | RTX 3090 | 24 GB | Fits Llama 3 8B in bf16; ~95 tok/s generation |
| LoRA finetune (7–8B models) | RTX 3090 | 24 GB | Budget pick; QLoRA keeps VRAM under 10 GB |
| LoRA finetune (7–8B), fast iteration | RTX 4090 | 24 GB | ~35% faster than the 3090; cuts a 2 hr job to ~1.4 hr |
| Full finetune (7B) or QLoRA (70B) | A100 40 GB | 40 GB | Fits 7B full-precision or 70B 4-bit |
| Full finetune (13B+) or pretrain runs | A100 80 GB | 80 GB | Highest throughput; ~2,800 tok/s training on 8B |

Recommended for most users: RTX 3090 pair (2×24 GB = 48 GB effective with FSDP). Handles QLoRA on 70B models, or full finetune on 7B models with tensor parallelism. Cost on Clore.ai: ~$0.25/hr for two 3090s.

For pretraining or >70B finetuning: Use 4×A100 80GB with FSDP. LitGPT's FSDP integration handles sharding transparently — just pass --devices 4 --strategy fsdp.


Troubleshooting

CUDA Out of Memory
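Common mitigations, roughly in order of impact (`<model>` stands for your checkpoint; flag names follow recent LitGPT releases):

```shell
# 1. Quantize the base model (QLoRA)
litgpt finetune_lora <model> --quantize bnb.nf4 --precision bf16-true

# 2. Shrink the per-step memory footprint
litgpt finetune_lora <model> --train.micro_batch_size 1 --train.max_seq_length 512

# 3. Spread the model across GPUs
litgpt finetune_full <model> --devices 2
```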

Download fails / HuggingFace 401

Training loss doesn't decrease

Server port 8000 not accessible
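First confirm the server works locally; if it does but remote requests fail, the usual cause is that port 8000 was not opened in the Clore.ai order settings:

```shell
# On the server: is anything listening on 8000?
ss -tlnp | grep 8000

# Does a local request succeed? (endpoint shape assumed from the serving section)
curl -s -X POST http://127.0.0.1:8000/predict \
  -H "Content-Type: application/json" -d '{"prompt": "ping"}'
```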

Multi-GPU training hangs
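Hangs are usually NCCL communication stalls. These diagnostic environment variables often surface the real error; disabling peer-to-peer transfers is a common workaround on consumer boards without P2P support:

```shell
export NCCL_DEBUG=INFO            # log NCCL activity instead of hanging silently
export TORCH_NCCL_BLOCKING_WAIT=1 # fail fast with a stack trace on collectives
export NCCL_P2P_DISABLE=1         # workaround for GPUs without P2P (e.g. paired 3090s/4090s)
```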

