# Model Compatibility

## Quick Reference
### Language Models (LLM)

| Model | Parameters | Min VRAM | Recommended | Quantization |
| --- | --- | --- | --- | --- |
### Image Generation Models

| Model | Min VRAM | Recommended | Notes |
| --- | --- | --- | --- |
### Video Generation Models

| Model | Min VRAM | Recommended | Output |
| --- | --- | --- | --- |
### Audio Models

| Model | Min VRAM | Recommended | Task |
| --- | --- | --- | --- |
### Vision Models

| Model | Min VRAM | Recommended | Task |
| --- | --- | --- | --- |
## Detailed Compatibility Tables

### LLM by GPU

| GPU | Max Model (Q4) | Max Model (Q8) | Max Model (FP16) |
| --- | --- | --- | --- |
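To sanity-check a row of this table, invert the usual sizing rule: usable parameters are roughly VRAM divided by bytes per weight, after reserving headroom for the KV cache and runtime. The 20% headroom and the example GPUs in this sketch are assumptions, not measurements:

```python
def max_params_billion(vram_gb: float, bits_per_weight: float,
                       headroom: float = 0.20) -> float:
    """Largest model (in billions of parameters) whose weights fit in
    `vram_gb`, assuming `headroom` is reserved for KV cache and runtime."""
    usable_gb = vram_gb * (1 - headroom)
    return usable_gb * 8 / bits_per_weight  # bytes -> params at bits/8 per param

for gpu, vram in [("RTX 4090 (24 GB)", 24), ("A100 (80 GB)", 80)]:
    print(gpu,
          f"Q4 ~{max_params_billion(vram, 4):.0f}B,",
          f"Q8 ~{max_params_billion(vram, 8):.0f}B,",
          f"FP16 ~{max_params_billion(vram, 16):.0f}B")
```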
### Image Generation by GPU

| GPU | SD 1.5 | SDXL | FLUX schnell | FLUX dev |
| --- | --- | --- | --- | --- |
### Video Generation by GPU

| GPU | SVD | AnimateDiff | Wan2.1 | Hunyuan |
| --- | --- | --- | --- | --- |
## Quantization Guide

### What is Quantization?

Quantization stores model weights at lower numeric precision, trading a small amount of output quality for a large reduction in VRAM.

| Format | Bits | VRAM Reduction | Quality Loss |
| --- | --- | --- | --- |
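As a concrete example, loading a model in 4-bit (roughly the Q4 row above) with Hugging Face `transformers` and `bitsandbytes` might look like the sketch below; the model ID is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Illustrative checkpoint -- substitute the model you actually want to run.
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# NF4 4-bit quantization: weights stored in 4 bits, compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)
```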
### VRAM Calculator

| Model Size | FP16 | Q8 | Q4 |
| --- | --- | --- | --- |
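The table rows can be approximated with a simple rule of thumb: weight memory is parameter count times bytes per weight, plus a margin for the KV cache, activations, and the CUDA context. The 20% margin in this sketch is an assumption; real overhead varies with batch size and context length:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 0.20) -> float:
    """Weights (params * bits / 8 bytes) plus an assumed overhead margin
    for KV cache, activations, and the CUDA context."""
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * (1 + overhead)

for fmt, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"8B model, {fmt}: ~{estimate_vram_gb(8, bits):.1f} GB")
# -> FP16 ~19.2 GB, Q8 ~9.6 GB, Q4 ~4.8 GB
```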
### Recommended Quantization by Use Case

| Use Case | Recommended | Why |
| --- | --- | --- |
## Context Length vs VRAM

### How Context Affects VRAM

Weight memory is fixed, but the KV cache grows linearly with the number of tokens in context, so long prompts can exhaust VRAM even when the model itself fits.

| Model | Default Context | Max Context | VRAM per 1K tokens |
| --- | --- | --- | --- |
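The per-1K-token figures follow from the standard KV-cache formula: two vectors (key and value) per layer per token, sized by the number of KV heads and the head dimension. A sketch using Llama 3 8B's published shape (32 layers, 8 KV heads under grouped-query attention, head dimension 128) as the worked example:

```python
def kv_cache_mb_per_1k_tokens(num_layers: int, num_kv_heads: int,
                              head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache per 1,000 tokens: 2 (K and V) * layers * kv_heads * head_dim."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * 1000 / 1024**2

# Llama 3 8B: 32 layers, 8 KV heads (GQA), head_dim 128, FP16 cache
print(f"~{kv_cache_mb_per_1k_tokens(32, 8, 128):.0f} MB per 1K tokens")  # ~125 MB
```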
### Context by GPU (Llama 3 8B Q4)

| GPU | Comfortable Context | Maximum Context |
| --- | --- | --- |
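If a model's maximum context will not fit on your GPU, cap it at load time. In vLLM that is the `max_model_len` argument; 8192 below is an illustrative value, not a recommendation for any particular card:

```python
from vllm import LLM

# Cap the context window so the KV cache fits in VRAM.
llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", max_model_len=8192)
```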
## Multi-GPU Configurations

### Tensor Parallelism

Tensor parallelism splits each layer's weight matrices across GPUs, so the VRAM of all cards is pooled for a single model (minus communication overhead).

| Configuration | Total VRAM | Max Model (FP16) |
| --- | --- | --- |
### vLLM Multi-GPU
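A minimal sketch of serving one model across two GPUs with vLLM's tensor parallelism; the checkpoint name is illustrative, and `tensor_parallel_size` should match the number of GPUs you want vLLM to shard across:

```python
from vllm import LLM, SamplingParams

# Shard the model's weight matrices across two GPUs.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # illustrative checkpoint
    tensor_parallel_size=2,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```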
## Specific Model Guides

### Llama 3.1 Family

| Variant | Parameters | Min GPU | Recommended Setup |
| --- | --- | --- | --- |
### Mistral/Mixtral Family

| Variant | Parameters | Min GPU | Recommended Setup |
| --- | --- | --- | --- |
### Qwen 2.5 Family

| Variant | Parameters | Min GPU | Recommended Setup |
| --- | --- | --- | --- |
### DeepSeek Models

| Variant | Parameters | Min GPU | Recommended Setup |
| --- | --- | --- | --- |
## Troubleshooting

### "CUDA out of memory"

The model weights plus KV cache exceeded available VRAM. Drop to a lower-bit quantization, shorten the context window, or reduce batch size.
"Model too large"
"Slow generation"
## Next Steps