# GPU Comparison

Complete comparison of GPUs available on CLORE.AI for AI workloads.

{% hint style="success" %}
Find the right GPU for your task at [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Quick Recommendation

| Your Task                 | Budget Pick   | Best Value    | Maximum Performance |
| ------------------------- | ------------- | ------------- | ------------------- |
| Chat with AI (7B)         | RTX 3060 12GB | RTX 3090 24GB | RTX 5090 32GB       |
| Chat with AI (70B)        | RTX 3090 24GB | RTX 5090 32GB | A100 80GB           |
| Image Generation (SD 1.5) | RTX 3060 12GB | RTX 3090 24GB | RTX 5090 32GB       |
| Image Generation (SDXL)   | RTX 3090 24GB | RTX 4090 24GB | RTX 5090 32GB       |
| Image Generation (FLUX)   | RTX 3090 24GB | RTX 5090 32GB | A100 80GB           |
| Video Generation          | RTX 4090 24GB | RTX 5090 32GB | A100 80GB           |
| Model Training            | A100 40GB     | A100 80GB     | H100 80GB           |

## Consumer GPUs

### NVIDIA RTX 3060 12GB

**Best for:** Budget AI, SD 1.5, small LLMs

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 12GB GDDR6    |
| Memory Bandwidth | 360 GB/s      |
| FP16 Performance | 12.7 TFLOPS   |
| Tensor Cores     | 112 (3rd gen) |
| TDP              | 170W          |
| \~Price/hour     | $0.02-0.04    |

**Capabilities:**

* ✅ Ollama with 7B models (Q4)
* ✅ Stable Diffusion 1.5 (512x512)
* ✅ SDXL (768x768, slow)
* ⚠️ FLUX schnell (with CPU offload; see the sketch below)
* ❌ Large models (>13B)
* ❌ Video generation
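
A minimal sketch of that offload path, using Hugging Face diffusers (assumes the `diffusers`, `accelerate`, and `torch` packages; the prompt and settings are illustrative):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
# Streams weights from system RAM layer by layer: slow, but keeps
# peak VRAM within a 12GB budget.
pipe.enable_sequential_cpu_offload()

image = pipe(
    "a lighthouse at dusk, oil painting",
    guidance_scale=0.0,     # schnell is guidance-distilled; no CFG
    num_inference_steps=4,  # schnell targets ~4 steps
    height=768,
    width=768,              # keep resolution modest on 12GB
).images[0]
image.save("flux_test.png")
```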

***

### NVIDIA RTX 3070/3070 Ti 8GB

**Best for:** SD 1.5, lightweight tasks

| Spec             | Value                   |
| ---------------- | ----------------------- |
| VRAM             | 8GB GDDR6 / GDDR6X (Ti) |
| Memory Bandwidth | 448-608 GB/s            |
| FP16 Performance | 20.3-21.8 TFLOPS        |
| Tensor Cores     | 184-192 (3rd gen)       |
| TDP              | 220-290W                |
| \~Price/hour     | $0.02-0.04              |

**Capabilities:**

* ✅ Ollama with 7B models (Q4)
* ✅ Stable Diffusion 1.5 (512x512)
* ⚠️ SDXL (low resolution only)
* ❌ FLUX (insufficient VRAM)
* ❌ Models >7B
* ❌ Video generation

***

### NVIDIA RTX 3080/3080 Ti 10-12GB

**Best for:** General AI tasks, good balance

| Spec             | Value             |
| ---------------- | ----------------- |
| VRAM             | 10-12GB GDDR6X    |
| Memory Bandwidth | 760-912 GB/s      |
| FP16 Performance | 29.8-34.1 TFLOPS  |
| Tensor Cores     | 272-320 (3rd gen) |
| TDP              | 320-350W          |
| \~Price/hour     | $0.04-0.06        |

**Capabilities:**

* ✅ Ollama with 13B models
* ✅ Stable Diffusion 1.5/2.1
* ✅ SDXL (1024x1024)
* ⚠️ FLUX schnell (with offload)
* ❌ Large models (>13B)
* ❌ Video generation

***

### NVIDIA RTX 3090/3090 Ti 24GB

**Best for:** SDXL, 13B-30B LLMs, ControlNet

| Spec             | Value             |
| ---------------- | ----------------- |
| VRAM             | 24GB GDDR6X       |
| Memory Bandwidth | 936-1008 GB/s     |
| FP16 Performance | 35.6-40.0 TFLOPS  |
| Tensor Cores     | 328-336 (3rd gen) |
| TDP              | 350-450W          |
| \~Price/hour     | $0.05-0.08        |

**Capabilities:**

* ✅ Ollama with 30B models
* ✅ vLLM with 13B models
* ✅ All Stable Diffusion models
* ✅ SDXL + ControlNet
* ✅ FLUX schnell (1024x1024)
* ⚠️ FLUX dev (with offload)
* ⚠️ Video (short clips)

***

### NVIDIA RTX 4070 Ti 12GB

**Best for:** Fast SD 1.5, efficient inference

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 12GB GDDR6X   |
| Memory Bandwidth | 504 GB/s      |
| FP16 Performance | 40.1 TFLOPS   |
| Tensor Cores     | 240 (4th gen) |
| TDP              | 285W          |
| \~Price/hour     | $0.04-0.06    |

**Capabilities:**

* ✅ Ollama with 7B models (fast)
* ✅ Stable Diffusion 1.5 (very fast)
* ✅ SDXL (768x768)
* ⚠️ FLUX schnell (limited res)
* ❌ Large models (>13B)
* ❌ Video generation

***

### NVIDIA RTX 4080 16GB

**Best for:** SDXL production, 13B LLMs

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 16GB GDDR6X   |
| Memory Bandwidth | 717 GB/s      |
| FP16 Performance | 48.7 TFLOPS   |
| Tensor Cores     | 304 (4th gen) |
| TDP              | 320W          |
| \~Price/hour     | $0.06-0.09    |

**Capabilities:**

* ✅ Ollama with 13B models (fast)
* ✅ vLLM with 7B models
* ✅ All Stable Diffusion models
* ✅ SDXL + ControlNet
* ✅ FLUX schnell (1024x1024)
* ⚠️ FLUX dev (limited)
* ⚠️ Short video clips

***

### NVIDIA RTX 4090 24GB

**Best for:** High-end consumer performance, FLUX, video

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 24GB GDDR6X   |
| Memory Bandwidth | 1008 GB/s     |
| FP16 Performance | 82.6 TFLOPS   |
| Tensor Cores     | 512 (4th gen) |
| TDP              | 450W          |
| \~Price/hour     | $0.08-0.12    |

**Capabilities:**

* ✅ Ollama with 30B models (fast)
* ✅ vLLM with 13B models
* ✅ All image generation models
* ✅ FLUX dev (1024x1024)
* ✅ Video generation (short)
* ✅ AnimateDiff
* ⚠️ 70B models (Q4 + partial CPU offload, slow)

***

### NVIDIA RTX 5080 16GB *(New — Feb 2025)*

**Best for:** Fast SDXL/FLUX, 13B-30B LLMs, high-performance mid-range

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 16GB GDDR7    |
| Memory Bandwidth | 960 GB/s      |
| FP16 Performance | \~80 TFLOPS   |
| Tensor Cores     | 336 (5th gen) |
| TDP              | 360W          |
| \~Price/hour     | $1.50-2.00    |

**Capabilities:**

* ✅ Ollama with 13B models (fast)
* ✅ vLLM with 13B models
* ✅ All Stable Diffusion models
* ✅ SDXL + ControlNet (very fast)
* ✅ FLUX schnell/dev (1024x1024)
* ✅ Short video clips
* ⚠️ 30B models (Q4 only)
* ❌ 70B models

***

### NVIDIA RTX 5090 32GB *(Flagship — Feb 2025)*

**Best for:** Maximum consumer performance, 70B models, high-res video generation

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 32GB GDDR7    |
| Memory Bandwidth | 1792 GB/s     |
| FP16 Performance | \~120 TFLOPS  |
| Tensor Cores     | 680 (5th gen) |
| TDP              | 575W          |
| \~Price/hour     | $3.00-4.00    |

**Capabilities:**

* ✅ Ollama with 70B models (Q4 with partial offload; see the sketch below)
* ✅ vLLM with 30B models
* ✅ All image generation models
* ✅ FLUX dev (1536x1536)
* ✅ Video generation (longer clips)
* ✅ AnimateDiff + ControlNet
* ✅ Model training (LoRA, small fine-tunes)
* ✅ DeepSeek-R1 32B distill (Q4/Q6)
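
As a quick sanity check of the 70B claim, here is a minimal sketch using the Ollama Python client (assumes `pip install ollama` and a running Ollama server; the model tag is illustrative, and most 70B tags default to a Q4 quantization that Ollama partially spills to system RAM on a 32GB card):

```python
import ollama  # talks to a local Ollama server

response = ollama.chat(
    model="llama3:70b",  # illustrative tag; served as a ~Q4 quant by default
    messages=[{"role": "user", "content": "Summarize CUDA streams in two sentences."}],
)
print(response["message"]["content"])
```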

## Professional/Datacenter GPUs

### NVIDIA A100 40GB

**Best for:** Production LLMs, training, large models

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 40GB HBM2e    |
| Memory Bandwidth | 1555 GB/s     |
| FP16 Performance | 77.97 TFLOPS  |
| Tensor Cores     | 432 (3rd gen) |
| TDP              | 400W          |
| \~Price/hour     | $0.15-0.20    |

**Capabilities:**

* ✅ Ollama with 70B models (Q4)
* ✅ vLLM production serving
* ✅ All image generation
* ✅ FLUX dev (high quality)
* ✅ Video generation
* ✅ Model fine-tuning
* ❌ 70B FP16 (weights alone need about 140GB)

***

### NVIDIA A100 80GB

**Best for:** 70B+ models, video, production workloads

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 80GB HBM2e    |
| Memory Bandwidth | 2039 GB/s     |
| FP16 Performance | 77.97 TFLOPS  |
| Tensor Cores     | 432 (3rd gen) |
| TDP              | 400W          |
| \~Price/hour     | $0.20-0.30    |

**Capabilities:**

* ✅ All LLMs up to 70B (FP8/INT8; FP16 needs 2 GPUs)
* ✅ vLLM high-throughput serving
* ✅ All image generation
* ✅ Long video generation
* ✅ Model training
* ⚠️ DeepSeek-V3 (partial, heavy offload)
* ⚠️ 100B+ models

***

### NVIDIA H100 80GB

**Best for:** Maximum performance, largest models

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 80GB HBM3     |
| Memory Bandwidth | 3350 GB/s     |
| FP16 Performance | 267 TFLOPS    |
| Tensor Cores     | 528 (4th gen) |
| TDP              | 700W          |
| \~Price/hour     | $0.40-0.60    |

**Capabilities:**

* ✅ All models with maximum speed
* ✅ 100B+ parameter models (quantized)
* ✅ Multi-model serving
* ✅ Large-scale training
* ✅ Real-time video generation
* ⚠️ DeepSeek-V3 (671B, multi-GPU cluster)

## Performance Comparisons

### LLM Inference (tokens/second)

| GPU           | Llama 3 8B | Llama 3 70B | Mixtral 8x7B | CLORE.AI $/hr |
| ------------- | ---------- | ----------- | ------------ | ------------- |
| RTX 3060 12GB | 25         | -           | -            | $0.02-0.04    |
| RTX 3090 24GB | 45         | 8\*         | 20\*         | $0.15-0.25    |
| RTX 4090 24GB | 80         | 15\*        | 35\*         | $0.35-0.55    |
| RTX 5080 16GB | 95         | -           | 40\*         | $1.50-2.00    |
| RTX 5090 32GB | 150        | 30\*        | 65\*         | $3.00-4.00    |
| A100 40GB     | 100        | 25          | 45           | $0.80-1.20    |
| A100 80GB     | 110        | 40          | 55           | $1.20-1.80    |
| H100 80GB     | 180        | 70          | 90           | $2.50-3.50    |

\*With quantization (Q4/Q8); models larger than the card's VRAM also need partial CPU offload
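
Throughput alone doesn't pick a winner: dividing tokens/second by the hourly rate gives tokens per dollar, which is why budget cards top the value rankings below. A rough sketch using midpoints of the figures above (illustrative, as marketplace prices fluctuate):

```python
# Tokens-per-dollar from the table above, using midpoint prices.
gpus = {
    # name: (Llama 3 8B tokens/sec, midpoint $/hr)
    "RTX 3060 12GB": (25, 0.03),
    "RTX 3090 24GB": (45, 0.20),
    "RTX 4090 24GB": (80, 0.45),
    "A100 80GB": (110, 1.50),
    "H100 80GB": (180, 3.00),
}
for name, (tps, price) in gpus.items():
    millions_per_dollar = tps * 3600 / price / 1e6
    print(f"{name:14s} ~{millions_per_dollar:.1f}M tokens per $1")
```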

### Image Generation Speed

| GPU           | SD 1.5 (512) | SDXL (1024) | FLUX schnell | CLORE.AI $/hr |
| ------------- | ------------ | ----------- | ------------ | ------------- |
| RTX 3060 12GB | 4 sec        | 15 sec      | 25 sec\*     | $0.02-0.04    |
| RTX 3090 24GB | 2 sec        | 7 sec       | 12 sec       | $0.15-0.25    |
| RTX 4090 24GB | 1 sec        | 3 sec       | 5 sec        | $0.35-0.55    |
| RTX 5080 16GB | 0.8 sec      | 2.5 sec     | 4 sec        | $1.50-2.00    |
| RTX 5090 32GB | 0.6 sec      | 1.8 sec     | 3 sec        | $3.00-4.00    |
| A100 40GB     | 1.5 sec      | 4 sec       | 6 sec        | $0.80-1.20    |
| A100 80GB     | 1.5 sec      | 4 sec       | 5 sec        | $1.20-1.80    |

\*With CPU offload, lower resolution
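
The same arithmetic works per image: seconds per image times the hourly rate. A quick sketch with midpoint prices from this table (illustrative only):

```python
# Approximate cost per 1000 SDXL images, using midpoint $/hr.
gpus = {
    # name: (seconds per SDXL 1024 image, midpoint $/hr)
    "RTX 3060 12GB": (15, 0.03),
    "RTX 3090 24GB": (7, 0.20),
    "RTX 4090 24GB": (3, 0.45),
    "RTX 5090 32GB": (1.8, 3.50),
}
for name, (sec, price) in gpus.items():
    per_1k = sec / 3600 * price * 1000
    print(f"{name:14s} ${per_1k:.2f} per 1000 images")
```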

### Video Generation (5 sec clip)

| GPU           | SVD     | Wan2.1  | Hunyuan |
| ------------- | ------- | ------- | ------- |
| RTX 3090 24GB | 3 min   | 5 min\* | -       |
| RTX 4090 24GB | 1.5 min | 3 min   | 8 min\* |
| RTX 5090 32GB | 1 min   | 2 min   | 5 min   |
| A100 40GB     | 1 min   | 2 min   | 5 min   |
| A100 80GB     | 45 sec  | 1.5 min | 3 min   |

\*Limited resolution
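
For scale, a short SVD clip like those in the first column takes only a few lines with Hugging Face diffusers. A sketch following the library's documented SVD usage (the input frame path is a placeholder):

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video, load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # keeps peak VRAM manageable on 24GB cards

# SVD is image-to-video: it animates a single conditioning frame.
image = load_image("conditioning_frame.jpg").resize((1024, 576))
frames = pipe(image, num_frames=25, decode_chunk_size=8).frames[0]
export_to_video(frames, "clip.mp4", fps=7)
```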

## Price/Performance Ratio

### Best Value by Task

**Chat/LLM (7B-13B models):**

1. 🥇 RTX 3090 24GB - Best price/performance
2. 🥈 RTX 3060 12GB - Lowest cost
3. 🥉 RTX 4090 24GB - Fastest

**Image Generation (SDXL/FLUX):**

1. 🥇 RTX 3090 24GB - Great balance
2. 🥈 RTX 4090 24GB - 2x faster
3. 🥉 A100 40GB - Production stability

**Large Models (70B+):**

1. 🥇 A100 40GB - Best value for 70B
2. 🥈 A100 80GB - Highest precision (FP8/INT8)
3. 🥉 RTX 4090 24GB - Budget option (Q4 + offload)

**Video Generation:**

1. 🥇 A100 40GB - Good balance
2. 🥈 RTX 4090 24GB - Consumer option
3. 🥉 A100 80GB - Longest clips

**Model Training:**

1. 🥇 A100 40GB - Standard choice
2. 🥈 A100 80GB - Large models
3. 🥉 RTX 4090 24GB - Small models/LoRA

## Multi-GPU Configurations

Some tasks benefit from multiple GPUs:

| Configuration | Use Case                | VRAM Total |
| ------------- | ----------------------- | ---------- |
| 2x RTX 3090   | 70B inference           | 48GB       |
| 2x RTX 4090   | Fast 70B, training      | 48GB       |
| 2x RTX 5090   | 70B (Q4), fast training | 64GB       |
| 4x RTX 5090   | 100B+ models            | 128GB      |
| 4x A100 40GB  | 100B+ models            | 160GB      |
| 8x A100 80GB  | DeepSeek-V3, Llama 405B | 640GB      |
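
Frameworks must be told to use every card: with vLLM, tensor parallelism is a single argument. A sketch for a 2-GPU rental (the model ID is illustrative and gated on Hugging Face; 70B FP16 fits on 2x A100 80GB, while 2x 24GB cards would load a pre-quantized AWQ/GPTQ checkpoint instead):

```python
from vllm import LLM, SamplingParams

# Shard one model's weights across both GPUs of the rental.
llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # illustrative
    tensor_parallel_size=2,
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```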

## Choosing Your GPU

### Decision Flowchart

```
What's your main task?
│
├─ Chat/LLM
│  ├─ Model size?
│  │  ├─ ≤7B → RTX 3060 ($0.15–0.30/day)
│  │  ├─ 7B-30B → RTX 3090 ($0.30–1.00/day)
│  │  ├─ 30B-70B → A100 40GB ($1.50–3.00/day)
│  │  └─ 70B+ → A100 80GB ($2.00–4.00/day)
│
├─ Image Generation
│  ├─ Model?
│  │  ├─ SD 1.5 → RTX 3060 ($0.15–0.30/day)
│  │  ├─ SDXL → RTX 3090 ($0.30–1.00/day)
│  │  └─ FLUX → RTX 4090 ($0.50–2.00/day)
│
├─ Video Generation
│  ├─ Length?
│  │  ├─ Short (2-5 sec) → RTX 4090 ($0.50–2.00/day)
│  │  └─ Longer → A100 40GB+ ($1.50–3.00+/day)
│
└─ Training
   ├─ LoRA/small → RTX 4090 ($0.50–2.00/day)
   └─ Full fine-tune → A100 40GB+ ($1.50–3.00+/day)
```

## Tips for Saving Money

1. **Use Spot Orders** - 30-50% cheaper than on-demand
2. **Start Small** - Test on cheaper GPUs first
3. **Quantize Models** - Q4/Q8 fits larger models in less VRAM (see the sketch after this list)
4. **Batch Processing** - Process multiple requests at once
5. **Off-peak Hours** - Better availability and sometimes lower prices
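
Tip 3 is mostly arithmetic: weight memory is roughly parameter count times bytes per parameter, plus overhead for the KV cache and activations. A back-of-the-envelope sketch (the 20% overhead factor is an assumption; real usage varies with context length):

```python
# Rule of thumb: VRAM ≈ params × bytes/param × overhead.
BYTES_PER_PARAM = {"FP16": 2.0, "Q8": 1.0, "Q4": 0.5}

def vram_estimate_gb(params_billion: float, precision: str) -> float:
    """Rough VRAM need in GB, with ~20% overhead for KV cache etc."""
    return params_billion * BYTES_PER_PARAM[precision] * 1.2

for size in (7, 13, 70):
    row = ", ".join(
        f"{p}: {vram_estimate_gb(size, p):.0f}GB" for p in BYTES_PER_PARAM
    )
    print(f"{size:>2}B -> {row}")
# Prints roughly:  7B -> FP16: 17GB,  Q8: 8GB,  Q4: 4GB
#                 70B -> FP16: 168GB, Q8: 84GB, Q4: 42GB
```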

> 📚 See also: [Top 10 Cheapest GPUs for AI Training in 2025](https://blog.clore.ai/top-10-cheapest-gpus-for-ai-training/) | [Best GPU for AI Training — Detailed Guide](https://blog.clore.ai/best-gpu-for-ai-training/)

## Next Steps

* [Model Compatibility Matrix](https://docs.clore.ai/guides/getting-started/model-compatibility) - Which models run on which GPUs
* [Docker Images Catalog](https://docs.clore.ai/guides/getting-started/docker-images) - Ready-to-use images
* [Quickstart Guide](https://docs.clore.ai/guides/quickstart) - Get started in 5 minutes
