# GPU Comparison

Complete comparison of GPUs available on CLORE.AI for AI workloads.

{% hint style="success" %}
Find the right GPU for your task at [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Quick Recommendation

| Your Task                 | Budget Pick   | Best Value    | Maximum Performance |
| ------------------------- | ------------- | ------------- | ------------------- |
| Chat with AI (7B)         | RTX 3060 12GB | RTX 3090 24GB | RTX 5090 32GB       |
| Chat with AI (70B)        | RTX 3090 24GB | RTX 5090 32GB | A100 80GB           |
| Image Generation (SD 1.5) | RTX 3060 12GB | RTX 3090 24GB | RTX 5090 32GB       |
| Image Generation (SDXL)   | RTX 3090 24GB | RTX 4090 24GB | RTX 5090 32GB       |
| Image Generation (FLUX)   | RTX 3090 24GB | RTX 5090 32GB | A100 80GB           |
| Video Generation          | RTX 4090 24GB | RTX 5090 32GB | A100 80GB           |
| Model Training            | A100 40GB     | A100 80GB     | H100 80GB           |
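
As a rule of thumb, the VRAM a model needs drives which column you land in. Below is a minimal sketch of that arithmetic, assuming typical quantization sizes (the multipliers are rough approximations, not CLORE.AI measurements):

```python
# Rough VRAM estimate for LLM inference: weights plus ~20% overhead for
# KV cache and activations. Multipliers are approximations, not exact sizes.
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.55}

def estimate_vram_gb(params_billion: float, quant: str = "q4", overhead: float = 1.2) -> float:
    return params_billion * BYTES_PER_PARAM[quant] * overhead

for size_b, quant in [(7, "q4"), (13, "q4"), (70, "q4"), (70, "fp16")]:
    print(f"{size_b}B @ {quant}: ~{estimate_vram_gb(size_b, quant):.0f} GB VRAM")
# 7B @ q4 -> ~5 GB (fits a 12GB card); 70B @ q4 -> ~46 GB (48GB-class, multi-GPU,
# or partial CPU offload); 70B @ fp16 -> ~168 GB (multi-GPU only)
```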

## Consumer GPUs

### NVIDIA RTX 3060 12GB

**Best for:** Budget AI, SD 1.5, small LLMs

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 12GB GDDR6    |
| Memory Bandwidth | 360 GB/s      |
| FP16 Performance | 12.7 TFLOPS   |
| Tensor Cores     | 112 (3rd gen) |
| TDP              | 170W          |
| \~Price/hour     | $0.02-0.04    |

**Capabilities:**

* ✅ Ollama with 7B models (Q4)
* ✅ Stable Diffusion 1.5 (512x512)
* ✅ SDXL (768x768, slow)
* ⚠️ FLUX schnell (with CPU offload)
* ❌ Large models (>13B)
* ❌ Video generation
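
The 7B entries above are typically run through Ollama. Here is a minimal sketch of pulling and querying a Q4 build over Ollama's local HTTP API, assuming Ollama is already running on the instance at its default port (the model tag is just an example):

```python
import requests

OLLAMA = "http://localhost:11434"   # Ollama's default local API endpoint
MODEL = "llama3:8b"                 # example tag; the default build is a ~5 GB Q4 quant

# One-off: download the model (can take several minutes on first run)
requests.post(f"{OLLAMA}/api/pull", json={"model": MODEL, "stream": False}, timeout=3600)

# Single non-streaming completion
resp = requests.post(
    f"{OLLAMA}/api/generate",
    json={"model": MODEL, "prompt": "Explain VRAM in one sentence.", "stream": False},
    timeout=600,
)
print(resp.json()["response"])
```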

***

### NVIDIA RTX 3070/3070 Ti 8GB

**Best for:** SD 1.5, lightweight tasks

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 8GB GDDR6/GDDR6X |
| Memory Bandwidth | 448-608 GB/s  |
| FP16 Performance | 20.3 TFLOPS   |
| Tensor Cores     | 184 (3rd gen) |
| TDP              | 220-290W      |
| \~Price/hour     | $0.02-0.04    |

**Capabilities:**

* ✅ Ollama with 7B models (Q4)
* ✅ Stable Diffusion 1.5 (512x512)
* ⚠️ SDXL (low resolution only)
* ❌ FLUX (insufficient VRAM)
* ❌ Models >7B
* ❌ Video generation

***

### NVIDIA RTX 3080/3080 Ti 10-12GB

**Best for:** General AI tasks, good balance

| Spec             | Value             |
| ---------------- | ----------------- |
| VRAM             | 10-12GB GDDR6X    |
| Memory Bandwidth | 760-912 GB/s      |
| FP16 Performance | 29.8-34.1 TFLOPS  |
| Tensor Cores     | 272-320 (3rd gen) |
| TDP              | 320-350W          |
| \~Price/hour     | $0.04-0.06        |

**Capabilities:**

* ✅ Ollama with 13B models
* ✅ Stable Diffusion 1.5/2.1
* ✅ SDXL (1024x1024)
* ⚠️ FLUX schnell (with offload)
* ❌ Large models (>13B)
* ❌ Video generation

***

### NVIDIA RTX 3090/3090 Ti 24GB

**Best for:** SDXL, 13B-30B LLMs, ControlNet

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 24GB GDDR6X   |
| Memory Bandwidth | 936-1008 GB/s     |
| FP16 Performance | 35.6-40.0 TFLOPS  |
| Tensor Cores     | 328-336 (3rd gen) |
| TDP              | 350-450W      |
| \~Price/hour     | $0.05-0.08    |

**Capabilities:**

* ✅ Ollama with 30B models
* ✅ vLLM with 13B models
* ✅ All Stable Diffusion models
* ✅ SDXL + ControlNet
* ✅ FLUX schnell (1024x1024)
* ⚠️ FLUX dev (with offload)
* ⚠️ Video (short clips)

***

### NVIDIA RTX 4070 Ti 12GB

**Best for:** Fast SD 1.5, efficient inference

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 12GB GDDR6X   |
| Memory Bandwidth | 504 GB/s      |
| FP16 Performance | 40.1 TFLOPS   |
| Tensor Cores     | 240 (4th gen) |
| TDP              | 285W          |
| \~Price/hour     | $0.04-0.06    |

**Capabilities:**

* ✅ Ollama with 7B models (fast)
* ✅ Stable Diffusion 1.5 (very fast)
* ✅ SDXL (768x768)
* ⚠️ FLUX schnell (limited res)
* ❌ Large models (>13B)
* ❌ Video generation

***

### NVIDIA RTX 4080 16GB

**Best for:** SDXL production, 13B LLMs

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 16GB GDDR6X   |
| Memory Bandwidth | 717 GB/s      |
| FP16 Performance | 48.7 TFLOPS   |
| Tensor Cores     | 304 (4th gen) |
| TDP              | 320W          |
| \~Price/hour     | $0.06-0.09    |

**Capabilities:**

* ✅ Ollama with 13B models (fast)
* ✅ vLLM with 7B models
* ✅ All Stable Diffusion models
* ✅ SDXL + ControlNet
* ✅ FLUX schnell (1024x1024)
* ⚠️ FLUX dev (limited)
* ⚠️ Short video clips

***

### NVIDIA RTX 4090 24GB

**Best for:** High-end consumer performance, FLUX, video

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 24GB GDDR6X   |
| Memory Bandwidth | 1008 GB/s     |
| FP16 Performance | 82.6 TFLOPS   |
| Tensor Cores     | 512 (4th gen) |
| TDP              | 450W          |
| \~Price/hour     | $0.08-0.12    |

**Capabilities:**

* ✅ Ollama with 30B models (fast)
* ✅ vLLM with 13B models
* ✅ All image generation models
* ✅ FLUX dev (1024x1024)
* ✅ Video generation (short)
* ✅ AnimateDiff
* ⚠️ 70B models (Q4 only)

***

### NVIDIA RTX 5080 16GB *(New — Feb 2025)*

**Best for:** Fast SDXL/FLUX, 13B-30B LLMs, high-performance mid-range

| Spec                  | Value         |
| --------------------- | ------------- |
| VRAM                  | 16GB GDDR7    |
| Memory Bandwidth      | 960 GB/s      |
| FP16 Performance      | \~80 TFLOPS   |
| Tensor Cores          | 336 (5th gen) |
| TDP                   | 360W          |
| \~Price/hour          | $1.50-2.00    |

**Capabilities:**

* ✅ Ollama with 13B models (fast)
* ✅ vLLM with 13B models
* ✅ All Stable Diffusion models
* ✅ SDXL + ControlNet (very fast)
* ✅ FLUX schnell/dev (1024x1024)
* ✅ Short video clips
* ⚠️ 30B models (Q4 only)
* ❌ 70B models

***

### NVIDIA RTX 5090 32GB *(Flagship — Feb 2025)*

**Best for:** Maximum consumer performance, 70B models, high-res video generation

| Spec                  | Value         |
| --------------------- | ------------- |
| VRAM                  | 32GB GDDR7    |
| Memory Bandwidth      | 1792 GB/s     |
| FP16 Performance      | \~120 TFLOPS  |
| Tensor Cores          | 680 (5th gen) |
| TDP                   | 575W          |
| \~Price/hour          | $3.00-4.00    |

**Capabilities:**

* ✅ Ollama with 70B models (Q4, fast)
* ✅ vLLM with 30B models
* ✅ All image generation models
* ✅ FLUX dev (1536x1536)
* ✅ Video generation (longer clips)
* ✅ AnimateDiff + ControlNet
* ✅ Model training (LoRA, small fine-tunes)
* ✅ DeepSeek-R1 32B distill (Q4)

## Professional/Datacenter GPUs

### NVIDIA A100 40GB

**Best for:** Production LLMs, training, large models

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 40GB HBM2e    |
| Memory Bandwidth | 1555 GB/s     |
| FP16 Performance | 77.97 TFLOPS  |
| Tensor Cores     | 432 (3rd gen) |
| TDP              | 400W          |
| \~Price/hour     | $0.15-0.20    |

**Capabilities:**

* ✅ Ollama with 70B models (Q4)
* ✅ vLLM production serving
* ✅ All image generation
* ✅ FLUX dev (high quality)
* ✅ Video generation
* ✅ Model fine-tuning
* ❌ 70B FP16 (needs multi-GPU)

***

### NVIDIA A100 80GB

**Best for:** 70B+ models, video, production workloads

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 80GB HBM2e    |
| Memory Bandwidth | 2039 GB/s     |
| FP16 Performance | 77.97 TFLOPS  |
| Tensor Cores     | 432 (3rd gen) |
| TDP              | 400W          |
| \~Price/hour     | $0.20-0.30    |

**Capabilities:**

* ✅ LLMs up to 70B (quantized; FP16 needs 2+ GPUs)
* ✅ vLLM high-throughput serving
* ✅ All image generation
* ✅ Long video generation
* ✅ Model training
* ✅ DeepSeek-V3 (partial)
* ⚠️ 100B+ models
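
For production serving, vLLM is usually launched as an OpenAI-compatible server and queried with any OpenAI client. Here is a minimal sketch, where the launch command, port, and model ID are placeholders to adjust for your deployment:

```python
# Server side (run once on the instance), for example:
#   vllm serve <your-model-id> --dtype auto --max-model-len 8192
# This exposes an OpenAI-compatible API on port 8000 by default.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key
chat = client.chat.completions.create(
    model="<your-model-id>",  # must match the model the server was started with
    messages=[{"role": "user", "content": "Summarize what an A100 80GB is good for."}],
    max_tokens=128,
)
print(chat.choices[0].message.content)
```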

***

### NVIDIA H100 80GB

**Best for:** Maximum performance, largest models

| Spec             | Value         |
| ---------------- | ------------- |
| VRAM             | 80GB HBM3     |
| Memory Bandwidth | 3350 GB/s     |
| FP16 Performance | 267 TFLOPS    |
| Tensor Cores     | 528 (4th gen) |
| TDP              | 700W          |
| \~Price/hour     | $0.40-0.60    |

**Capabilities:**

* ✅ All models with maximum speed
* ✅ 100B+ parameter models
* ✅ Multi-model serving
* ✅ Large-scale training
* ✅ Real-time video generation
* ✅ DeepSeek-V3 (671B, multi-GPU cluster)

## Performance Comparisons

### LLM Inference (tokens/second)

| GPU           | Llama 3 8B | Llama 3 70B | Mixtral 8x7B | Clore.ai $/hr |
| ------------- | ---------- | ----------- | ------------ | ------------- |
| RTX 3060 12GB | 25         | -           | -            | $0.02-0.04    |
| RTX 3090 24GB | 45         | 8\*         | 20\*         | $0.15-0.25    |
| RTX 4090 24GB | 80         | 15\*        | 35\*         | $0.35-0.55    |
| RTX 5080 16GB | 95         | -           | 40\*         | $1.50-2.00    |
| RTX 5090 32GB | 150        | 30\*        | 65\*         | $3.00-4.00    |
| A100 40GB     | 100        | 25          | 45           | $0.80-1.20    |
| A100 80GB     | 110        | 40          | 55           | $1.20-1.80    |
| H100 80GB     | 180        | 70          | 90           | $2.50-3.50    |

\*With quantization (Q4/Q8)
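
Throughput only matters relative to rental price. A quick way to turn the table above into cost per million generated tokens (the plugged-in figures are rough midpoints from that table, not guaranteed marketplace prices):

```python
# cost per 1M output tokens = ($/hour) / (tokens/sec * 3600 s/hour) * 1_000_000
def cost_per_million_tokens(price_per_hour: float, tokens_per_sec: float) -> float:
    return price_per_hour / (tokens_per_sec * 3600) * 1_000_000

# Llama 3 8B examples using rough midpoints from the table above
for gpu, price, tps in [("RTX 3060", 0.03, 25), ("RTX 3090", 0.20, 45), ("A100 40GB", 1.00, 100)]:
    print(f"{gpu}: ${cost_per_million_tokens(price, tps):.2f} per 1M output tokens")
```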

### Image Generation Speed

| GPU           | SD 1.5 (512) | SDXL (1024) | FLUX schnell | Clore.ai $/hr |
| ------------- | ------------ | ----------- | ------------ | ------------- |
| RTX 3060 12GB | 4 sec        | 15 sec      | 25 sec\*     | $0.02-0.04    |
| RTX 3090 24GB | 2 sec        | 7 sec       | 12 sec       | $0.15-0.25    |
| RTX 4090 24GB | 1 sec        | 3 sec       | 5 sec        | $0.35-0.55    |
| RTX 5080 16GB | 0.8 sec      | 2.5 sec     | 4 sec        | $1.50-2.00    |
| RTX 5090 32GB | 0.6 sec      | 1.8 sec     | 3 sec        | $3.00-4.00    |
| A100 40GB     | 1.5 sec      | 4 sec       | 6 sec        | $0.80-1.20    |
| A100 80GB     | 1.5 sec      | 4 sec       | 5 sec        | $1.20-1.80    |

\*With CPU offload, lower resolution

### Video Generation (5 sec clip)

| GPU           | SVD     | Wan2.1  | Hunyuan |
| ------------- | ------- | ------- | ------- |
| RTX 3090 24GB | 3 min   | 5 min\* | -       |
| RTX 4090 24GB | 1.5 min | 3 min   | 8 min\* |
| RTX 5090 32GB | 1 min   | 2 min   | 5 min   |
| A100 40GB     | 1 min   | 2 min   | 5 min   |
| A100 80GB     | 45 sec  | 1.5 min | 3 min   |

\*Limited resolution

## Price/Performance Ratio

### Best Value by Task

**Chat/LLM (7B-13B models):**

1. 🥇 RTX 3090 24GB - Best price/performance
2. 🥈 RTX 3060 12GB - Lowest cost
3. 🥉 RTX 4090 24GB - Fastest

**Image Generation (SDXL/FLUX):**

1. 🥇 RTX 3090 24GB - Great balance
2. 🥈 RTX 4090 24GB - 2x faster
3. 🥉 A100 40GB - Production stability

**Large Models (70B+):**

1. 🥇 A100 40GB - Best value for 70B
2. 🥈 A100 80GB - Full precision
3. 🥉 RTX 4090 24GB - Budget option (Q4 only)

**Video Generation:**

1. 🥇 A100 40GB - Good balance
2. 🥈 RTX 4090 24GB - Consumer option
3. 🥉 A100 80GB - Longest clips

**Model Training:**

1. 🥇 A100 40GB - Standard choice
2. 🥈 A100 80GB - Large models
3. 🥉 RTX 4090 24GB - Small models/LoRA (see the sketch below)
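
On a single rented GPU, "training" most often means parameter-efficient fine-tuning. Here is a minimal LoRA setup with Hugging Face `peft` (the model and hyperparameters are illustrative; pair it with your own `Trainer` and dataset code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # small example model; swap in your target
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="cuda")

# Train low-rank adapters on the attention projections instead of all weights;
# this keeps gradients and optimizer state small enough for a 24GB consumer card.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base parameters
```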

## Multi-GPU Configurations

Some tasks benefit from multiple GPUs:

| Configuration | Use Case                | VRAM Total |
| ------------- | ----------------------- | ---------- |
| 2x RTX 3090   | 70B inference           | 48GB       |
| 2x RTX 4090   | Fast 70B, training      | 48GB       |
| 2x RTX 5090   | 70B Q4/Q6, fast training | 64GB      |
| 4x RTX 5090   | 100B+ models            | 128GB      |
| 4x A100 40GB  | 100B+ models            | 160GB      |
| 8x A100 80GB  | DeepSeek-V3, Llama 405B | 640GB      |
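
Frameworks such as vLLM shard a model across the GPUs in a single rental via tensor parallelism. A minimal sketch for a 2-GPU configuration (the model ID is a gated Hugging Face repo, used here only as an example):

```python
from vllm import LLM, SamplingParams

# Shard the weights across both GPUs in the rental (tensor parallelism).
# A 70B model in FP16 (~140 GB of weights) needs a pair of 80GB-class GPUs;
# quantized builds fit smaller pairs such as 2x RTX 3090/4090.
llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # gated repo: accept the license first
    tensor_parallel_size=2,
    dtype="float16",
)
out = llm.generate(["Why rent two GPUs instead of one?"], SamplingParams(max_tokens=128))
print(out[0].outputs[0].text)
```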

## Choosing Your GPU

### Decision Flowchart

```
What's your main task?
│
├─ Chat/LLM
│  ├─ Model size?
│  │  ├─ ≤7B → RTX 3060 ($0.15–0.30/day)
│  │  ├─ 7B-30B → RTX 3090 ($0.30–1.00/day)
│  │  ├─ 30B-70B → A100 40GB ($1.50–3.00/day)
│  │  └─ 70B+ → A100 80GB ($2.00–4.00/day)
│
├─ Image Generation
│  ├─ Model?
│  │  ├─ SD 1.5 → RTX 3060 ($0.15–0.30/day)
│  │  ├─ SDXL → RTX 3090 ($0.30–1.00/day)
│  │  └─ FLUX → RTX 4090 ($0.50–2.00/day)
│
├─ Video Generation
│  ├─ Length?
│  │  ├─ Short (2-5 sec) → RTX 4090 ($0.50–2.00/day)
│  │  └─ Longer → A100 40GB+ ($1.50–3.00+/day)
│
└─ Training
   ├─ LoRA/small → RTX 4090 ($0.50–2.00/day)
   └─ Full fine-tune → A100 40GB+ ($1.50–3.00+/day)
```

## Tips for Saving Money

1. **Use Spot Orders** - 30-50% cheaper than on-demand
2. **Start Small** - Test on cheaper GPUs first
3. **Quantize Models** - Q4/Q8 fits larger models in less VRAM
4. **Batch Processing** - Process multiple requests at once (see the sketch after this list)
5. **Off-peak Hours** - Better availability and sometimes lower prices
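
On the batching tip, engines like vLLM process a whole list of prompts in one pass, which raises GPU utilization and tokens per dollar. A minimal sketch with a small ungated example model:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # example model; swap in your own
params = SamplingParams(max_tokens=64, temperature=0.7)

prompts = [f"Write a one-line product tagline for idea #{i}." for i in range(32)]
# One call, one batch: far better GPU utilization than 32 sequential requests
outputs = llm.generate(prompts, params)
for out in outputs[:3]:
    print(out.outputs[0].text.strip())
```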

> 📚 See also: [Top 10 Cheapest GPUs for AI Training in 2025](https://blog.clore.ai/top-10-cheapest-gpus-for-ai-training/) | [Best GPU for AI Training — Detailed Guide](https://blog.clore.ai/best-gpu-for-ai-training/)

## Next Steps

* [Model Compatibility Matrix](/guides/getting-started/model-compatibility.md) - Which models run on which GPUs
* [Docker Images Catalog](/guides/getting-started/docker-images.md) - Ready-to-use images
* [Quickstart Guide](/guides/quickstart.md) - Get started in 5 minutes

