GPU Comparison

A complete comparison of the GPUs available on CLORE.AI for AI workloads.

Find the right GPU for your task on the CLORE.AI marketplace.

Quick Recommendation

| Your task | Budget pick | Best value | Maximum performance |
|---|---|---|---|
| Chat with AI (7B) | RTX 3060 12GB | RTX 3090 24GB | RTX 5090 32GB |
| Chat with AI (70B) | RTX 3090 24GB | RTX 5090 32GB | A100 80GB |
| Image generation (SD 1.5) | RTX 3060 12GB | RTX 3090 24GB | RTX 5090 32GB |
| Image generation (SDXL) | RTX 3090 24GB | RTX 4090 24GB | RTX 5090 32GB |
| Image generation (FLUX) | RTX 3090 24GB | RTX 5090 32GB | A100 80GB |
| Video generation | RTX 4090 24GB | RTX 5090 32GB | A100 80GB |
| Model training | A100 40GB | A100 80GB | H100 80GB |
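
If you script your rentals, the quick-recommendation table can be captured as a small lookup. The sketch below is one possible encoding and not part of any Clore.ai API: the task keys, the tier names, and the `pick_gpu` helper are hypothetical names chosen for illustration.

```python
# Hypothetical lookup built from the Quick Recommendation table.
# Task keys and tier names are illustrative, not Clore.ai identifiers.
TIERS = ("budget", "value", "max")

RECOMMENDATIONS = {
    "chat-7b":  ("RTX 3060 12GB", "RTX 3090 24GB", "RTX 5090 32GB"),
    "chat-70b": ("RTX 3090 24GB", "RTX 5090 32GB", "A100 80GB"),
    "sd15":     ("RTX 3060 12GB", "RTX 3090 24GB", "RTX 5090 32GB"),
    "sdxl":     ("RTX 3090 24GB", "RTX 4090 24GB", "RTX 5090 32GB"),
    "flux":     ("RTX 3090 24GB", "RTX 5090 32GB", "A100 80GB"),
    "video":    ("RTX 4090 24GB", "RTX 5090 32GB", "A100 80GB"),
    "training": ("A100 40GB", "A100 80GB", "H100 80GB"),
}


def pick_gpu(task: str, tier: str = "value") -> str:
    """Return the recommended GPU for a task at the given tier."""
    if task not in RECOMMENDATIONS:
        raise KeyError(f"unknown task: {task!r}")
    if tier not in TIERS:
        raise KeyError(f"unknown tier: {tier!r}")
    return RECOMMENDATIONS[task][TIERS.index(tier)]


print(pick_gpu("sdxl", "budget"))
```

Keeping the three tiers in a fixed tuple order mirrors the table columns, so a new row only needs one line.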

Consumer GPUs

NVIDIA RTX 3060 12GB

Best for: Budget AI, SD 1.5, small LLMs

| Specification | Value |
|---|---|
| VRAM | 12GB GDDR6 |
| Memory bandwidth | 360 GB/s |
| FP16 performance | 12.7 TFLOPS |
| Tensor Cores | 112 (3rd generation) |
| TDP | 170W |
| ~Price/hour | $0.02-0.04 |

Capabilities:

✅ Ollama with 7B models (Q4)
✅ Stable Diffusion 1.5 (512x512)
✅ SDXL (768x768, slow)
⚠️ FLUX schnell (with CPU offload)
❌ Large models (>13B)
❌ Video generation

NVIDIA RTX 3070/3070 Ti 8GB

Best for: SD 1.5, light tasks

| Specification | Value |
|---|---|
| VRAM | 8GB GDDR6 (3070) / GDDR6X (3070 Ti) |
| Memory bandwidth | 448–608 GB/s |
| FP16 performance | 20.3 TFLOPS |
| Tensor Cores | 184 (3rd generation) |
| TDP | 220–290W |
| ~Price/hour | $0.02-0.04 |

Capabilities:

✅ Ollama with 7B models (Q4)
✅ Stable Diffusion 1.5 (512x512)
⚠️ SDXL (low resolution only)
❌ FLUX (insufficient VRAM)
❌ Models >7B
❌ Video generation

NVIDIA RTX 3080/3080 Ti 10-12GB

Best for: General AI tasks, good balance

| Specification | Value |
|---|---|
| VRAM | 10–12GB GDDR6X |
| Memory bandwidth | 760–912 GB/s |
| FP16 performance | 29.8–34.1 TFLOPS |
| Tensor Cores | 272–320 (3rd generation) |
| TDP | 320–350W |
| ~Price/hour | $0.04-0.06 |

Capabilities:

✅ Ollama with 13B models
✅ Stable Diffusion 1.5/2.1
✅ SDXL (1024x1024)
⚠️ FLUX schnell (with offload)
❌ Large models (>13B)
❌ Video generation

NVIDIA RTX 3090/3090 Ti 24GB

Best for: SDXL, 13B–30B LLMs, ControlNet

| Specification | Value |
|---|---|
| VRAM | 24GB GDDR6X |
| Memory bandwidth | 936 GB/s |
| FP16 performance | 35.6 TFLOPS |
| Tensor Cores | 328 (3rd generation) |
| TDP | 350–450W |
| ~Price/hour | $0.05-0.08 |

Capabilities:

✅ Ollama with 30B models
✅ vLLM with 13B models
✅ All Stable Diffusion models
✅ SDXL + ControlNet
✅ FLUX schnell (1024x1024)
⚠️ FLUX dev (with offload)
⚠️ Video (short clips)

NVIDIA RTX 4070 Ti 12GB

Best for: Fast SD 1.5, efficient inference

| Specification | Value |
|---|---|
| VRAM | 12GB GDDR6X |
| Memory bandwidth | 504 GB/s |
| FP16 performance | 40.1 TFLOPS |
| Tensor Cores | 240 (4th generation) |
| TDP | 285W |
| ~Price/hour | $0.04-0.06 |

Capabilities:

✅ Ollama with 7B models (fast)
✅ Stable Diffusion 1.5 (very fast)
✅ SDXL (768x768)
⚠️ FLUX schnell (limited resolution)
❌ Large models (>13B)
❌ Video generation

NVIDIA RTX 4080 16GB

Best for: SDXL production, 13B LLMs

| Specification | Value |
|---|---|
| VRAM | 16GB GDDR6X |
| Memory bandwidth | 717 GB/s |
| FP16 performance | 48.7 TFLOPS |
| Tensor Cores | 304 (4th generation) |
| TDP | 320W |
| ~Price/hour | $0.06-0.09 |

Capabilities:

✅ Ollama with 13B models (fast)
✅ vLLM with 7B models
✅ All Stable Diffusion models
✅ SDXL + ControlNet
✅ FLUX schnell (1024x1024)
⚠️ FLUX dev (limited)
⚠️ Short video clips

NVIDIA RTX 4090 24GB

Best for: High-end consumer performance, FLUX, video

| Specification | Value |
|---|---|
| VRAM | 24GB GDDR6X |
| Memory bandwidth | 1008 GB/s |
| FP16 performance | 82.6 TFLOPS |
| Tensor Cores | 512 (4th generation) |
| TDP | 450W |
| ~Price/hour | $0.08-0.12 |

Capabilities:

✅ Ollama with 30B models (fast)
✅ vLLM with 13B models
✅ All image generation models
✅ FLUX dev (1024x1024)
✅ Video generation (short)
✅ AnimateDiff
⚠️ 70B models (Q4 only)

NVIDIA RTX 5080 16GB (New, Feb 2025)

Best for: Fast SDXL/FLUX, 13B–30B LLMs, high-performance mid-range

| Specification | Value |
|---|---|
| VRAM | 16GB GDDR7 |
| Memory bandwidth | 960 GB/s |
| FP16 performance | ~80 TFLOPS |
| Tensor Cores | 336 (5th generation) |
| TDP | 360W |
| ~Clore.ai price/hour | $1.50-2.00 |

Capabilities:

✅ Ollama with 13B models (fast)
✅ vLLM with 13B models
✅ All Stable Diffusion models
✅ SDXL + ControlNet (very fast)
✅ FLUX schnell/dev (1024x1024)
✅ Short video clips
⚠️ 30B models (Q4 only)
❌ 70B models

NVIDIA RTX 5090 32GB (Flagship, Feb 2025)

Best for: Maximum consumer performance, 70B models, high-resolution video generation

| Specification | Value |
|---|---|
| VRAM | 32GB GDDR7 |
| Memory bandwidth | 1792 GB/s |
| FP16 performance | ~120 TFLOPS |
| Tensor Cores | 680 (5th generation) |
| TDP | 575W |
| ~Clore.ai price/hour | $3.00-4.00 |

Capabilities:

✅ Ollama with 70B models (Q4, fast)
✅ vLLM with 30B models
✅ All image generation models
✅ FLUX dev (1536x1536)
✅ Video generation (longer clips)
✅ AnimateDiff + ControlNet
✅ Model training (LoRA, small fine-tunes)
✅ DeepSeek-R1 32B Distill (FP16)

Professional/Datacenter GPUs

NVIDIA A100 40GB

Best for: Production LLMs, training, large models

| Specification | Value |
|---|---|
| VRAM | 40GB HBM2 |
| Memory bandwidth | 1555 GB/s |
| FP16 performance | 77.97 TFLOPS |
| Tensor Cores | 432 (3rd generation) |
| TDP | 400W |
| ~Price/hour | $0.15-0.20 |

Capabilities:

✅ Ollama with 70B models (Q4)
✅ vLLM production serving
✅ All image generation
✅ FLUX dev (high quality)
✅ Video generation
✅ Model fine-tuning
⚠️ 70B FP16 (tight)

NVIDIA A100 80GB

Best for: 70B+ models, video, production workloads

| Specification | Value |
|---|---|
| VRAM | 80GB HBM2e |
| Memory bandwidth | 2039 GB/s |
| FP16 performance | 77.97 TFLOPS |
| Tensor Cores | 432 (3rd generation) |
| TDP | 400W |
| ~Price/hour | $0.20-0.30 |

Capabilities:

✅ All LLMs up to 70B (FP16)
✅ vLLM high-throughput serving
✅ All image generation
✅ Long video generation
✅ Model training
✅ DeepSeek-V3 (partial)
⚠️ 100B+ models

NVIDIA H100 80GB

Best for: Maximum performance, largest models

| Specification | Value |
|---|---|
| VRAM | 80GB HBM3 |
| Memory bandwidth | 3350 GB/s |
| FP16 performance | 267 TFLOPS |
| Tensor Cores | 528 (4th generation) |
| TDP | 700W |
| ~Price/hour | $0.40-0.60 |

Capabilities:

✅ All models at maximum speed
✅ 100B+ parameter models
✅ Multi-model serving
✅ Large-scale training
✅ Real-time video generation
✅ DeepSeek-V3 (671B)

Performance Comparisons

LLM Inference (tokens/second)

| GPU | Llama 3 8B | Llama 3 70B | Mixtral 8x7B | Clore.ai $/hr |
|---|---|---|---|---|
| RTX 3060 12GB | | | | $0.02-0.04 |
| RTX 3090 24GB | | | 20* | $0.15-0.25 |
| RTX 4090 24GB | | 15* | 35* | $0.35-0.55 |
| RTX 5080 16GB | | | 40* | $1.50-2.00 |
| RTX 5090 32GB | 150 | 30* | 65* | $3.00-4.00 |
| A100 40GB | 100 | | | $0.80-1.20 |
| A100 80GB | 110 | | | $1.20-1.80 |
| H100 80GB | 180 | | | $2.50-3.50 |

*With quantization (Q4/Q8)
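
Throughput and hourly price combine into a cost per million generated tokens: the hourly price divided by tokens per hour, times one million. The sketch below applies that to the rows that list an unstarred throughput figure, read here as the Llama 3 8B column, and uses the midpoint of each price range. The function name, the column reading, and the midpoint choice are assumptions made for illustration; real costs also depend on batch size, context length, and current marketplace prices.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """USD to generate one million tokens at a steady rate."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


# (tokens/second, low $/hr, high $/hr) as listed in the LLM inference table.
ROWS = {
    "RTX 5090 32GB": (150, 3.00, 4.00),
    "A100 40GB": (100, 0.80, 1.20),
    "A100 80GB": (110, 1.20, 1.80),
    "H100 80GB": (180, 2.50, 3.50),
}

for gpu, (tps, low, high) in ROWS.items():
    midpoint = (low + high) / 2  # assumption: midpoint of the listed range
    print(f"{gpu}: ${cost_per_million_tokens(midpoint, tps):.2f} per 1M tokens")
```

Under these assumptions the fastest card is not the cheapest per token, which is why throughput and price are worth reading together.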

Image Generation Speed

| GPU | SD 1.5 (512) | SDXL (1024) | FLUX schnell | Clore.ai $/hr |
|---|---|---|---|---|
| RTX 3060 12GB | 4 sec | 15 sec | 25 sec* | $0.02-0.04 |
| RTX 3090 24GB | 2 sec | 7 sec | 12 sec | $0.15-0.25 |
| RTX 4090 24GB | 1 sec | 3 sec | 5 sec | $0.35-0.55 |
| RTX 5080 16GB | 0.8 sec | 2.5 sec | 4 sec | $1.50-2.00 |
| RTX 5090 32GB | 0.6 sec | 1.8 sec | 3 sec | $3.00-4.00 |
| A100 40GB | 1.5 sec | 4 sec | 6 sec | $0.80-1.20 |
| A100 80GB | 1.5 sec | 4 sec | 5 sec | $1.20-1.80 |

*With CPU offload, lower resolution
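
Seconds per image and hourly price can be turned into a cost per 1,000 images in the same way. The sketch below does this for a few SDXL rows, again using the midpoint of each listed price range. The function name and the midpoint are illustrative assumptions, and the estimate ignores model loading and idle time.

```python
def cost_per_1000_images(price_per_hour: float, seconds_per_image: float) -> float:
    """USD to generate 1,000 images back to back."""
    return price_per_hour / 3600 * seconds_per_image * 1000


# (SDXL seconds per image, low $/hr, high $/hr) from the image generation table.
SDXL_ROWS = {
    "RTX 3060 12GB": (15, 0.02, 0.04),
    "RTX 3090 24GB": (7, 0.15, 0.25),
    "RTX 4090 24GB": (3, 0.35, 0.55),
    "RTX 5090 32GB": (1.8, 3.00, 4.00),
}

for gpu, (seconds, low, high) in SDXL_ROWS.items():
    midpoint = (low + high) / 2  # assumption: midpoint of the listed range
    print(f"{gpu}: ${cost_per_1000_images(midpoint, seconds):.2f} per 1,000 SDXL images")
```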

Video Generation (5-second clip)

| GPU | SVD | Wan2.1 | Hunyuan |
|---|---|---|---|
| RTX 3090 24GB | 3 min | 5 min* | |
| RTX 4090 24GB | 1.5 min | 3 min | 8 min* |
| RTX 5090 32GB | 1 min | 2 min | 5 min |
| A100 40GB | 1 min | 2 min | 5 min |
| A100 80GB | 45 sec | 1.5 min | 3 min |

*Limited resolution

Price-Performance

Best Value by Task

Chat/LLM (7B–13B models):

🥇 RTX 3090 24GB - Best price/performance
🥈 RTX 3060 12GB - Lowest cost
🥉 RTX 4090 24GB - Fastest

Image generation (SDXL/FLUX):

🥇 RTX 3090 24GB - Great balance
🥈 RTX 4090 24GB - 2x faster
🥉 A100 40GB - Production stability

Large models (70B+):

🥇 A100 40GB - Best value for 70B
🥈 A100 80GB - Full precision
🥉 RTX 4090 24GB - Budget option (Q4 only)

Video generation:

🥇 A100 40GB - Good balance
🥈 RTX 4090 24GB - Consumer option
🥉 A100 80GB - Longest clips

Model training:

🥇 A100 40GB - Standard choice
🥈 A100 80GB - Large models
🥉 RTX 4090 24GB - Small models/LoRA

Multi-GPU Configurations

Some tasks benefit from multiple GPUs:

| Configuration | Use case | Total VRAM |
|---|---|---|
| 2x RTX 3090 | 70B inference | 48GB |
| 2x RTX 4090 | Fast 70B, training | 48GB |
| 2x RTX 5090 | 70B FP16, fast training | 64GB |
| 4x RTX 5090 | 100B+ models | 128GB |
| 4x A100 40GB | 100B+ models | 160GB |
| 8x A100 80GB | DeepSeek-V3, Llama 405B | 640GB |
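
The total VRAM column is simply GPU count times VRAM per GPU. The sketch below reproduces that arithmetic and picks the listed configuration with the least pooled VRAM that still meets a target. `CONFIGS`, `total_vram_gb`, and `smallest_config` are hypothetical names for illustration, and pooled VRAM is an upper bound: splitting a model across GPUs adds overhead that this does not model.

```python
# (label, GPU count, VRAM per GPU in GB) from the multi-GPU table.
CONFIGS = [
    ("2x RTX 3090", 2, 24),
    ("2x RTX 4090", 2, 24),
    ("2x RTX 5090", 2, 32),
    ("4x RTX 5090", 4, 32),
    ("4x A100 40GB", 4, 40),
    ("8x A100 80GB", 8, 80),
]


def total_vram_gb(count: int, vram_per_gpu_gb: int) -> int:
    """Pooled VRAM across identical GPUs."""
    return count * vram_per_gpu_gb


def smallest_config(min_vram_gb: int) -> str:
    """Label of the listed configuration with the least total VRAM that meets the target."""
    fitting = [c for c in CONFIGS if total_vram_gb(c[1], c[2]) >= min_vram_gb]
    if not fitting:
        raise ValueError(f"no listed configuration offers {min_vram_gb} GB")
    # min() keeps the first entry on ties, so table order breaks ties.
    return min(fitting, key=lambda c: total_vram_gb(c[1], c[2]))[0]


print(smallest_config(100))
```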

Choose Your GPU

Decision Flowchart

What is your main task?
│
├─ Chat/LLM
│  ├─ Model size?
│  │  ├─ ≤7B → RTX 3060 ($0.15–0.30/day)
│  │  ├─ 7B–30B → RTX 3090 ($0.30–1.00/day)
│  │  ├─ 30B–70B → A100 40GB ($1.50–3.00/day)
│  │  └─ 70B+ → A100 80GB ($2.00–4.00/day)
│
├─ Image generation
│  ├─ Model?
│  │  ├─ SD 1.5 → RTX 3060 ($0.15–0.30/day)
│  │  ├─ SDXL → RTX 3090 ($0.30–1.00/day)
│  │  └─ FLUX → RTX 4090 ($0.50–2.00/day)
│
├─ Video generation
│  ├─ Length?
│  │  ├─ Short (2–5 sec) → RTX 4090 ($0.50–2.00/day)
│  │  └─ Longer → A100 40GB+ ($1.50–3.00+/day)
│
└─ Training
   ├─ LoRA/small → RTX 4090 ($0.50–2.00/day)
   └─ Full fine-tune → A100 40GB+ ($1.50–3.00+/day)
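
The flowchart can also be written as a small function. This is a minimal sketch, not an official selector: the function name, argument names, and task keys are hypothetical, and where the chart's size ranges share a boundary (30B, 70B) the handling is an assumption noted in the comments.

```python
def recommend_gpu(task, *, model_size_b=None, image_model=None,
                  clip_seconds=None, full_finetune=False):
    """Follow the decision flowchart and return the suggested GPU."""
    if task == "chat":
        if model_size_b is None:
            raise ValueError("chat needs model_size_b (billions of parameters)")
        if model_size_b <= 7:
            return "RTX 3060"
        if model_size_b <= 30:   # assumption: exactly 30B takes the 7B-30B branch
            return "RTX 3090"
        if model_size_b < 70:    # assumption: exactly 70B takes the 70B+ branch
            return "A100 40GB"
        return "A100 80GB"
    if task == "image":
        by_model = {"sd1.5": "RTX 3060", "sdxl": "RTX 3090", "flux": "RTX 4090"}
        if image_model not in by_model:
            raise ValueError(f"unknown image model: {image_model!r}")
        return by_model[image_model]
    if task == "video":
        if clip_seconds is None:
            raise ValueError("video needs clip_seconds")
        return "RTX 4090" if clip_seconds <= 5 else "A100 40GB+"
    if task == "training":
        return "A100 40GB+" if full_finetune else "RTX 4090"
    raise ValueError(f"unknown task: {task!r}")


print(recommend_gpu("chat", model_size_b=13))
print(recommend_gpu("video", clip_seconds=10))
```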

Money-Saving Tips

Use spot orders - 30–50% cheaper than on-demand
Start small - Test on cheaper GPUs first
Quantize models - Q4/Q8 fits larger models into less VRAM
Batch processing - Process multiple requests at once
Use off-peak hours - Better availability and sometimes lower prices
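
To see why quantization helps, the memory needed for the weights alone is roughly the parameter count times the bits per weight, divided by 8. The sketch below uses nominal bit widths (16, 8, 4). The function name is hypothetical, and the result is a lower bound: real quantized files carry some extra metadata, and inference also needs VRAM for the KV cache and activations.

```python
# Nominal bits per weight. Assumption: real quantized formats store extra
# scaling data, so actual files are somewhat larger than this estimate.
BITS_PER_WEIGHT = {"FP16": 16, "Q8": 8, "Q4": 4}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[precision] / 8


for precision in BITS_PER_WEIGHT:
    print(f"13B at {precision}: ~{weight_memory_gb(13, precision):.1f} GB")
```

On this estimate a 13B model drops from about 26 GB at FP16 to about 6.5 GB at Q4, which is the difference between needing a 24GB+ card and fitting on a 12GB one with room for context.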

📚 See also: Top 10 Cheapest GPUs for AI Training in 2025 | Best GPU for AI Training: Detailed Guide

Next Steps

Model compatibility matrix - Which models run on which GPUs
Docker image catalog - Ready-to-use images
Quickstart guide - Get started in 5 minutes
