> For the complete documentation index, see [llms.txt](https://docs.clore.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.clore.ai/guides/guides_v2-de/sprachmodelle/language-models.md).

# Überblick

Führen Sie große Sprachmodelle (LLMs) auf CLORE.AI-GPUs für Inferenz- und Chat-Anwendungen aus.

## Beliebte Tools

| Werkzeug                                                                             | Anwendungsfall                            | Schwierigkeit |
| ------------------------------------------------------------------------------------ | ----------------------------------------- | ------------- |
| [Ollama](/guides/guides_v2-de/sprachmodelle/ollama.md)                               | Einfachste LLM-Einrichtung                | Anfänger      |
| [Open WebUI](/guides/guides_v2-de/sprachmodelle/open-webui.md)                       | ChatGPT-ähnliche Oberfläche               | Anfänger      |
| [vLLM](/guides/guides_v2-de/sprachmodelle/vllm.md)                                   | Durchsatzstarke Produktionsbereitstellung | Mittel        |
| [Llama.cpp Server](/guides/guides_v2-de/sprachmodelle/llamacpp-server.md)            | Effiziente GGUF-Inferenz                  | Einfach       |
| [Text Generation WebUI](/guides/guides_v2-de/sprachmodelle/text-generation-webui.md) | Voll ausgestattete Chat-Oberfläche        | Einfach       |
| [ExLlamaV2](/guides/guides_v2-de/sprachmodelle/exllamav2-fast.md)                    | Schnellste EXL2-Inferenz                  | Mittel        |
| [LocalAI](/guides/guides_v2-de/sprachmodelle/localai-openai-compatible.md)           | OpenAI-kompatible API                     | Mittel        |
| [SGLang](/guides/guides_v2-de/sprachmodelle/sglang.md)                               | Schnelle strukturierte Generierung        | Mittel        |
| [Text Generation Inference (TGI)](/guides/guides_v2-de/sprachmodelle/tgi.md)         | HuggingFace-Serving-Lösung                | Mittel        |
| [LMDeploy](/guides/guides_v2-de/sprachmodelle/lmdeploy.md)                           | MMlab-Serving-Toolkit                     | Mittel        |
| [Aphrodite Engine](/guides/guides_v2-de/sprachmodelle/aphrodite-engine.md)           | vLLM-Fork mit zusätzlichen Funktionen     | Mittel        |
| [MLC-LLM](/guides/guides_v2-de/sprachmodelle/mlc-llm.md)                             | Maschinelles Lernkompilieren              | Schwierig     |
| [LiteLLM](/guides/guides_v2-de/sprachmodelle/litellm.md)                             | Vereinheitlichter API-Proxy               | Mittel        |
| [PowerInfer](/guides/guides_v2-de/sprachmodelle/powerinfer.md)                       | Sparsame Modellinferenz                   | Schwierig     |
| [Mistral.rs](/guides/guides_v2-de/sprachmodelle/mistral-rs.md)                       | Rust-basierte Inferenz-Engine             | Mittel        |

## Modellanleitungen

### Neueste & beste Modelle

| Modell                                                           | Parameter           | Am besten für                    |
| ---------------------------------------------------------------- | ------------------- | -------------------------------- |
| [DeepSeek-V3](/guides/guides_v2-de/sprachmodelle/deepseek-v3.md) | 671B MoE            | Schlussfolgern, Code, Mathematik |
| [DeepSeek-R1](/guides/guides_v2-de/sprachmodelle/deepseek-r1.md) | 671B MoE            | Fortgeschrittenes Schlussfolgern |
| [DeepSeek V4](/guides/guides_v2-de/sprachmodelle/deepseek-v4.md) | Wird bekanntgegeben | Nächste Generation von DeepSeek  |
| [Qwen2.5](/guides/guides_v2-de/sprachmodelle/qwen25.md)          | 0,5B–72B            | Mehrsprachig, Code               |
| [Qwen3.5](/guides/guides_v2-de/sprachmodelle/qwen35.md)          | Wird bekanntgegeben | Neueste Qwen-Generation          |
| [Llama 3.3](/guides/guides_v2-de/sprachmodelle/llama33.md)       | 70B                 | Metas neuestes 70B               |
| [Llama 4](/guides/guides_v2-de/sprachmodelle/llama4.md)          | Wird bekanntgegeben | Scout- & Maverick-Varianten      |

### Spezialisierte Modelle

| Modell                                                                 | Parameter           | Am besten für               |
| ---------------------------------------------------------------------- | ------------------- | --------------------------- |
| [DeepSeek Coder](/guides/guides_v2-de/sprachmodelle/deepseek-coder.md) | 6,7B–33B            | Code-Generierung            |
| [CodeLlama](/guides/guides_v2-de/sprachmodelle/codellama.md)           | 7B–34B              | Codevervollständigung       |
| [GLM-4.7-Flash](/guides/guides_v2-de/sprachmodelle/glm-47-flash.md)    | 4,7B                | Schnell Chinesisch/Englisch |
| [GLM-5](/guides/guides_v2-de/sprachmodelle/glm5.md)                    | Wird bekanntgegeben | Zhipu AI neuestes           |
| [Kimi K2.5](/guides/guides_v2-de/sprachmodelle/kimi-k2.md)             | Wird bekanntgegeben | Moonshot AI-Modell          |
| [Ling-2.5-1T](/guides/guides_v2-de/sprachmodelle/ling25.md)            | 1T                  | Massives Open-Source-LLM    |
| [LFM2-24B](/guides/guides_v2-de/sprachmodelle/lfm2-24b.md)             | 24B                 | Liquid-AI-Modell            |
| [MiMo-V2-Flash](/guides/guides_v2-de/sprachmodelle/mimo-v2-flash.md)   | Wird bekanntgegeben | Schnelles Inferenzmodell    |

### Effiziente Modelle

| Modell                                                                   | Parameter           | Am besten für                     |
| ------------------------------------------------------------------------ | ------------------- | --------------------------------- |
| [Gemma 2](/guides/guides_v2-de/sprachmodelle/gemma2.md)                  | 2B–27B              | Effiziente Inferenz               |
| [Gemma 3](/guides/guides_v2-de/sprachmodelle/gemma3.md)                  | Wird bekanntgegeben | Googles neuestes kompaktes Modell |
| [Phi-4](/guides/guides_v2-de/sprachmodelle/phi4.md)                      | 14B                 | Klein, aber leistungsfähig        |
| [Mistral/Mixtral](/guides/guides_v2-de/sprachmodelle/mistral-mixtral.md) | 7B / 8x7B           | Allzweck                          |
| [Mistral Large 3](/guides/guides_v2-de/sprachmodelle/mistral-large3.md)  | 675B MoE            | Unternehmensklasse                |
| [Mistral Small 3.1](/guides/guides_v2-de/sprachmodelle/mistral-small.md) | Wird bekanntgegeben | Effiziente Mistral-Variante       |

## GPU-Empfehlungen

| Modellgröße | Mindest-GPU   | Empfohlen |
| ----------- | ------------- | --------- |
| 7B (Q4)     | RTX 3060 12GB | RTX 3090  |
| 13B (Q4)    | RTX 3090 24GB | RTX 4090  |
| 34B (Q4)    | 2x RTX 3090   | A100 40GB |
| 70B (Q4)    | A100 80GB     | 2x A100   |

## Quantisierungsanleitung

| Format   | VRAM-Nutzung   | Qualität      | Geschwindigkeit |
| -------- | -------------- | ------------- | --------------- |
| Q2\_K    | Am niedrigsten | Schlecht      | Am schnellsten  |
| Q4\_K\_M | Niedrig        | Gut           | Schnell         |
| Q5\_K\_M | Mittel         | Großartig     | Mittel          |
| Q8\_0    | Hoch           | Ausgezeichnet | Langsamer       |
| FP16     | Am höchsten    | Am besten     | Am langsamsten  |

## Siehe auch

* [Training & Feinabstimmung](/guides/guides_v2-de/training/training.md)
* [Vision-Sprach-Modelle](/guides/guides_v2-de/vision-modelle/vision-models.md)


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/guides_v2-de/sprachmodelle/language-models.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.