# Language Models

- [Overview](https://docs.clore.ai/guides/language-models/language-models.md)
- [Ollama](https://docs.clore.ai/guides/language-models/ollama.md): Run LLMs locally with Ollama on Clore.ai GPUs
- [Open WebUI](https://docs.clore.ai/guides/language-models/open-webui.md): ChatGPT-like interface for running LLMs on Clore.ai GPUs
- [vLLM](https://docs.clore.ai/guides/language-models/vllm.md): High-throughput LLM inference with vLLM on Clore.ai GPUs
- [Llama.cpp Server](https://docs.clore.ai/guides/language-models/llamacpp-server.md): Efficient LLM inference with llama.cpp server on Clore.ai GPUs
- [Text Generation WebUI](https://docs.clore.ai/guides/language-models/text-generation-webui.md): Run text-generation-webui for LLM inference on Clore.ai GPUs
- [ExLlamaV2](https://docs.clore.ai/guides/language-models/exllamav2-fast.md): Maximum-speed LLM inference with ExLlamaV2 on Clore.ai GPUs
- [LocalAI](https://docs.clore.ai/guides/language-models/localai-openai-compatible.md): Self-hosted OpenAI-compatible API with LocalAI on Clore.ai
- [Llama 3.3 70B](https://docs.clore.ai/guides/language-models/llama33.md): Run Meta's Llama 3.3 70B model on Clore.ai GPUs
- [Mistral & Mixtral](https://docs.clore.ai/guides/language-models/mistral-mixtral.md): Run Mistral and Mixtral models on Clore.ai GPUs
- [DeepSeek Coder](https://docs.clore.ai/guides/language-models/deepseek-coder.md): Best-in-class code generation with DeepSeek Coder on Clore.ai
- [DeepSeek-V3](https://docs.clore.ai/guides/language-models/deepseek-v3.md): Run DeepSeek-V3 with exceptional reasoning on Clore.ai GPUs
- [DeepSeek-R1 Reasoning Model](https://docs.clore.ai/guides/language-models/deepseek-r1.md): Run DeepSeek-R1 open-source reasoning model on Clore.ai GPUs
- [Qwen2.5](https://docs.clore.ai/guides/language-models/qwen25.md): Run Alibaba's Qwen2.5 multilingual LLMs on Clore.ai GPUs
- [CodeLlama](https://docs.clore.ai/guides/language-models/codellama.md): Generate, complete, and explain code with CodeLlama on Clore.ai
- [Gemma 2](https://docs.clore.ai/guides/language-models/gemma2.md): Run Google's Gemma 2 models efficiently on Clore.ai GPUs
- [Phi-4](https://docs.clore.ai/guides/language-models/phi4.md): Run Microsoft's Phi-4 small language model on Clore.ai GPUs
- [Llama 4 (Scout & Maverick)](https://docs.clore.ai/guides/language-models/llama4.md): Run Meta Llama 4 Scout & Maverick MoE models on Clore.ai GPUs
- [Gemma 3](https://docs.clore.ai/guides/language-models/gemma3.md): Run Google Gemma 3 multimodal models on Clore.ai — beats Llama-405B at 15x smaller
- [Gemma 4 (26B MoE, 4B active)](https://docs.clore.ai/guides/language-models/gemma4.md): Deploy Gemma 4 (26B MoE, 4B active), Google's open-weight model released April 2026, on Clore.ai
- [Mistral Small 3.1](https://docs.clore.ai/guides/language-models/mistral-small.md): Deploy Mistral Small 3.1 (24B) on Clore.ai — the ideal single-GPU production model
- [Qwen3.5](https://docs.clore.ai/guides/language-models/qwen35.md): Run Alibaba Qwen3.5 on Clore.ai — the freshest frontier model (Feb 2026)
- [Qwen3.5-Omni (Multimodal)](https://docs.clore.ai/guides/language-models/qwen35-omni.md)
- [GLM-5](https://docs.clore.ai/guides/language-models/glm5.md): Deploy GLM-5 (744B MoE) by Zhipu AI on Clore.ai — API access and self-hosting with vLLM
- [GLM-4.7-Flash](https://docs.clore.ai/guides/language-models/glm-47-flash.md): Deploy GLM-4.7-Flash (30B MoE) by Zhipu AI on Clore.ai — efficient language model with 59.2% SWE-bench performance
- [Kimi K2.5](https://docs.clore.ai/guides/language-models/kimi-k2.md): Deploy Kimi K2.5 (1T MoE multimodal) by Moonshot AI on Clore.ai GPUs
- [Mistral Large 3 (675B MoE)](https://docs.clore.ai/guides/language-models/mistral-large3.md): Run Mistral Large 3 — a 675B MoE frontier model with 41B active parameters on Clore.ai GPUs
- [Mistral Medium 3.5 (128B Dense, 256K)](https://docs.clore.ai/guides/language-models/mistral-medium35.md): Deploy Mistral Medium 3.5 on Clore.ai — 128B dense, 256K context, dual-mode reasoning released April 2026. Production vLLM/SGLang setup on 4× H100 or 2× H200.
- [MiMo-V2-Flash](https://docs.clore.ai/guides/language-models/mimo-v2-flash.md): Deploy MiMo-V2-Flash (309B MoE) with speculative decoding on Clore.ai — ultra-fast inference with 150+ tok/s
- [Ling-2.5-1T (1 Trillion Parameters)](https://docs.clore.ai/guides/language-models/ling25.md): Run Ling-2.5-1T — Ant Group's 1 trillion parameter open-source LLM with hybrid linear attention on Clore.ai GPUs
- [LFM2-24B-A2B](https://docs.clore.ai/guides/language-models/lfm2-24b.md): Deploy LFM2-24B-A2B by Liquid AI on Clore.ai — hybrid SSM+Attention architecture with 24B total / 2B active parameters
- [DeepSeek V4 (1.6T MoE, Multimodal)](https://docs.clore.ai/guides/language-models/deepseek-v4.md): Deploy DeepSeek V4 (1.6T-param Pro and 284B Flash) on Clore.ai — the open-weight frontier MoE released April 22, 2026
- [GLM-5.1 (744B MoE, #1 SWE-Bench Pro)](https://docs.clore.ai/guides/language-models/glm-5-1.md): Deploy GLM-5.1 (744B MoE, 40B active) by Z.ai on Clore.ai — the open-weight model that topped SWE-Bench Pro in April 2026
- [NVIDIA Nemotron 3 Super (120B MoE)](https://docs.clore.ai/guides/language-models/nvidia-nemotron-3-super.md)
- [Gemini 3.1 Flash Lite](https://docs.clore.ai/guides/language-models/gemini-3-1-flash-lite.md)
- [Hy3 Preview (Tencent Hunyuan 3, 295B MoE)](https://docs.clore.ai/guides/language-models/hy3-preview.md): Deploy Tencent's Hy3 Preview (295B MoE, 21B active, 256K ctx) on Clore.ai — the first model from Tencent Hunyuan's rebuilt training stack, tuned for long-horizon reasoning and agentic coding
- [MiMo-V2.5-Pro (Xiaomi 1T MoE)](https://docs.clore.ai/guides/language-models/mimo-v25-pro.md): Deploy MiMo-V2.5-Pro (1.02T MoE, 42B active, 1M context) by Xiaomi on Clore.ai — the first open-weight Pro tier from the MiMo team, FP8 native, hybrid attention
- [MiniMax M2.7 (229B MoE Coding)](https://docs.clore.ai/guides/language-models/minimax-m27.md): Deploy MiniMax M2.7 (229B MoE) on Clore.ai — the open-weight self-hosted release behind MiniMax's coding agent push, with FP8 single-node deployment on H100/H200
- [Ling-2.6-flash (Ant Group 104B MoE)](https://docs.clore.ai/guides/language-models/ling-26-flash.md): Deploy Ling-2.6-flash (104B MoE, 7.4B active) by Ant Group on Clore.ai — the agent-tuned flash sibling that fits on a single RTX 4090
- [Qwen3.6-27B (Dense, Single-GPU)](https://docs.clore.ai/guides/language-models/qwen36-27b.md): Deploy Qwen3.6-27B by Alibaba on Clore.ai — a dense 27B that fits on one RTX 4090 and ships with 262K native context
- [TGI (Text Generation Inference)](https://docs.clore.ai/guides/language-models/tgi.md): Run HuggingFace Text Generation Inference (TGI) for production LLM serving on Clore.ai GPUs
- [SGLang](https://docs.clore.ai/guides/language-models/sglang.md): Deploy SGLang for high-performance LLM serving with RadixAttention on Clore.ai GPUs
- [Aphrodite Engine](https://docs.clore.ai/guides/language-models/aphrodite-engine.md): Run Aphrodite Engine for LLM inference on legacy and modern GPUs on Clore.ai
- [LiteLLM AI Gateway](https://docs.clore.ai/guides/language-models/litellm.md): Deploy LiteLLM as an AI Gateway proxy for 100+ LLMs on Clore.ai GPUs
- [MLC-LLM](https://docs.clore.ai/guides/language-models/mlc-llm.md)
- [PowerInfer](https://docs.clore.ai/guides/language-models/powerinfer.md)
- [LMDeploy](https://docs.clore.ai/guides/language-models/lmdeploy.md)
- [Mistral.rs](https://docs.clore.ai/guides/language-models/mistral-rs.md)


---

# Agent Instructions: Querying This Documentation

If you need information that is not directly available on this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/language-models.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response contains a direct answer to the question, along with relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly stated on the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
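
A natural-language question contains spaces and punctuation, so it has to be percent-encoded before it goes into the `ask` parameter. The sketch below shows one way to build and send the request using only the Python standard library. The helper names `build_ask_url` and `ask_docs` are hypothetical, and decoding the response as UTF-8 text is an assumption, since this page does not specify the response format.

```python
import urllib.parse
import urllib.request

# Page URL copied from the GET request shown above.
PAGE_URL = "https://docs.clore.ai/guides/language-models.md"


def build_ask_url(question: str, page_url: str = PAGE_URL) -> str:
    """Return page_url with the question percent-encoded in the `ask` parameter."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urllib.parse.urlencode({"ask": question}, quote_via=urllib.parse.quote)
    return f"{page_url}?{query}"


def ask_docs(question: str, timeout: float = 30.0) -> str:
    """Send the GET request and return the response body as text.

    Assumption: the body is UTF-8 text; the page does not state a format.
    """
    with urllib.request.urlopen(build_ask_url(question), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Building the URL needs no network access, so it is safe to inspect first.
print(build_ask_url("Which guide covers vLLM?"))
```

Calling `ask_docs("Which guide covers vLLM?")` would then perform the actual request. Keeping URL construction separate from the network call makes the encoding step easy to check offline.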
