
# Language Models

- [Overview](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/language-models.md)
- [Ollama](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/ollama.md): Run LLMs locally with Ollama on Clore.ai GPUs
- [Open WebUI](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/open-webui.md): A ChatGPT-like interface for running LLMs on Clore.ai GPUs
- [vLLM](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/vllm.md): High-throughput LLM inference with vLLM on Clore.ai GPUs
- [Llama.cpp Server](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/llamacpp-server.md): Efficient LLM inference with the llama.cpp server on Clore.ai GPUs
- [Text Generation WebUI](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/text-generation-webui.md): Run text-generation-webui for LLM inference on Clore.ai GPUs
- [ExLlamaV2](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/exllamav2-fast.md): Maximum-speed LLM inference with ExLlamaV2 on Clore.ai GPUs
- [LocalAI](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/localai-openai-compatible.md): Self-host an OpenAI-compatible API with LocalAI on Clore.ai
- [Llama 3.3 70B](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/llama33.md): Run Meta's Llama 3.3 70B model on Clore.ai GPUs
- [Mistral and Mixtral](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/mistral-mixtral.md): Run Mistral and Mixtral models on Clore.ai GPUs
- [DeepSeek Coder](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/deepseek-coder.md): Top-tier code generation with DeepSeek Coder on Clore.ai
- [DeepSeek-V3](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/deepseek-v3.md): Run DeepSeek-V3 on Clore.ai GPUs for excellent reasoning capability
- [DeepSeek-R1 Reasoning Model](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/deepseek-r1.md): Run the open-source DeepSeek-R1 reasoning model on Clore.ai GPUs
- [Qwen2.5](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/qwen25.md): Run Alibaba's multilingual Qwen2.5 LLM on Clore.ai GPUs
- [CodeLlama](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/codellama.md): Generate, complete, and explain code with CodeLlama on Clore.ai
- [Gemma 2](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/gemma2.md): Run Google's Gemma 2 models efficiently on Clore.ai GPUs
- [Phi-4](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/phi4.md): Run Microsoft's Phi-4 small language model on Clore.ai GPUs
- [Llama 4 (Scout and Maverick)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/llama4.md): Run Meta's Llama 4 Scout and Maverick MoE models on Clore.ai GPUs
- [Gemma 3](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/gemma3.md): Run Google's Gemma 3 multimodal model on Clore.ai, outperforming Llama-405B while being 15× smaller
- [Gemma 4 (26B MoE, 4B active)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/gemma4.md): Deploy Google's Gemma 4 (26B MoE, 4B active) on Clore.ai, an open-weights model released in April 2026 that climbed to …
- [Mistral Small 3.1](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/mistral-small.md): Deploy Mistral Small 3.1 (24B) on Clore.ai, an ideal single-GPU production model
- [Qwen3.5](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/qwen35.md): Run Alibaba's Qwen3.5 on Clore.ai, the latest frontier model (February 2026)
- [Qwen3.5-Omni (Multimodal)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/qwen35-omni.md)
- [GLM-5](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/glm5.md): Deploy Zhipu AI's GLM-5 (744B MoE) on Clore.ai, with API access and self-hosting via vLLM
- [GLM-4.7-Flash](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/glm-47-flash.md): Deploy Zhipu AI's GLM-4.7-Flash (30B MoE) on Clore.ai, an efficient language model scoring 59.2% on SWE-bench
- [Kimi K2.5](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/kimi-k2.md): Deploy Moonshot AI's Kimi K2.5 (1T MoE, multimodal) on Clore.ai GPUs
- [Mistral Large 3 (675B MoE)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/mistral-large3.md): Run Mistral Large 3 on Clore.ai GPUs, a 675B MoE frontier model with 41B active parameters
- [Mistral Medium 3.5 (128B Dense, 256K)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/mistral-medium35.md): Deploy Mistral Medium 3.5 on Clore.ai: 128B dense, 256K context, dual-mode reasoning, released in April 2026. Supports production-grade vLLM/SGLang deployment on 4× H100 or 2× H200.
- [MiMo-V2-Flash](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/mimo-v2-flash.md): Deploy MiMo-V2-Flash (309B MoE) on Clore.ai with speculative decoding for very fast inference at 150+ tok/s
- [Ling-2.5-1T (1 Trillion Parameters)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/ling25.md): Run Ling-2.5-1T on Clore.ai GPUs, Ant Group's open-source 1-trillion-parameter LLM with hybrid linear attention
- [LFM2-24B-A2B](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/lfm2-24b.md): Deploy Liquid AI's LFM2-24B-A2B on Clore.ai, a hybrid SSM+Attention architecture with 24B total / 2B active parameters
- [DeepSeek V4 (1.6T MoE, Multimodal)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/deepseek-v4.md): Deploy DeepSeek V4 (1.6T-parameter Pro and 284B Flash) on Clore.ai, an open-source frontier MoE released on April 22, 2026
- [GLM-5.1 (744B MoE, #1 on SWE-Bench Pro)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/glm-5-1.md): Deploy Z.ai's GLM-5.1 (744B MoE, 40B active) on Clore.ai, the open-weights model that topped SWE-Bench Pro in April 2026
- [NVIDIA Nemotron 3 Super (120B MoE)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/nvidia-nemotron-3-super.md)
- [Gemini 3.1 Flash Lite](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/gemini-3-1-flash-lite.md)
- [Hy3 Preview (Tencent Hunyuan 3, 295B MoE)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/hy3-preview.md): Deploy Tencent's Hy3 Preview (295B MoE, 21B active, 256K context) on Clore.ai, the first model from Tencent Hunyuan's rebuilt training stack, optimized for long-horizon reasoning and agentic coding
- [MiMo-V2.5-Pro (Xiaomi 1T MoE)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/mimo-v25-pro.md): Deploy Xiaomi's MiMo-V2.5-Pro (1.02T MoE, 42B active, 1M context) on Clore.ai, the MiMo team's first open-weights Pro-class model, with native FP8 and hybrid attention
- [MiniMax M2.7 (229B MoE, Coding)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/minimax-m27.md): Deploy MiniMax M2.7 (229B MoE) on Clore.ai, the open-source, self-hostable release behind MiniMax's coding-agent push, with FP8 single-node deployment on H100/H200
- [Ling-2.6-flash (Ant Group 104B MoE)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/ling-26-flash.md): Deploy Ant Group's Ling-2.6-flash (104B MoE, 7.4B active) on Clore.ai, the agent-optimized flash sibling model that runs on a single RTX 4090
- [Qwen3.6-27B (Dense, Single GPU)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/qwen36-27b.md): Deploy Alibaba's Qwen3.6-27B on Clore.ai, a 27B dense model that runs on a single RTX 4090 and offers native 262K context
- [TGI (Text Generation Inference)](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/tgi.md): Run HuggingFace Text Generation Inference (TGI) on Clore.ai GPUs for production-grade LLM serving
- [SGLang](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/sglang.md): Deploy SGLang on Clore.ai GPUs for high-performance LLM serving with RadixAttention
- [Aphrodite Engine](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/aphrodite-engine.md): Run Aphrodite Engine on Clore.ai for LLM inference on both older and modern GPUs
- [LiteLLM AI Gateway](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/litellm.md): Deploy LiteLLM on Clore.ai GPUs as an AI gateway proxy for 100+ LLMs
- [MLC-LLM](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/mlc-llm.md)
- [PowerInfer](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/powerinfer.md)
- [LMDeploy](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/lmdeploy.md)
- [Mistral.rs](https://docs.clore.ai/guides/guides_v2-zh/yu-yan-mo-xing/mistral-rs.md)

