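# Continue.dev AI Coding

Continue.dev is an open-source AI coding assistant for VS Code and JetBrains with 25K+ GitHub stars. The **extension runs on your local machine** (inside your IDE), but it connects to a backend model server for inference. By pointing Continue.dev at a powerful GPU rented from Clore.ai, you get:

* **Top-tier coding models** (34B+ parameters) that will not fit on your laptop
* **Full privacy**: your code stays on infrastructure you control
* **Flexible costs**: pay only while you code (about $0.20–0.50/hour vs. $19/month for Copilot)
* **OpenAI-compatible API**: Continue.dev connects to Ollama, vLLM, or TabbyML

This guide focuses on setting up the **Clore.ai GPU backend** (Ollama or vLLM) that your local Continue.dev extension connects to.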

{% hint style="success" %}
All examples use GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

{% hint style="info" %}
**Architecture**: your IDE (with the Continue.dev extension) → internet → Clore.ai GPU server (running Ollama / vLLM / TabbyML) → model inference on that server. Your code never touches a third-party API.
{% endhint %}

## Overview

| Property          | Details                                                         |
| ----------------- | --------------------------------------------------------------- |
| **Project**       | [continuedev/continue](https://github.com/continuedev/continue) |
| **License**       | Apache 2.0                                                      |
| **GitHub stars**  | 25K+                                                            |
| **IDE support**   | VS Code, JetBrains (IntelliJ, PyCharm, WebStorm, GoLand, etc.)  |
| **Config file**   | `~/.continue/config.json`                                       |
| **Backends**      | Ollama, vLLM, TabbyML, LM Studio, llama.cpp, OpenAI-compatible APIs |
| **Difficulty**    | Easy (extension install) / Medium (self-hosted backend)         |
| **GPU required?** | On the Clore.ai server: yes; on your laptop: no                 |
| **Key features**  | Autocomplete, chat, edit mode, codebase context (RAG), custom slash commands |

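### Recommended coding models

| Model                 | VRAM    | Strength                   | Notes                                          |
| --------------------- | ------- | -------------------------- | ---------------------------------------------- |
| `codellama:7b`        | \~6 GB  | Fast autocomplete          | Good starting point                            |
| `codellama:13b`       | \~10 GB | Balanced                   | Best quality/speed balance for autocomplete    |
| `codellama:34b`       | \~22 GB | Best CodeLlama quality     | Needs RTX 3090 / A100                          |
| `deepseek-coder:6.7b` | \~5 GB  | Python/JS specialist       | Well suited to web development                 |
| `deepseek-coder:33b`  | \~22 GB | Top-tier open model        | Described as comparable to GPT-4 on code tasks |
| `qwen2.5-coder:7b`    | \~6 GB  | Multilingual code          | Strong across 40+ languages                    |
| `qwen2.5-coder:32b`   | \~22 GB | State of the art           | Among the strongest open coding models of 2024 |
| `starcoder2:15b`      | \~12 GB | Code-completion specialist | Supports FIM (fill-in-the-middle)              |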
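
The VRAM column above can be turned into a quick lookup when choosing a model for a given card. The following is a minimal sketch: the figures are copied from the table, while the `models_that_fit` helper and the 1 GB headroom are illustrative assumptions, not part of Continue.dev or Ollama.

```python
# Approximate VRAM needs in GB, copied from the table above.
MODEL_VRAM_GB = {
    "codellama:7b": 6,
    "codellama:13b": 10,
    "codellama:34b": 22,
    "deepseek-coder:6.7b": 5,
    "deepseek-coder:33b": 22,
    "qwen2.5-coder:7b": 6,
    "qwen2.5-coder:32b": 22,
    "starcoder2:15b": 12,
}


def models_that_fit(vram_gb, headroom_gb=1.0):
    """Return model tags whose estimated VRAM fits, largest first.

    headroom_gb is a hypothetical safety margin for context and runtime overhead.
    """
    usable = vram_gb - headroom_gb
    fitting = [(need, tag) for tag, need in MODEL_VRAM_GB.items() if need <= usable]
    # Sort by descending VRAM need, then alphabetically for a stable order.
    return [tag for need, tag in sorted(fitting, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    for card, vram in [("RTX 3060", 12), ("RTX 3090", 24)]:
        print(card, models_that_fit(vram))
```

Actual memory use also depends on quantization and context length, so treat the result as a starting point rather than a guarantee.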

## Requirements

### Clore.ai server requirements

| Tier            | GPU       | VRAM  | RAM   | Disk   | Price        | Models                                            |
| --------------- | --------- | ----- | ----- | ------ | ------------ | ------------------------------------------------- |
| **Budget**      | RTX 3060  | 12 GB | 16 GB | 40 GB  | \~$0.10/hour | CodeLlama 7B, DeepSeek 6.7B, Qwen2.5-Coder 7B     |
| **Recommended** | RTX 3090  | 24 GB | 32 GB | 80 GB  | \~$0.20/hour | CodeLlama 34B, DeepSeek 33B, Qwen2.5-Coder 32B    |
| **Performance** | RTX 4090  | 24 GB | 32 GB | 80 GB  | \~$0.35/hour | Same models as above, faster inference            |
| **Compute**     | A100 40GB | 40 GB | 64 GB | 120 GB | \~$0.60/hour | A 34B model plus smaller models loaded together   |
| **Max**         | A100 80GB | 80 GB | 80 GB | 200 GB | \~$1.10/hour | 70B models (CodeLlama 70B)                        |

### Local requirements (your machine)

* VS Code or any JetBrains IDE
* The Continue.dev extension installed
* A stable network connection to the Clore.ai server
* **No local GPU needed**: all inference runs on Clore.ai

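## Quick start

### Part 1: Set up the Clore.ai backend

#### Option A: Ollama backend (recommended for most users)

Ollama is the simplest backend for Continue.dev: easy setup, good model management, and an OpenAI-compatible API.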

```bash
# 1. SSH into your Clore.ai server
ssh root@<clore-server-ip> -p <port>

# 2. Start Ollama with GPU support
docker run -d \
  --name ollama \
  --gpus all \
  -p 11434:11434 \
  -v /workspace/ollama:/root/.ollama \
  --restart unless-stopped \
  ollama/ollama

# 3. Verify that Ollama is running
curl http://localhost:11434/

# 4. Pull your coding model (choose based on VRAM)
# For 12 GB VRAM (RTX 3060):
docker exec ollama ollama pull codellama:13b

# For 24 GB VRAM (RTX 3090 / RTX 4090):
docker exec ollama ollama pull qwen2.5-coder:32b
# or:
docker exec ollama ollama pull deepseek-coder:33b

# 5. Pull a fast autocomplete model (separate from the chat model)
docker exec ollama ollama pull starcoder2:3b   # very fast, well suited to FIM autocomplete

# 6. Verify that the models are available
docker exec ollama ollama list

# 7. Test inference
docker exec ollama ollama run qwen2.5-coder:32b "Write a Python function to binary search a sorted list"
```

To expose Ollama externally (so your local IDE can connect):

```bash
# Recreate the Ollama container with external access enabled
docker stop ollama && docker rm ollama

docker run -d \
  --name ollama \
  --gpus all \
  -p 11434:11434 \
  -v /workspace/ollama:/root/.ollama \
  -e OLLAMA_HOST=0.0.0.0 \
  --restart unless-stopped \
  ollama/ollama

# Test from your local machine:
curl http://<clore-server-ip>:11434/api/tags
```
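
The `/api/tags` response can also be checked programmatically to confirm that every model your config refers to has been pulled. This is a minimal sketch using only the standard library; the helper names are illustrative, and it assumes the response has the shape `{"models": [{"name": ...}]}` and that Ollama lists untagged pulls with a `:latest` suffix.

```python
import json
import urllib.request


def _normalize(name):
    # Assumption: a model pulled without a tag is listed as "<name>:latest".
    return name if ":" in name else name + ":latest"


def installed_models(tags_payload):
    """Extract model names from a parsed /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def missing_models(tags_payload, required):
    """Return the required model names that the server does not list."""
    have = {_normalize(n) for n in installed_models(tags_payload)}
    return [name for name in required if _normalize(name) not in have]


def fetch_tags(base_url, timeout=5.0):
    """GET <base_url>/api/tags and parse the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline demo with a hand-written payload; use fetch_tags() against a real server.
    sample = {"models": [{"name": "qwen2.5-coder:32b"}, {"name": "starcoder2:3b"}]}
    print(missing_models(sample, ["qwen2.5-coder:32b", "starcoder2:3b", "nomic-embed-text"]))
```

Against a live server you would pass `fetch_tags("http://<clore-server-ip>:11434")` in place of the sample payload.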

{% hint style="warning" %}
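A publicly exposed port 11434 has no authentication by default. For production use, set up an SSH tunnel instead (see the SSH tunnel setup under Configuration and [Tips & Best Practices](#tips--best-practices)).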
{% endhint %}

#### Option B: vLLM backend (high throughput / OpenAI-compatible)

vLLM offers faster inference and multi-user support. It is a good fit when several developers share one Clore.ai server.

```bash
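# Start vLLM with an OpenAI-compatible API
# Note: unquantized 32B weights take roughly 64 GB in 16-bit precision, so this
# command needs an 80 GB GPU; on smaller cards use a quantized checkpoint.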
docker run -d \
  --name vllm \
  --gpus all \
  -p 8000:8000 \
  -v /workspace/hf-models:/root/.cache/huggingface \
  -e HF_TOKEN="your-huggingface-token" \
  --restart unless-stopped \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen2.5-Coder-32B-Instruct \
  --dtype auto \
  --max-model-len 32768 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --served-model-name qwen2.5-coder-32b

# For multi-GPU (for example, two RTX 3090s):
docker run -d \
  --name vllm \
  --gpus all \
  -p 8000:8000 \
  -v /workspace/hf-models:/root/.cache/huggingface \
  -e HF_TOKEN="your-huggingface-token" \
  vllm/vllm-openai:latest \
  --model deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct \
  --tensor-parallel-size 2 \
  --dtype auto \
  --max-model-len 65536 \
  --served-model-name deepseek-coder-v2

# Test the API
curl http://localhost:8000/v1/models
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-32b",
    "messages": [{"role": "user", "content": "Write a hello world in Rust"}],
    "max_tokens": 200
  }'
```
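
The same chat-completions request can be sent from Python with the standard library alone. This is a minimal sketch: `build_chat_request` and `chat` are illustrative helper names, and the code assumes the server replies in the usual OpenAI-style shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(model, prompt, max_tokens=200):
    """Build the JSON body for an OpenAI-style /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(base_url, model, prompt, max_tokens=200, timeout=60.0):
    """POST the request to <base_url>/v1/chat/completions and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt, max_tokens)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Builds and prints the payload only; no network access is needed for this demo.
    print(json.dumps(build_chat_request("qwen2.5-coder-32b", "Write a hello world in Rust")))
```

With the server running, `chat("http://localhost:8000", "qwen2.5-coder-32b", "...")` would return the generated text; the model name must match `--served-model-name`.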

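#### Option C: TabbyML backend (FIM autocomplete specialist)

TabbyML specializes in fill-in-the-middle (FIM) autocomplete, that is, inline ghost-text suggestions. See the [TabbyML docs](https://tabby.tabbyml.com/) for full setup details.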

```bash
# Quick TabbyML setup for Continue.dev autocomplete
docker run -d \
  --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  --restart unless-stopped \
  tabbyml/tabby serve \
  --model StarCoder2-7B \
  --chat-model Mistral-7B \
  --device cuda

# Verify
curl http://localhost:8080/v1/health
```

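### Part 2: Install the Continue.dev extension

**VS Code:**

1. Open the Extensions panel (`Ctrl+Shift+X` / `Cmd+Shift+X`)
2. Search for **"Continue"** and install the official extension published by Continue (continuedev)
3. Click the Continue icon in the sidebar (or press `Ctrl+Shift+I`)

**JetBrains (IntelliJ, PyCharm, WebStorm, GoLand):**

1. `File → Settings → Plugins → Marketplace`
2. Search for **"Continue"** and install it
3. Restart the IDE; the Continue panel appears in the right sidebar

### Part 3: Configure Continue.dev to use Clore.ai

Edit `~/.continue/config.json` on your **local machine**: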

```json
{
  "models": [
    {
      "title": "Clore.ai — Qwen2.5-Coder 32B",
      "provider": "ollama",
      "model": "qwen2.5-coder:32b",
      "apiBase": "http://<clore-server-ip>:11434",
      "contextLength": 32768,
      "completionOptions": {
        "temperature": 0.1,
        "topP": 0.95,
        "maxTokens": 4096
      }
    },
    {
      "title": "Clore.ai — CodeLlama 13B (fast)",
      "provider": "ollama",
      "model": "codellama:13b",
      "apiBase": "http://<clore-server-ip>:11434",
      "contextLength": 16384
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2 3B (autocomplete)",
    "provider": "ollama",
    "model": "starcoder2:3b",
    "apiBase": "http://<clore-server-ip>:11434"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://<clore-server-ip>:11434"
  },
  "contextProviders": [
    { "name": "code" },
    { "name": "docs" },
    { "name": "diff" },
    { "name": "terminal" },
    { "name": "problems" },
    { "name": "folder" },
    { "name": "codebase" }
  ],
  "slashCommands": [
    { "name": "edit", "description": "Edit the selected code" },
    { "name": "comment", "description": "Add comments to the code" },
    { "name": "share", "description": "Export the conversation as markdown" },
    { "name": "cmd", "description": "Generate a terminal command" },
    { "name": "commit", "description": "Generate a git commit message" }
  ]
}
```
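
Every `apiBase` in the config above carries the same `<clore-server-ip>` placeholder, so switching between a direct connection and a tunnel means editing several entries. The sketch below shows one way to rewrite them in a single pass; `with_server` is a hypothetical helper, not part of Continue.dev, and it only touches the three sections used in this guide.

```python
import copy

PLACEHOLDER = "<clore-server-ip>"


def with_server(config, host):
    """Return a copy of config with the placeholder host replaced in every apiBase."""
    result = copy.deepcopy(config)
    # Collect references to every section that may carry an apiBase.
    sections = list(result.get("models", []))
    for key in ("tabAutocompleteModel", "embeddingsProvider"):
        if isinstance(result.get(key), dict):
            sections.append(result[key])
    for section in sections:
        if "apiBase" in section:
            section["apiBase"] = section["apiBase"].replace(PLACEHOLDER, host)
    return result


if __name__ == "__main__":
    template = {
        "models": [{"title": "Chat", "apiBase": "http://<clore-server-ip>:11434"}],
        "tabAutocompleteModel": {"apiBase": "http://<clore-server-ip>:11434"},
    }
    print(with_server(template, "localhost"))
```

The input is left unmodified, so one template can produce both a direct-IP and a `localhost` variant.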

For the **vLLM backend** instead of Ollama:

```json
{
  "models": [
    {
      "title": "Clore.ai — DeepSeek-Coder-V2-Lite (vLLM)",
      "provider": "openai",
      "model": "deepseek-coder-v2",
      "apiBase": "http://<clore-server-ip>:8000/v1",
      "apiKey": "not-required",
      "contextLength": 65536,
      "completionOptions": {
        "temperature": 0.0,
        "maxTokens": 8192
      }
    }
  ]
}
```

For the **TabbyML backend** (autocomplete only):

```json
{
  "tabAutocompleteModel": {
    "title": "Clore.ai — TabbyML StarCoder2",
    "provider": "openai",
    "model": "StarCoder2-7B",
    "apiBase": "http://<clore-server-ip>:8080/v1",
    "apiKey": "auth-token-if-set"
  }
}
```

## Configuration

### SSH tunnel setup (secure remote access)

Rather than exposing ports publicly, open an SSH tunnel from your local machine:

```bash
# Open an SSH tunnel: local port 11434 → port 11434 on the Clore.ai server
ssh -N -L 11434:localhost:11434 root@<clore-server-ip> -p <clore-ssh-port>

# To keep the tunnel settings, add the following to ~/.ssh/config:
Host clore-coding
  HostName <clore-server-ip>
  Port <clore-ssh-port>
  User root
  LocalForward 11434 localhost:11434
  LocalForward 8000 localhost:8000
  ServerAliveInterval 60
  ServerAliveCountMax 3

# Then connect with:
ssh -N clore-coding

# And use localhost in config.json:
# "apiBase": "http://localhost:11434"
```
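
Before pointing the IDE at `localhost`, it can help to confirm that the forwarded port actually accepts connections. A minimal sketch with the standard library follows; `port_is_open` is an illustrative helper, and a successful TCP connect only shows that something is listening, not that Ollama itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for port in (11434, 8000):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"localhost:{port} is {state}")
```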

### Persistent tunnel with autossh

```bash
# Install autossh on your local machine (Linux/macOS)
sudo apt install autossh   # Ubuntu/Debian
brew install autossh       # macOS

# Run a persistent tunnel that reconnects automatically
autossh -M 0 -N \
  -o "ServerAliveInterval 30" \
  -o "ServerAliveCountMax 3" \
  -L 11434:localhost:11434 \
  root@<clore-server-ip> -p <clore-ssh-port>

# Add it as a systemd user service so it starts automatically (Linux)
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/clore-tunnel.service << 'EOF'
[Unit]
Description=SSH tunnel to Clore.ai coding server
After=network.target

[Service]
ExecStart=autossh -M 0 -N \
  -o StrictHostKeyChecking=accept-new \
  -o ServerAliveInterval=30 \
  -o ServerAliveCountMax=3 \
  -L 11434:localhost:11434 \
  root@CLORE_IP -p CLORE_PORT
Restart=always
RestartSec=10

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable clore-tunnel
systemctl --user start clore-tunnel
```

### Loading multiple models for different tasks

On an RTX 3090 (24 GB) you can run a large chat model alongside a small autocomplete model:

```bash
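# On the Clore.ai server:

# Pull the models
docker exec ollama ollama pull qwen2.5-coder:32b      # chat (22 GB)
docker exec ollama ollama pull starcoder2:3b           # autocomplete (2 GB)
docker exec ollama ollama pull nomic-embed-text        # embeddings (0.5 GB)

# Ollama switches between models automatically.
# Together the three total about 24.5 GB, a tight fit for 24 GB of VRAM,
# so Ollama may unload and reload a model when you switch tasks.

# Monitor VRAM usage
nvidia-smi --query-gpu=memory.used,memory.free --format=csv -l 5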
```
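
Taking the sizes in the comments above at face value, a quick sum shows how close this setup runs to the 24 GB limit. The helper names below are illustrative, and the figures ignore context-cache overhead, so this is a rough budget rather than a measurement.

```python
def total_vram_gb(models):
    """Sum the estimated VRAM (GB) of a {model_tag: size_gb} mapping."""
    return sum(models.values())


def fits_together(models, vram_gb):
    """True if all models could stay resident at once within vram_gb."""
    return total_vram_gb(models) <= vram_gb


if __name__ == "__main__":
    # Sizes taken from the comments in the block above.
    stack = {"qwen2.5-coder:32b": 22, "starcoder2:3b": 2, "nomic-embed-text": 0.5}
    print(total_vram_gb(stack), fits_together(stack, 24))
```

When the total exceeds the card's memory, Ollama has to swap a model out before loading another, which shows up as a pause on the first request after a switch.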

### Codebase indexing (RAG over your repository)

Continue.dev can index your codebase to provide context-aware suggestions. Pull an embedding model:

```bash
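# On the Clore.ai server: add the embedding model to Ollama
docker exec ollama ollama pull nomic-embed-text

# The local config.json shown earlier already configures embeddings.
# Continue.dev indexes your open workspace automatically.
# To trigger a manual re-index: Ctrl+Shift+P → "Continue: Index Codebase"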
```

## GPU acceleration

### Monitoring inference performance

```bash
# On your Clore.ai server: watch the GPU during a coding session
watch -n 1 nvidia-smi

# Check tokens per second (Ollama logs)
docker logs ollama --tail 20 -f

# Detailed GPU statistics
nvidia-smi dmon -s u -d 2

# Memory breakdown
nvidia-smi --query-gpu=name,memory.used,memory.free,utilization.gpu \
  --format=csv,noheader -l 5
```

### Expected performance by GPU

| GPU           | Model                    | Context | Tokens/s (approx.) |
| ------------- | ------------------------ | ------- | ------------------ |
| RTX 3060 12GB | CodeLlama 7B             | 8K      | \~40–60 t/s        |
| RTX 3060 12GB | DeepSeek-Coder 6.7B      | 8K      | \~45–65 t/s        |
| RTX 3090 24GB | Qwen2.5-Coder 32B (Q4)   | 16K     | \~15–25 t/s        |
| RTX 3090 24GB | DeepSeek-Coder 33B (Q4)  | 16K     | \~15–22 t/s        |
| RTX 4090 24GB | Qwen2.5-Coder 32B (Q4)   | 16K     | \~25–40 t/s        |
| A100 40GB     | Qwen2.5-Coder 32B (FP16) | 32K     | \~35–50 t/s        |
| A100 80GB     | CodeLlama 70B (Q4)       | 32K     | \~20–30 t/s        |

For autocomplete (fill-in-the-middle), **starcoder2:3b** or **codellama:7b** can reach 50–100 t/s, which feels nearly instant in the IDE.
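
To translate these throughput figures into waiting time, divide the expected response length by tokens per second. The sketch below does only that arithmetic; the function names are illustrative, and the estimate leaves out prompt processing and network latency.

```python
def generation_seconds(tokens, tokens_per_second):
    """Seconds needed to generate `tokens` at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


def seconds_range(tokens, low_tps, high_tps):
    """Return (best case, worst case) seconds for a throughput range."""
    return generation_seconds(tokens, high_tps), generation_seconds(tokens, low_tps)


if __name__ == "__main__":
    # A 300-token chat answer on an RTX 3090 running a 32B Q4 model (15-25 t/s).
    print(seconds_range(300, 15, 25))
    # A 50-token inline completion at 100 t/s.
    print(generation_seconds(50, 100))
```

So a 300-token answer at 15–25 t/s takes roughly 12 to 20 seconds, while a 50-token completion at 100 t/s takes about half a second.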

### Tuning Ollama for better performance

```bash
# On the Clore.ai server: recreate Ollama with tuned settings
docker stop ollama && docker rm ollama

docker run -d \
  --name ollama \
  --gpus all \
  -p 11434:11434 \
  -v /workspace/ollama:/root/.ollama \
  -e OLLAMA_HOST=0.0.0.0 \
  -e OLLAMA_NUM_PARALLEL=2 \
  -e OLLAMA_MAX_LOADED_MODELS=2 \
  -e OLLAMA_FLASH_ATTENTION=1 \
  --restart unless-stopped \
  ollama/ollama

# OLLAMA_NUM_PARALLEL=2: serve 2 requests at the same time
# OLLAMA_MAX_LOADED_MODELS=2: keep up to 2 models loaded in GPU memory
# OLLAMA_FLASH_ATTENTION=1: enable flash attention (faster, uses less memory)
```

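## Tips & Best Practices

### Use different models for different tasks

Configure dedicated models in Continue.dev for different kinds of work; the UI lets you switch models mid-conversation: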

```json
{
  "models": [
    {
      "title": "Chat — Qwen2.5-Coder 32B",
      "provider": "ollama",
      "model": "qwen2.5-coder:32b",
      "apiBase": "http://localhost:11434",
      "contextLength": 32768,
      "description": "For complex questions, code review, and architecture decisions"
    },
    {
      "title": "Fast — CodeLlama 7B",
      "provider": "ollama",
      "model": "codellama:7b",
      "apiBase": "http://localhost:11434",
      "contextLength": 8192,
      "description": "Quick answers, simple completions, low latency"
    },
    {
      "title": "Autocomplete — StarCoder2 3B",
      "provider": "ollama",
      "model": "starcoder2:3b",
      "apiBase": "http://localhost:11434",
      "contextLength": 4096,
      "description": "Inline ghost-text suggestions"
    }
  ]
}
```

### Cost comparison

| Solution               | Monthly cost (8 hours/day) | Privacy            | Model quality       |
| ---------------------- | -------------------------- | ------------------ | ------------------- |
| GitHub Copilot         | $19/user/month             | ❌ Microsoft cloud | GPT-4o (closed)     |
| Cursor Pro             | $20/user/month             | ❌ Cursor cloud    | Claude 3.5 (closed) |
| RTX 3060 on Clore.ai   | \~$24/month                | ✅ Your server     | CodeLlama 13B       |
| RTX 3090 on Clore.ai   | \~$48/month                | ✅ Your server     | Qwen2.5-Coder 32B   |
| RTX 4090 on Clore.ai   | \~$84/month                | ✅ Your server     | Qwen2.5-Coder 32B   |
| A100 80GB on Clore.ai  | \~$264/month               | ✅ Your server     | CodeLlama 70B       |

For a team of 3 or more developers sharing one Clore.ai RTX 3090 (about $48/month in total), the per-user cost falls below Copilot's while providing a larger, private model.
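
The monthly figures in the table follow from the hourly prices listed under Requirements. A short sketch of the arithmetic, assuming 8 hours of use per day and a 30-day month (the 30 days is an assumption that reproduces the table's numbers); the function names are illustrative.

```python
def monthly_cost(hourly_usd, hours_per_day=8, days_per_month=30):
    """Monthly rental cost in USD for a server billed by the hour."""
    return hourly_usd * hours_per_day * days_per_month


def per_user_cost(hourly_usd, users, hours_per_day=8, days_per_month=30):
    """Monthly cost per developer when `users` share one server."""
    if users < 1:
        raise ValueError("users must be at least 1")
    return monthly_cost(hourly_usd, hours_per_day, days_per_month) / users


if __name__ == "__main__":
    for gpu, price in [("RTX 3060", 0.10), ("RTX 3090", 0.20),
                       ("RTX 4090", 0.35), ("A100 80GB", 1.10)]:
        print(f"{gpu}: ${monthly_cost(price):.2f}/month")
    print(f"RTX 3090 shared by 3: ${per_user_cost(0.20, 3):.2f}/user/month")
```

Shared three ways, the RTX 3090 comes to about $16 per developer per month, which is where the comparison with Copilot's $19 comes from. This holds only if the developers use the server in the same 8-hour window.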

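### Shut down when you are not coding

Clore.ai bills by the hour. The simple scripts below open and close the tunnel; stopping the charges still requires ending your Clore.ai order: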

```bash
# Save these as two local scripts

# start-coding-server.sh
#!/bin/bash
echo "Opening SSH tunnel to Clore.ai..."
ssh -N -f -L 11434:localhost:11434 clore-coding
echo "Tunnel is open. Continue.dev is ready."

# stop-coding-server.sh
#!/bin/bash
echo "Closing SSH tunnel..."
pkill -f "ssh.*clore-coding"
echo "Tunnel closed. Remember to stop your Clore.ai order to stop billing!"
```

### Custom commands in Continue.dev

Add custom slash commands to `config.json` for common coding workflows:

```json
{
  "customCommands": [
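    {
      "name": "review",
      "prompt": "Review this code for bugs, security issues, and performance problems. Be specific and actionable.",
      "description": "Code review"
    },
    {
      "name": "test",
      "prompt": "Write comprehensive unit tests for this code. Include edge cases. Use the same language and framework as the code.",
      "description": "Generate tests"
    },
    {
      "name": "docstring",
      "prompt": "Add clear, thorough docstrings and comments to this code, following the best practices of the language.",
      "description": "Add documentation"
    },
    {
      "name": "optimize",
      "prompt": "Optimize this code for performance. Explain what you changed and why.",
      "description": "Optimize code"
    }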
  ]
}
```

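## Troubleshooting

| Problem                                           | Likely cause                       | Solution                                                                                      |
| ------------------------------------------------- | ---------------------------------- | --------------------------------------------------------------------------------------------- |
| Continue.dev shows "Connection refused"           | Ollama is unreachable              | Check that the SSH tunnel is active; verify that `curl http://localhost:11434/` works         |
| Autocomplete does not trigger                     | Tab autocomplete model not set     | Add `tabAutocompleteModel` to config.json; enable it in the Continue settings                 |
| Very slow responses (first token takes over 30 s) | Model is loading from disk         | The first request loads the model into VRAM; later requests are fast                          |
| "Model not found" error                           | Model was not pulled               | Run `docker exec ollama ollama pull <model-name>` on the Clore.ai server                      |
| High latency between tokens                       | Network latency or model too large | Use an SSH tunnel; switch to a smaller model; check GPU utilization on the server             |
| Codebase context does not work                    | Embeddings model missing           | Pull `nomic-embed-text` through Ollama; check `embeddingsProvider` in config.json             |
| SSH tunnel drops frequently                       | Unstable connection                | Use `autossh` for persistent reconnects; add `ServerAliveInterval 30`                         |
| Context window exceeded                           | File or conversation too long      | Reduce `contextLength` in config.json; use a model with a longer context                      |
| JetBrains plugin does not load                    | Incompatible IDE version           | Update the JetBrains IDE to the latest version; check the Continue.dev plugin compatibility matrix |
| vLLM runs out of memory (OOM) while loading       | Not enough VRAM                    | Add `--gpu-memory-utilization 0.85`; use a smaller or quantized model                         |

### Debugging commands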

```bash
# On your local machine: test connectivity
curl http://localhost:11434/api/tags          # when using the SSH tunnel
curl http://<clore-ip>:11434/api/tags        # when the port is exposed directly

# On the CLORE.AI server: check Ollama
docker logs ollama --tail 30 -f
docker exec ollama ollama list
docker exec ollama ollama ps                  # shows the currently loaded models

# Measure model response time
time curl http://localhost:11434/api/generate \
  -d '{"model": "codellama:7b", "prompt": "def hello():", "stream": false}'

# Check GPU memory
nvidia-smi --query-gpu=memory.used,memory.free --format=csv

# Check vLLM logs
docker logs vllm --tail 50 -f

# Restart Ollama without losing the downloaded models
docker restart ollama
```
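
The non-streaming `/api/generate` response from the timing test above also carries generation statistics, which give a more direct throughput number than wall-clock time. This sketch assumes the response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds); `tokens_per_second` is an illustrative helper.

```python
def tokens_per_second(response):
    """Compute generation throughput from a parsed /api/generate response.

    Assumes eval_count is the number of generated tokens and eval_duration
    is the generation time in nanoseconds. Returns None if either is missing.
    """
    count = response.get("eval_count")
    duration_ns = response.get("eval_duration")
    if not count or not duration_ns:
        return None
    return count / (duration_ns / 1e9)


if __name__ == "__main__":
    # Hand-written sample values: 100 tokens generated in 4 seconds.
    sample = {"eval_count": 100, "eval_duration": 4_000_000_000}
    print(tokens_per_second(sample))
```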

### Continue.dev config validation

```bash
# Validate the config.json syntax on your local machine
python3 -c "
import json, sys
try:
    config = json.load(open(sys.argv[1]))
    print('✅ Config is valid JSON')
    print(f'Models: {[m[\"title\"] for m in config.get(\"models\", [])]}')
except Exception as e:
    print(f'❌ Error: {e}')
" ~/.continue/config.json
```
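
Valid JSON can still be missing fields that the examples in this guide rely on. The sketch below adds a structural check on top of the syntax check; the list of required keys is an assumption drawn from this guide's examples, not Continue.dev's official schema, and `config_problems` is a hypothetical helper.

```python
import json
import os

# Assumption: keys every model entry in this guide's examples carries.
REQUIRED_KEYS = ("title", "provider", "model")


def config_problems(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    models = config.get("models", [])
    if not models:
        problems.append("no entries under 'models'")
    for i, entry in enumerate(models):
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"models[{i}] is missing '{key}'")
    if "tabAutocompleteModel" not in config:
        problems.append("'tabAutocompleteModel' is not set")
    return problems


if __name__ == "__main__":
    path = os.path.expanduser("~/.continue/config.json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            for problem in config_problems(json.load(fh)) or ["no problems found"]:
                print(problem)
    else:
        print(f"{path} not found")
```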

## Further reading

* [Continue.dev documentation](https://docs.continue.dev/): official docs covering all IDE integrations and configuration options
* [Continue.dev on GitHub](https://github.com/continuedev/continue): source code, issues, model compatibility
* [Continue.dev config reference](https://docs.continue.dev/reference): the full `config.json` schema
* [Ollama on Clore.ai](/guides/guides_v2-zh/yu-yan-mo-xing/ollama.md): detailed Ollama setup guide (the recommended backend)
* [vLLM on Clore.ai](/guides/guides_v2-zh/yu-yan-mo-xing/vllm.md): high-performance alternative backend for teams
* [TabbyML](https://tabby.tabbyml.com/): dedicated autocomplete backend optimized for FIM
* [GPU comparison guide](/guides/guides_v2-zh/kai-shi-shi-yong/gpu-comparison.md): choosing the right GPU for your coding workload
* [Model compatibility](/guides/guides_v2-zh/kai-shi-shi-yong/model-compatibility.md): which models fit which VRAM sizes
* [Qwen2.5-Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct): a leading open-source coding model
* [DeepSeek-Coder-V2](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct): a strong alternative with long context
* [CLORE.AI Marketplace](https://clore.ai/marketplace): rent GPU servers

