# TabbyML Code Completion

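TabbyML is a self-hosted AI code completion server. It works as a drop-in replacement for GitHub Copilot and keeps your code entirely on your own infrastructure. It is Apache 2.0 licensed, runs on Clore.ai GPUs, and connects to VS Code, JetBrains, and Vim/Neovim through official extensions. Models range from StarCoder-1B (runs in 4 GB of VRAM) up to StarCoder2-15B and DeepSeek-Coder for the best quality.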

{% hint style="success" %}
All examples run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Key Features

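* **Self-hosted Copilot alternative**: your code never leaves your server
* **Apache 2.0 license**: free for commercial use, no restrictions
* **IDE extensions**: VS Code, JetBrains (IntelliJ, PyCharm, WebStorm), Vim/Neovim
* **Multiple models**: StarCoder (1B), StarCoder2 (3B/7B/15B), DeepSeek-Coder, CodeLlama
* **Repository context**: RAG-based code retrieval for project-aware completions
* **Docker deployment**: a single command starts the server with GPU support
* **Admin dashboard**: usage analytics, model management, user management
* **Chat interface**: ask coding questions in addition to getting autocomplete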

## Requirements

| Component | Minimum        | Recommended     |
| --------- | -------------- | --------------- |
| GPU       | RTX 3060 12 GB | RTX 3080 10 GB+ |
| VRAM      | 4 GB           | 10 GB           |
| RAM       | 8 GB           | 16 GB           |
| Disk      | 20 GB          | 50 GB           |
| CUDA      | 11.8           | 12.1+           |

**Clore.ai pricing:** RTX 3080 about $0.3–1/day · RTX 3060 about $0.15–0.3/day

TabbyML is lightweight: even an RTX 3060 can run StarCoder2-7B with fast inference.
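Before starting a server, a quick preflight check against the minimum column of the table above can save a failed launch. The sketch below is a hypothetical helper, `check_tabby_host`, that takes three measurements (total VRAM in MiB, total RAM in MiB, free disk in GB) and compares them with approximate thresholds; the RAM threshold sits slightly below 8192 MiB on the assumption that an "8 GB" machine reports a little less usable memory.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: compare host resources with the minimums
# from the requirements table (VRAM ~4 GB, RAM ~8 GB, disk 20 GB).
# Arguments: <vram-mib> <ram-mib> <free-disk-gb>
check_tabby_host() {
  local vram_mib="$1" ram_mib="$2" disk_gb="$3"
  local arg
  for arg in "$vram_mib" "$ram_mib" "$disk_gb"; do
    if ! [[ "$arg" =~ ^[0-9]+$ ]]; then
      echo "expected non-negative integers, got: '$arg'" >&2
      return 2
    fi
  done

  local failed=0
  # Approximate 4 GB floor for the smallest model
  if (( vram_mib < 4000 )); then
    echo "VRAM too small: ${vram_mib} MiB (need ~4 GB)" >&2
    failed=1
  fi
  # Assumed threshold: an 8 GB machine usually reports a bit under 8192 MiB
  if (( ram_mib < 7500 )); then
    echo "RAM too small: ${ram_mib} MiB (need ~8 GB)" >&2
    failed=1
  fi
  if (( disk_gb < 20 )); then
    echo "Free disk too small: ${disk_gb} GB (need 20 GB)" >&2
    failed=1
  fi
  return "$failed"
}

# Example invocation on the rented server (GNU/Linux tools assumed):
# check_tabby_host \
#   "$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)" \
#   "$(free -m | awk '/^Mem:/ {print $2}')" \
#   "$(df -BG --output=avail /workspace | tail -n 1 | tr -dc '0-9')"
```

The function returns 0 when every check passes, 1 when any resource is short (with one message per shortfall on stderr), and 2 on malformed input, so it can gate a deployment script with `&&`.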

## Quick Start

### 1. Deploy with Docker

```bash
# Run StarCoder2-7B on the GPU (recommended balance of quality and speed)
docker run -d \
  --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby \
  serve \
  --model StarCoder2-7B \
  --device cuda

# Verify that it is running
curl http://localhost:8080/v1/health
```
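On first start the container may need some time (for example, to download the model) before `/v1/health` responds, so a single `curl` right after `docker run` can fail even though nothing is wrong. A minimal polling sketch, assuming the default port and the `/v1/health` endpoint used above; `wait_for_tabby` and its retry defaults are illustrative choices, not part of Tabby.

```bash
#!/usr/bin/env bash
# Poll Tabby's health endpoint until it answers successfully or we give up.
# Usage: wait_for_tabby [base-url] [max-attempts] [delay-seconds]
wait_for_tabby() {
  local base_url="${1:-http://localhost:8080}"
  local max_attempts="${2:-60}"
  local delay="${3:-5}"
  local attempt

  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # -f makes HTTP error codes count as failures; the response body is discarded
    if curl -fsS --max-time 5 "${base_url}/v1/health" > /dev/null 2>&1; then
      echo "Tabby is healthy (attempt ${attempt})"
      return 0
    fi
    # No pause after the final attempt
    if (( attempt < max_attempts )); then
      sleep "$delay"
    fi
  done

  echo "Tabby did not become healthy at ${base_url} after ${max_attempts} attempts" >&2
  return 1
}

# Example: wait up to ~5 minutes, then show the container's recent logs
# wait_for_tabby http://localhost:8080 60 5 && docker logs --tail 20 tabby
```

If the loop gives up, `docker logs tabby` is the first place to look (see Troubleshooting).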

### 2. Choose a Model

| Model               | VRAM    | Speed   | Quality | Best for                 |
| ------------------- | ------- | ------- | ------- | ------------------------ |
| StarCoder-1B        | \~3 GB  | Fastest | Basic   | RTX 3060, quick drafts   |
| StarCoder2-3B       | \~5 GB  | Fast    | Good    | General development      |
| StarCoder2-7B       | \~8 GB  | Medium  | High    | Recommended default      |
| StarCoder2-15B      | \~16 GB | Slower  | Best    | Complex codebases        |
| DeepSeek-Coder-6.7B | \~8 GB  | Medium  | High    | Python, JS, TypeScript   |
| CodeLlama-7B        | \~8 GB  | Medium  | Good    | General purpose          |

Switch models by changing the `--model` flag:

```bash
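# These commands are alternatives. Remove any running instance first
# so the container name and port 8080 are free: docker rm -f tabby

# Lighter model for lower VRAM usage
docker run -d --name tabby --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby serve --model StarCoder2-3B --device cuda

# Largest model for the best quality
docker run -d --name tabby --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby serve --model StarCoder2-15B --device cuda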
```
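To choose from the table automatically, one option is to compare the free VRAM reported by `nvidia-smi` with the approximate footprints listed there, keeping about 1 GB of headroom. `pick_tabby_model` and its thresholds are a hypothetical sketch derived from the table, not a Tabby feature. It only covers the StarCoder family and returns `StarCoder-1B` for the smallest tier, the name Tabby uses for its 1B StarCoder model (StarCoder2 itself is published in 3B, 7B and 15B sizes).

```bash
#!/usr/bin/env bash
# Hypothetical helper: map free VRAM (MiB) to a StarCoder model from the table,
# leaving roughly 1 GB of headroom above each model's approximate footprint.
pick_tabby_model() {
  local free_mib="$1"
  if ! [[ "$free_mib" =~ ^[0-9]+$ ]]; then
    echo "expected free VRAM in MiB, got: '$free_mib'" >&2
    return 2
  fi

  if (( free_mib >= 17000 )); then
    echo "StarCoder2-15B"   # ~16 GB
  elif (( free_mib >= 9000 )); then
    echo "StarCoder2-7B"    # ~8 GB
  elif (( free_mib >= 6000 )); then
    echo "StarCoder2-3B"    # ~5 GB
  elif (( free_mib >= 4000 )); then
    echo "StarCoder-1B"     # ~3 GB
  else
    echo "Not enough free VRAM (${free_mib} MiB) for any model in the table" >&2
    return 1
  fi
}

# Example: choose a model for the first GPU and start Tabby with it
# model="$(pick_tabby_model "$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n 1)")" &&
#   docker run -d --name tabby --gpus all -p 8080:8080 \
#     -v /workspace/tabby-data:/data \
#     tabbyml/tabby serve --model "$model" --device cuda
```

With these thresholds a 12 GB RTX 3060 or a 10 GB RTX 3080 lands on StarCoder2-7B, in line with the recommendation above.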

### 3. Install the IDE Extension

**VS Code:**

1. Open Extensions (Ctrl+Shift+X)
2. Search for "Tabby" and install the official extension
3. Open Settings → search for "Tabby"
4. Set the server endpoint: `http://<your-clore-ip>:8080`

**JetBrains (IntelliJ, PyCharm, WebStorm):**

1. Settings → Plugins → Marketplace
2. Search for "Tabby" and install it
3. Settings → Tools → Tabby → Server endpoint: `http://<your-clore-ip>:8080`

**Vim/Neovim:**

```vim
" Using vim-plug
Plug 'TabbyML/vim-tabby'

" Configure in init.vim / .vimrc
let g:tabby_server_url = 'http://<your-clore-ip>:8080'
```

### 4. Open the Admin Dashboard

Open `http://<your-clore-ip>:8080` in a browser. The dashboard provides:

* Completion usage statistics
* Model status and performance metrics
* User and API token management
* Repository indexing configuration

## Usage Examples

### Adding Repository Context (RAG)

Index your repository for project-aware completions:

```bash
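# Via the admin API
# A file:// URL is resolved inside the container, so the project must be mounted,
# e.g. add -v /workspace/my-project:/workspace/my-project to the docker run command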
curl -X POST http://localhost:8080/v1beta/repositories \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-project",
    "git_url": "file:///workspace/my-project"
  }'

# Tabby indexes the repository and uses it for context-aware completions
```
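Tabby's documentation also describes declaring repositories in a `config.toml` file in its data directory, as `[[repositories]]` tables with `name` and `git_url` keys. Assuming that file sits at the root of the mounted data volume (`/workspace/tabby-data/config.toml` on the host with the `docker run` commands above) and that Tabby is restarted afterwards, a small helper can append entries without duplicating them. `add_tabby_repo` is a hypothetical name, and the repository configuration mechanism may differ between Tabby releases, so check the docs for your version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a [[repositories]] entry to Tabby's config.toml.
# Usage: add_tabby_repo <data-dir> <name> <git-url>
add_tabby_repo() {
  local data_dir="$1" name="$2" git_url="$3"
  if [ -z "$data_dir" ] || [ -z "$name" ] || [ -z "$git_url" ]; then
    echo "usage: add_tabby_repo <data-dir> <name> <git-url>" >&2
    return 2
  fi

  local config="${data_dir}/config.toml"
  mkdir -p "$data_dir"
  touch "$config"

  # Skip if an entry with the same name already exists (exact whole-line match)
  if grep -qxF "name = \"${name}\"" "$config"; then
    echo "repository '${name}' already present in ${config}" >&2
    return 0
  fi

  # Values are written verbatim; names or URLs containing double quotes
  # would need TOML escaping, which this sketch does not do.
  printf '\n[[repositories]]\nname = "%s"\ngit_url = "%s"\n' "$name" "$git_url" >> "$config"
}

# Example (host path of the /data volume), then restart the container:
# add_tabby_repo /workspace/tabby-data my-project file:///workspace/my-project
# docker restart tabby
```

As with the API call, a `file://` URL must point at a path that is mounted inside the container.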

### Using the Chat API

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a Python function to parse CSV files with error handling"}
    ]
  }'
```

### Running with Authentication

```bash
# Generate an auth token in the admin dashboard, then:
docker run -d --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby serve \
  --model StarCoder2-7B \
  --device cuda

# Set the token in your IDE extension settings,
# or pass it in the Authorization header:
curl -H "Authorization: Bearer <token>" http://localhost:8080/v1/health
```

### Running Without Docker (Direct Install)

```bash
# Install with the install script (Linux)
curl -fsSL https://raw.githubusercontent.com/TabbyML/tabby/main/install.sh | bash

# Or install with cargo
cargo install tabby

# Run directly
tabby serve --model StarCoder2-7B --device cuda --port 8080
```

## Cost Comparison

| Solution               | Monthly cost    | Privacy         | Latency  |
| ---------------------- | --------------- | --------------- | -------- |
| GitHub Copilot         | $19/user        | ❌ Cloud         | \~200 ms |
| TabbyML on RTX 3060    | \~$5–9/month    | ✅ Self-hosted   | \~50 ms  |
| TabbyML on RTX 3080    | \~$9–30/month   | ✅ Self-hosted   | \~30 ms  |
| TabbyML on RTX 4090    | \~$15–60/month  | ✅ Self-hosted   | \~15 ms  |

For a small team (3–5 developers), a single RTX 3080 on Clore.ai can replace several Copilot subscriptions at a fraction of the cost.
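The comparison is simple arithmetic: Copilot at $19 per user per month versus one GPU rented by the day and running around the clock. A sketch that assumes 30 billed days per month, which is how the RTX 3060 and RTX 3080 daily prices quoted earlier map onto the monthly ranges in the table; `compare_monthly_cost` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare Copilot seats with one rented GPU running 24/7.
# Usage: compare_monthly_cost <developers> <gpu-usd-per-day>
compare_monthly_cost() {
  local devs="$1" usd_per_day="$2"
  # LC_ALL=C keeps "." as the decimal separator regardless of locale
  LC_ALL=C awk -v devs="$devs" -v rate="$usd_per_day" 'BEGIN {
    copilot = devs * 19   # $19 per user per month
    gpu = rate * 30       # assume 30 billed days per month
    printf "Copilot: $%.2f/month, Clore GPU: $%.2f/month, difference: $%.2f\n", copilot, gpu, copilot - gpu
  }'
}

# Example: 5 developers vs an RTX 3080 at the top of its quoted range ($1/day)
# compare_monthly_cost 5 1
# -> Copilot: $95.00/month, Clore GPU: $30.00/month, difference: $65.00
```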

## Tips

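* **StarCoder2-7B is the sweet spot**: the best quality-to-VRAM ratio for most teams
* **Enable repository context**: RAG indexing significantly improves completion relevance in large codebases
* **Expose port 8080 securely**: use an SSH tunnel or a reverse proxy with TLS in production deployments
* **Monitor VRAM usage**: check `nvidia-smi` to make sure the model fits and leaves headroom for inference batching
* **Use the completion API** for CI/CD integration, e.g. automated code-review suggestions
* **Tabby supports multiple users**: the admin dashboard lets you create an API token for each developer
* **Latency matters**: for the fastest completions, pick a Clore.ai server geographically close to your team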
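For CI/CD scripts, the completion endpoint can be called directly. A minimal sketch, assuming Tabby's `POST /v1/completions` endpoint accepts a JSON body with `language` and `segments.prefix`/`segments.suffix` (the request shape shown in Tabby's README; confirm it against the API reference for your version). `tabby_complete`, `TABBY_URL` and `TABBY_TOKEN` are hypothetical names; the token is sent as a Bearer header only when set, as in the authentication example.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: POST a JSON completion request read from stdin to Tabby.
# Environment (assumed names): TABBY_URL (default http://localhost:8080), TABBY_TOKEN (optional)
tabby_complete() {
  local url="${TABBY_URL:-http://localhost:8080}"
  local -a auth_header=()
  if [ -n "${TABBY_TOKEN:-}" ]; then
    auth_header=(-H "Authorization: Bearer ${TABBY_TOKEN}")
  fi

  # --data-binary @- sends stdin unchanged, so JSON escapes such as \n survive.
  # The auth expansion yields nothing when no token is set (safe under set -u).
  curl -fsS --max-time 30 -X POST "${url}/v1/completions" \
    -H "Content-Type: application/json" \
    ${auth_header[@]+"${auth_header[@]}"} \
    --data-binary @-
}

# Example: ask for the body of a Python function
# tabby_complete <<'EOF'
# {"language": "python",
#  "segments": {"prefix": "def fib(n):\n    ", "suffix": "\n        return fib(n - 1) + fib(n - 2)"}}
# EOF
```

The quoted heredoc (`<<'EOF'`) keeps the shell from touching the JSON, which avoids hand-escaping code snippets.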

## Troubleshooting

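| Problem                            | Solution                                                                       |
| ---------------------------------- | ------------------------------------------------------------------------------ |
| Docker container exits immediately | Check the logs: `docker logs tabby`. There may not be enough VRAM for the model |
| IDE extension cannot connect       | Verify the endpoint URL; check the firewall and port forwarding on Clore.ai     |
| Slow completions                   | Use a smaller model, or make sure the GPU is not shared with other jobs         |
| `CUDA out of memory`               | Switch to a smaller model (StarCoder2-3B or StarCoder-1B)                       |
| Repository indexing is stuck       | Check disk space and make sure the git repository is accessible                |
| Auth token rejected                | Regenerate the token in the admin dashboard and update the IDE extension        |
| High latency from a remote IDE     | Use an SSH tunnel: `ssh -L 8080:localhost:8080 root@<clore-ip>`                 |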
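Several rows above start from `docker logs tabby`. A heuristic sketch that scans recent log output for a few common failure phrases and prints the matching fix; the phrases (`out of memory`, `No space left on device`, `no CUDA-capable device`) are generic CUDA and OS error strings assumed to appear in the logs, not an exhaustive list of Tabby's messages, and `tabby_log_hint` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read log text on stdin and suggest a fix from the
# troubleshooting table. Matching is heuristic and case-insensitive.
tabby_log_hint() {
  local log
  log="$(cat)"

  if grep -qi 'out of memory' <<< "$log"; then
    echo "GPU memory exhausted: switch to a smaller model (e.g. StarCoder2-3B)"
  elif grep -qi 'no space left on device' <<< "$log"; then
    echo "Disk full: free space on the volume holding /workspace/tabby-data"
  elif grep -qi 'no CUDA-capable device' <<< "$log"; then
    echo "GPU not visible: start the container with --gpus all and check nvidia-smi"
  else
    echo "No known failure pattern found; read the full output of: docker logs tabby" >&2
    return 1
  fi
}

# Example: inspect the last 200 log lines (stderr included)
# docker logs --tail 200 tabby 2>&1 | tabby_log_hint
```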

## Resources

* [TabbyML GitHub](https://github.com/TabbyML/tabby)
* [TabbyML Documentation](https://tabby.tabbyml.com)
* [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=TabbyML.vscode-tabby)
* [CLORE.AI Marketplace](https://clore.ai/marketplace)

