# TabbyML Code Completion

TabbyML is a self-hosted AI code completion server — a drop-in replacement for GitHub Copilot that keeps your code entirely on your own infrastructure. Licensed under Apache 2.0, it runs on Clore.ai GPUs and connects to VS Code, JetBrains, and Vim/Neovim via official extensions. Models range from StarCoder2-1B (fits on 4 GB VRAM) to StarCoder2-15B and DeepSeek-Coder for maximum quality.

{% hint style="success" %}
All examples run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Key Features

* **Self-hosted Copilot alternative** — your code never leaves your server
* **Apache 2.0 license** — permissive, free for commercial use
* **IDE extensions** — VS Code, JetBrains (IntelliJ, PyCharm, WebStorm), Vim/Neovim
* **Multiple models** — StarCoder2 (1B/3B/7B/15B), DeepSeek-Coder, CodeLlama
* **Repository context** — RAG-powered code retrieval for project-aware completions
* **Docker deployment** — single command to launch with GPU support
* **Admin dashboard** — usage analytics, model management, user management
* **Chat interface** — ask coding questions beyond autocompletion

## Requirements

| Component | Minimum        | Recommended     |
| --------- | -------------- | --------------- |
| GPU       | RTX 3060 (12 GB) | RTX 3080 (10 GB) or better |
| VRAM      | 4 GB           | 10 GB           |
| RAM       | 8 GB           | 16 GB           |
| Disk      | 20 GB          | 50 GB           |
| CUDA      | 11.8           | 12.1+           |

**Clore.ai pricing:** RTX 3080 ≈ $0.3–1/day · RTX 3060 ≈ $0.15–0.3/day

TabbyML is lightweight — even an RTX 3060 runs StarCoder2-7B with fast inference.
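
Before deploying, it is worth confirming that a freshly rented server actually matches the table above. A quick sanity check (the CUDA base-image tag below is an assumption; any CUDA image works):

```bash
# GPU model, total VRAM, and driver version on the host
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv

# Confirm Docker can reach the GPU before launching Tabby
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```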

## Quick Start

### 1. Deploy with Docker

```bash
# StarCoder2-7B on GPU (recommended balance of quality and speed)
docker run -d \
  --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby \
  serve \
  --model StarCoder2-7B \
  --device cuda

# Verify it's running
curl http://localhost:8080/v1/health
```
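
On the first start Tabby downloads the model weights, so the health endpoint may take a few minutes to respond. A small sketch to watch startup and wait until the server answers (same endpoint as the check above):

```bash
# Follow the startup and model-download logs
docker logs -f tabby

# In another terminal: poll until the server reports healthy
until curl -sf http://localhost:8080/v1/health > /dev/null; do
  echo "waiting for tabby..."
  sleep 5
done
echo "Tabby is ready"
```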

### 2. Choose a Model

| Model               | VRAM    | Speed   | Quality | Best For               |
| ------------------- | ------- | ------- | ------- | ---------------------- |
| StarCoder2-1B       | \~3 GB  | Fastest | Basic   | RTX 3060, fast drafts  |
| StarCoder2-3B       | \~5 GB  | Fast    | Good    | General development    |
| StarCoder2-7B       | \~8 GB  | Medium  | High    | Recommended default    |
| StarCoder2-15B      | \~16 GB | Slower  | Best    | Complex codebases      |
| DeepSeek-Coder-6.7B | \~8 GB  | Medium  | High    | Python, JS, TypeScript |
| CodeLlama-7B        | \~8 GB  | Medium  | Good    | General purpose        |

Switch models by changing the `--model` flag (stop and remove the running container first, e.g. `docker rm -f tabby`, since both would bind port 8080):

```bash
# Lighter model for lower VRAM
docker run -d --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby serve --model StarCoder2-3B --device cuda

# Largest model for best quality
docker run -d --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby serve --model StarCoder2-15B --device cuda
```

### 3. Install IDE Extensions

**VS Code:**

1. Open Extensions (Ctrl+Shift+X)
2. Search "Tabby" and install the official extension
3. Open Settings → search "Tabby"
4. Set the server endpoint: `http://<your-clore-ip>:8080`

**JetBrains (IntelliJ, PyCharm, WebStorm):**

1. Settings → Plugins → Marketplace
2. Search "Tabby" and install
3. Settings → Tools → Tabby → Server endpoint: `http://<your-clore-ip>:8080`

**Vim/Neovim:**

```vim
" Using vim-plug
Plug 'TabbyML/vim-tabby'

" Configuration in init.vim / .vimrc
let g:tabby_server_url = 'http://<your-clore-ip>:8080'
```
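
If you prefer not to expose port 8080 publicly, tunnel it over SSH and point any of the extensions above at `http://localhost:8080` instead. A minimal sketch (add `-p <port>` if your Clore.ai rental uses a non-standard SSH port):

```bash
# Forward local port 8080 to the Tabby server on the rented machine
ssh -N -L 8080:localhost:8080 root@<your-clore-ip>
```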

### 4. Access the Admin Dashboard

Open `http://<your-clore-ip>:8080` in a browser. The dashboard provides:

* Completion usage statistics
* Model status and performance metrics
* User and API token management
* Repository indexing configuration

## Usage Examples

### Add Repository Context (RAG)

Index your repository for project-aware completions:

```bash
# Via the admin API
curl -X POST http://localhost:8080/v1beta/repositories \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-project",
    "git_url": "file:///workspace/my-project"
  }'

# Tabby indexes the repo and uses it for context-aware completions
```
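
A `file://` URL is resolved by the Tabby server inside the container, so the repository has to be visible there. One way to do that is sketched below (the extra read-only bind mount is an assumption, not part of the quick-start command); alternatively, point `git_url` at a remote `https://` repository:

```bash
# Mount the project so file:///workspace/my-project exists inside the container
docker run -d --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  -v /workspace/my-project:/workspace/my-project:ro \
  tabbyml/tabby serve --model StarCoder2-7B --device cuda
```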

### Use the Chat API

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a Python function to parse CSV files with error handling"}
    ]
  }'
```

### Run with Authentication

```bash
# Start the server as in the quick start; auth tokens are issued
# from the admin dashboard once it is running.
docker run -d --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby serve \
  --model StarCoder2-7B \
  --device cuda

# Generate a token in the admin dashboard, set it in your IDE extension settings,
# or send it in the Authorization header:
curl -H "Authorization: Bearer <token>" http://localhost:8080/v1/health
```

### Run Without Docker (Direct Install)

```bash
# Install via the install script (Linux)
curl -fsSL https://raw.githubusercontent.com/TabbyML/tabby/main/install.sh | bash

# Or cargo install
cargo install tabby

# Run directly
tabby serve --model StarCoder2-7B --device cuda --port 8080
```
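
When running outside Docker, a simple way to keep the server alive after you close the SSH session (a minimal sketch; the log path is arbitrary):

```bash
# Run Tabby in the background and keep it running after logout
nohup tabby serve --model StarCoder2-7B --device cuda --port 8080 \
  > /workspace/tabby.log 2>&1 &

# Follow the log to confirm startup
tail -f /workspace/tabby.log
```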

## Cost Comparison

| Solution            | Monthly Cost | Privacy | Latency  |
| ------------------- | ------------ | ------- | -------- |
| GitHub Copilot      | $19/user     | ❌ Cloud | \~200 ms |
| TabbyML on RTX 3060 | \~$5–9/mo    | ✅ Self  | \~50 ms  |
| TabbyML on RTX 3080 | \~$9–30/mo   | ✅ Self  | \~30 ms  |
| TabbyML on RTX 4090 | \~$15–60/mo  | ✅ Self  | \~15 ms  |

For a small team (3–5 developers), a single RTX 3080 on Clore.ai replaces multiple Copilot subscriptions at a fraction of the cost.

## Tips

* **StarCoder2-7B is the sweet spot** — best quality-to-VRAM ratio for most teams
* **Enable repository context** — RAG indexing dramatically improves completion relevance for large codebases
* **Expose port 8080 securely** — use SSH tunneling or a reverse proxy with TLS for production deployments
* **Monitor VRAM usage** — run `nvidia-smi` to ensure the model fits with headroom for inference batching (see the sketch after this list)
* **Use the completion API** for CI/CD integration — automate code review suggestions
* **Tabby supports multiple users** — the admin dashboard lets you create API tokens per developer
* **Latency matters** — choose a Clore.ai server geographically close to your team for the fastest completions
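
A quick way to keep an eye on VRAM headroom while completions are being served, as referenced in the monitoring tip above:

```bash
# Refresh GPU memory usage and utilization every two seconds
watch -n 2 nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv
```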

## Troubleshooting

| Problem                            | Solution                                                            |
| ---------------------------------- | ------------------------------------------------------------------- |
| Docker container exits immediately | Check logs: `docker logs tabby`. Likely VRAM insufficient for model |
| IDE extension not connecting       | Verify endpoint URL, check firewall/port forwarding on Clore.ai     |
| Slow completions                   | Use a smaller model, or ensure GPU is not shared with other tasks   |
| `CUDA out of memory`               | Switch to a smaller model (StarCoder2-3B or 1B)                     |
| Repository indexing stuck          | Check disk space and ensure the git repo is accessible              |
| Auth token rejected                | Regenerate token in admin dashboard, update IDE extension           |
| High latency from remote IDE       | Use SSH tunnel: `ssh -L 8080:localhost:8080 root@<clore-ip>`        |

## Resources

* [TabbyML GitHub](https://github.com/TabbyML/tabby)
* [TabbyML Documentation](https://tabby.tabbyml.com)
* [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=TabbyML.vscode-tabby)
* [CLORE.AI Marketplace](https://clore.ai/marketplace)

