# TabbyML Code Completion

TabbyML is a self-hosted AI code completion server — a drop-in replacement for GitHub Copilot that keeps your code entirely on your own infrastructure. Licensed under Apache 2.0, it runs on Clore.ai GPUs and connects to VS Code, JetBrains, and Vim/Neovim via official extensions. Models range from StarCoder2-1B (fits on 4 GB VRAM) to StarCoder2-15B and DeepSeek-Coder for maximum quality.

{% hint style="success" %}
All examples run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Key Features

* **Self-hosted Copilot alternative** — your code never leaves your server
* **Apache 2.0 license** — free for commercial use, no restrictions
* **IDE extensions** — VS Code, JetBrains (IntelliJ, PyCharm, WebStorm), Vim/Neovim
* **Multiple models** — StarCoder2 (1B/3B/7B/15B), DeepSeek-Coder, CodeLlama
* **Repository context** — RAG-powered code retrieval for project-aware completions
* **Docker deployment** — single command to launch with GPU support
* **Admin dashboard** — usage analytics, model management, user management
* **Chat interface** — ask coding questions beyond autocompletion

## Requirements

| Component | Minimum        | Recommended     |
| --------- | -------------- | --------------- |
| GPU       | Any CUDA GPU with 4 GB+ VRAM | RTX 3080 (10 GB) or better |
| VRAM      | 4 GB           | 10 GB           |
| RAM       | 8 GB           | 16 GB           |
| Disk      | 20 GB          | 50 GB           |
| CUDA      | 11.8           | 12.1+           |

**Clore.ai pricing:** RTX 3080 ≈ $0.3–1/day · RTX 3060 ≈ $0.15–0.3/day

TabbyML is lightweight — even an RTX 3060 runs StarCoder2-7B with fast inference.

## Quick Start

### 1. Deploy with Docker

```bash
# StarCoder2-7B on GPU (recommended balance of quality and speed)
docker run -d \
  --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby \
  serve \
  --model StarCoder2-7B \
  --device cuda

# Verify it's running
curl http://localhost:8080/v1/health
```
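Beyond the health check, you can exercise the completion endpoint directly. A minimal sketch, assuming the server from step 1 is reachable on `localhost:8080`; the payload shape (`language` plus `segments.prefix`/`segments.suffix`) follows Tabby's `/v1/completions` API:

```bash
# Build the completion request payload (prefix = code before the cursor,
# suffix = code after it; Tabby fills in the middle).
payload='{
  "language": "python",
  "segments": {
    "prefix": "def fibonacci(n):\n    ",
    "suffix": "\n    return result"
  }
}'

# Sanity-check the JSON locally before sending
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload ok"

# With the server running, send it:
# curl -s http://localhost:8080/v1/completions \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```

The response contains one or more completion choices that IDE extensions render inline as ghost text.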

### 2. Choose a Model

| Model               | VRAM    | Speed   | Quality | Best For               |
| ------------------- | ------- | ------- | ------- | ---------------------- |
| StarCoder2-1B       | \~3 GB  | Fastest | Basic   | RTX 3060, fast drafts  |
| StarCoder2-3B       | \~5 GB  | Fast    | Good    | General development    |
| StarCoder2-7B       | \~8 GB  | Medium  | High    | Recommended default    |
| StarCoder2-15B      | \~16 GB | Slower  | Best    | Complex codebases      |
| DeepSeek-Coder-6.7B | \~8 GB  | Medium  | High    | Python, JS, TypeScript |
| CodeLlama-7B        | \~8 GB  | Medium  | Good    | General purpose        |

Switch models by changing the `--model` flag:

```bash
# Lighter model for lower VRAM
docker run -d --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby serve --model StarCoder2-3B --device cuda

# Largest model for best quality
docker run -d --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby serve --model StarCoder2-15B --device cuda
```
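The table above can be condensed into a rule of thumb. A hypothetical helper (the function name is mine; the thresholds come from the VRAM column):

```bash
# Map free VRAM (GB) to the largest model from the table that fits comfortably
suggest_model() {
  local vram_gb=$1
  if   [ "$vram_gb" -ge 16 ]; then echo "StarCoder2-15B"
  elif [ "$vram_gb" -ge 8 ];  then echo "StarCoder2-7B"
  elif [ "$vram_gb" -ge 5 ];  then echo "StarCoder2-3B"
  else                             echo "StarCoder2-1B"
  fi
}

suggest_model 12   # an RTX 3060 12 GB comfortably fits StarCoder2-7B
```

Check actual free VRAM on your Clore.ai server with `nvidia-smi` before choosing, since other processes may already hold part of the card.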

### 3. Install IDE Extensions

**VS Code:**

1. Open Extensions (Ctrl+Shift+X)
2. Search "Tabby" and install the official extension
3. Open Settings → search "Tabby"
4. Set the server endpoint: `http://<your-clore-ip>:8080`

**JetBrains (IntelliJ, PyCharm, WebStorm):**

1. Settings → Plugins → Marketplace
2. Search "Tabby" and install
3. Settings → Tools → Tabby → Server endpoint: `http://<your-clore-ip>:8080`

**Vim/Neovim:**

```vim
" Using vim-plug
Plug 'TabbyML/vim-tabby'

" Configuration in init.vim / .vimrc
let g:tabby_server_url = 'http://<your-clore-ip>:8080'
```

### 4. Access the Admin Dashboard

Open `http://<your-clore-ip>:8080` in a browser. The dashboard provides:

* Completion usage statistics
* Model status and performance metrics
* User and API token management
* Repository indexing configuration

## Usage Examples

### Add Repository Context (RAG)

Index your repository for project-aware completions:

```bash
# Via the admin API
curl -X POST http://localhost:8080/v1beta/repositories \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-project",
    "git_url": "file:///workspace/my-project"
  }'

# Tabby indexes the repo and uses it for context-aware completions
```
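Depending on your Tabby version, repositories can also be declared in a `config.toml` inside the data directory (mounted at `/workspace/tabby-data` in the Docker examples above). A sketch mirroring the API call:

```toml
# /workspace/tabby-data/config.toml (seen as /data inside the container)
[[repositories]]
name = "my-project"
git_url = "file:///workspace/my-project"
```

Restart the container after editing so the indexer picks up the new entry.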

### Use the Chat API

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a Python function to parse CSV files with error handling"}
    ]
  }'
```
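Tabby's chat endpoint returns the familiar OpenAI-style shape, so the reply can be pulled out of `choices[0].message.content`. A sketch against a mocked response (with a live server, pipe the curl output above through the same one-liner):

```bash
# Mocked response in the OpenAI chat-completions shape (illustrative only)
response='{"choices":[{"message":{"role":"assistant","content":"def parse_csv(path): ..."}}]}'

# Extract just the assistant's reply
echo "$response" | python3 -c \
  'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```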

### Run with Authentication

```bash
# Authentication is managed in the admin dashboard: the first registered
# user becomes the admin and can create API tokens there. The server
# itself needs no extra flags:
docker run -d --gpus all -p 8080:8080 \
  -v /workspace/tabby-data:/data \
  tabbyml/tabby serve \
  --model StarCoder2-7B \
  --device cuda

# Set the token in your IDE extension settings
# or use the Authorization header:
curl -H "Authorization: Bearer <token>" http://localhost:8080/v1/health
```

### Run Without Docker (Direct Install)

```bash
# Install via the official install script
curl -fsSL https://raw.githubusercontent.com/TabbyML/tabby/main/install.sh | bash

# Or cargo install
cargo install tabby

# Run directly
tabby serve --model StarCoder2-7B --device cuda --port 8080
```

## Cost Comparison

| Solution            | Monthly Cost | Privacy | Latency  |
| ------------------- | ------------ | ------- | -------- |
| GitHub Copilot      | $19/user (Business) | ❌ Cloud | \~200 ms |
| TabbyML on RTX 3060 | \~$5–9/mo    | ✅ Self  | \~50 ms  |
| TabbyML on RTX 3080 | \~$9–30/mo   | ✅ Self  | \~30 ms  |
| TabbyML on RTX 4090 | \~$15–60/mo  | ✅ Self  | \~15 ms  |

For a small team (3–5 developers), a single RTX 3080 on Clore.ai replaces multiple Copilot subscriptions at a fraction of the cost.
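The break-even math behind that claim is simple. A quick sketch with illustrative mid-range numbers (actual GPU pricing varies with the Clore.ai market):

```bash
devs=4                          # small team
copilot=$((devs * 19))          # Copilot Business seats at $19/user/mo
tabby=20                        # one shared RTX 3080, mid-range monthly estimate
echo "Copilot: \$${copilot}/mo vs TabbyML: \$${tabby}/mo"
# prints: Copilot: $76/mo vs TabbyML: $20/mo
```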

## Tips

* **StarCoder2-7B is the sweet spot** — best quality-to-VRAM ratio for most teams
* **Enable repository context** — RAG indexing dramatically improves completion relevance for large codebases
* **Expose port 8080 securely** — use SSH tunneling or a reverse proxy with TLS for production deployments
* **Monitor VRAM usage** — `nvidia-smi` to ensure the model fits with headroom for inference batching
* **Use the completion API** for CI/CD integration — automate code review suggestions
* **Tabby supports multiple users** — the admin dashboard lets you create API tokens per developer
* **Latency matters** — choose a Clore.ai server geographically close to your team for the fastest completions

## Troubleshooting

| Problem                            | Solution                                                            |
| ---------------------------------- | ------------------------------------------------------------------- |
| Docker container exits immediately | Check logs: `docker logs tabby` — most often the chosen model needs more VRAM than the GPU has |
| IDE extension not connecting       | Verify endpoint URL, check firewall/port forwarding on Clore.ai     |
| Slow completions                   | Use a smaller model, or ensure GPU is not shared with other tasks   |
| `CUDA out of memory`               | Switch to a smaller model (StarCoder2-3B or 1B)                     |
| Repository indexing stuck          | Check disk space and ensure the git repo is accessible              |
| Auth token rejected                | Regenerate token in admin dashboard, update IDE extension           |
| High latency from remote IDE       | Use SSH tunnel: `ssh -L 8080:localhost:8080 root@<clore-ip>`        |

## Resources

* [TabbyML GitHub](https://github.com/TabbyML/tabby)
* [TabbyML Documentation](https://tabby.tabbyml.com)
* [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=TabbyML.vscode-tabby)
* [CLORE.AI Marketplace](https://clore.ai/marketplace)
