TabbyML Code Completion

Self-host TabbyML as a private GitHub Copilot alternative on Clore.ai

TabbyML is a self-hosted AI code completion server — a drop-in replacement for GitHub Copilot that keeps your code entirely on your own infrastructure. Licensed under Apache 2.0, it runs on Clore.ai GPUs and connects to VS Code, JetBrains, and Vim/Neovim via official extensions. Models range from StarCoder2-1B (fits on 4 GB VRAM) to StarCoder2-15B and DeepSeek-Coder for maximum quality.

Key Features

  • Self-hosted Copilot alternative — your code never leaves your server

  • Apache 2.0 license — free for commercial use, no restrictions

  • IDE extensions — VS Code, JetBrains (IntelliJ, PyCharm, WebStorm), Vim/Neovim

  • Multiple models — StarCoder2 (1B/3B/7B/15B), DeepSeek-Coder, CodeLlama

  • Repository context — RAG-powered code retrieval for project-aware completions

  • Docker deployment — single command to launch with GPU support

  • Admin dashboard — usage analytics, model management, user management

  • Chat interface — ask coding questions beyond autocompletion

Requirements

| Component | Minimum         | Recommended      |
| --------- | --------------- | ---------------- |
| GPU       | RTX 3060 12 GB  | RTX 3080 10 GB+  |
| VRAM      | 4 GB            | 10 GB            |
| RAM       | 8 GB            | 16 GB            |
| Disk      | 20 GB           | 50 GB            |
| CUDA      | 11.8            | 12.1+            |

Clore.ai pricing: RTX 3080 ≈ $0.3–1/day · RTX 3060 ≈ $0.15–0.3/day

TabbyML is lightweight — even an RTX 3060 runs StarCoder2-7B with fast inference.

Quick Start

1. Deploy with Docker
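A minimal sketch of the deploy command, assuming the official tabbyml/tabby image and a CUDA-capable host; the exact model name must match Tabby's model registry and may differ across releases:

```shell
# Launch the Tabby server on port 8080, persisting models and data in ~/.tabby.
# --model selects the completion model; StarCoder2-7B is this guide's recommended default.
docker run -d --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  tabbyml/tabby serve \
  --model StarCoder2-7B \
  --device cuda
```

Follow startup with `docker logs -f tabby`; the first run downloads model weights, which can take several minutes.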

2. Choose a Model

| Model               | VRAM   | Speed   | Quality | Best For               |
| ------------------- | ------ | ------- | ------- | ---------------------- |
| StarCoder2-1B       | ~3 GB  | Fastest | Basic   | RTX 3060, fast drafts  |
| StarCoder2-3B       | ~5 GB  | Fast    | Good    | General development    |
| StarCoder2-7B       | ~8 GB  | Medium  | High    | Recommended default    |
| StarCoder2-15B      | ~16 GB | Slower  | Best    | Complex codebases      |
| DeepSeek-Coder-6.7B | ~8 GB  | Medium  | High    | Python, JS, TypeScript |
| CodeLlama-7B        | ~8 GB  | Medium  | Good    | General purpose        |

Switch models by changing the --model flag:
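A sketch, assuming the Docker deployment from step 1 and that the new model name exists in Tabby's model registry (registry spellings vary between releases, so verify the exact identifier first):

```shell
# Stop the running container, then relaunch with a different --model value,
# e.g. DeepSeek-Coder-6.7B (identifier must match Tabby's model registry).
docker rm -f tabby
docker run -d --name tabby --gpus all -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  tabbyml/tabby serve --model DeepSeekCoder-6.7B --device cuda
```

Previously downloaded weights stay cached in `~/.tabby`, so switching back to an earlier model is fast.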

3. Install IDE Extensions

VS Code:

  1. Open Extensions (Ctrl+Shift+X)

  2. Search "Tabby" and install the official extension

  3. Open Settings → search "Tabby"

  4. Set the server endpoint: http://<your-clore-ip>:8080

JetBrains (IntelliJ, PyCharm, WebStorm):

  1. Settings → Plugins → Marketplace

  2. Search "Tabby" and install

  3. Settings → Tools → Tabby → Server endpoint: http://<your-clore-ip>:8080

Vim/Neovim: install the official TabbyML/vim-tabby plugin with your plugin manager and set the server endpoint to http://<your-clore-ip>:8080 (see the plugin README for the exact configuration options).

4. Access the Admin Dashboard

Open http://<your-clore-ip>:8080 in a browser. The dashboard provides:

  • Completion usage statistics

  • Model status and performance metrics

  • User and API token management

  • Repository indexing configuration

Usage Examples

Add Repository Context (RAG)

Index your repository for project-aware completions:
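A sketch of the file-based configuration used by earlier Tabby releases, assuming Tabby reads `~/.tabby/config.toml` (recent releases can also add repositories from the admin dashboard); the repository name and URL below are placeholders:

```shell
# Register a git repository for Tabby's RAG index (placeholder name and URL).
mkdir -p ~/.tabby
cat >> ~/.tabby/config.toml <<'EOF'
[[repositories]]
name = "my-project"
git_url = "https://github.com/example/my-project"
EOF
```

After adding the entry, trigger indexing (or wait for the scheduled job) and watch progress in the dashboard's repository indexing page.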

Use the Chat API
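A sketch using curl, assuming Tabby's OpenAI-compatible chat endpoint; the exact path (`/v1/chat/completions` vs. `/v1beta/chat/completions`) varies by Tabby version, so check your server's API docs:

```shell
# Ask the chat model a coding question (replace <your-clore-ip> with your server).
curl -s http://<your-clore-ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain the difference between a list and a tuple in Python."}
    ]
  }'
```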

Run with Authentication
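Tabby's built-in authentication is enabled from the web UI: the first account registered becomes the admin, and per-developer tokens are then generated in the dashboard. A sketch of an authenticated API call, assuming a dashboard-generated token and Tabby's health endpoint (placeholder token; endpoint path may vary by version):

```shell
# Export the token generated in the Tabby admin dashboard, then call the API.
export TABBY_TOKEN="auth_..."   # placeholder -- use your real token
curl -s http://<your-clore-ip>:8080/v1/health \
  -H "Authorization: Bearer $TABBY_TOKEN"
```

IDE extensions accept the same token in their settings once authentication is enabled.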

Run Without Docker (Direct Install)
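A sketch, assuming a prebuilt Linux binary from the TabbyML GitHub releases page; the asset name below is illustrative, since exact names vary per release and CUDA version:

```shell
# Download a prebuilt Tabby binary (confirm the asset name on
# https://github.com/TabbyML/tabby/releases), then serve directly on the host.
curl -L -o tabby \
  https://github.com/TabbyML/tabby/releases/latest/download/tabby_x86_64-manylinux2014-cuda117
chmod +x tabby
./tabby serve --model StarCoder2-7B --device cuda
```

A direct install avoids the Docker layer but leaves CUDA driver/toolkit setup to you; Docker remains the simpler path on a fresh Clore.ai instance.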

Cost Comparison

| Solution            | Monthly Cost | Privacy        | Latency |
| ------------------- | ------------ | -------------- | ------- |
| GitHub Copilot      | $19/user     | ❌ Cloud       | ~200 ms |
| TabbyML on RTX 3060 | ~$5–9/mo     | ✅ Self-hosted | ~50 ms  |
| TabbyML on RTX 3080 | ~$9–30/mo    | ✅ Self-hosted | ~30 ms  |
| TabbyML on RTX 4090 | ~$15–60/mo   | ✅ Self-hosted | ~15 ms  |

For a small team (3–5 developers), a single RTX 3080 on Clore.ai replaces multiple Copilot subscriptions at a fraction of the cost.

Tips

  • StarCoder2-7B is the sweet spot — best quality-to-VRAM ratio for most teams

  • Enable repository context — RAG indexing dramatically improves completion relevance for large codebases

  • Expose port 8080 securely — use SSH tunneling or a reverse proxy with TLS for production deployments

  • Monitor VRAM usage with nvidia-smi to ensure the model fits with headroom for inference batching

  • Use the completion API for CI/CD integration — automate code review suggestions

  • Tabby supports multiple users — the admin dashboard lets you create API tokens per developer

  • Latency matters — choose a Clore.ai server geographically close to your team for the fastest completions
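The completion API mentioned in the tips above can be scripted directly; a sketch, assuming Tabby's `/v1/completions` endpoint and its language/segments request shape (verify both against your server's API docs, as they can change between versions):

```shell
# Request a completion for a code prefix (replace <your-clore-ip> with your server).
curl -s http://<your-clore-ip>:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "language": "python",
    "segments": { "prefix": "def fibonacci(n):\n    ", "suffix": "" }
  }'
```

Wrapping a call like this in a CI job lets you attach model suggestions to code review without any IDE in the loop.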

Troubleshooting

| Problem                           | Solution                                                                     |
| --------------------------------- | ---------------------------------------------------------------------------- |
| Docker container exits immediately | Check logs with `docker logs tabby`; VRAM is likely insufficient for the model |
| IDE extension not connecting      | Verify the endpoint URL; check firewall/port forwarding on Clore.ai          |
| Slow completions                  | Use a smaller model, or ensure the GPU is not shared with other tasks        |
| CUDA out of memory                | Switch to a smaller model (StarCoder2-3B or 1B)                              |
| Repository indexing stuck         | Check disk space and ensure the git repo is accessible                       |
| Auth token rejected               | Regenerate the token in the admin dashboard and update the IDE extension     |
| High latency from remote IDE      | Use an SSH tunnel: `ssh -L 8080:localhost:8080 root@<clore-ip>`              |

Resources

  • GitHub: https://github.com/TabbyML/tabby

  • Documentation: https://tabby.tabbyml.com
