Continue.dev AI Coding

Power Continue.dev with Clore.ai GPUs — run CodeLlama 34B, DeepSeek Coder, and Qwen2.5-Coder locally on cheap GPU rentals for private AI coding assistance.

Continue.dev is an open-source AI coding assistant for VS Code and JetBrains with 25K+ GitHub stars. The extension runs in your IDE on your local machine, but it connects to a backend model server for inference. By pointing Continue.dev at a powerful GPU rented from Clore.ai, you get:

  • Top-tier coding models (34B+ parameters) that won't fit on your laptop

  • Full privacy — code stays on infrastructure you control

  • Flexible costs — pay only while you're coding (~$0.20–0.50/hr vs. $19/mo for Copilot)

  • OpenAI-compatible API — Continue.dev connects to Ollama, vLLM, or TabbyML seamlessly

This guide focuses on setting up the Clore.ai GPU backend (Ollama or vLLM) that your local Continue.dev extension connects to.


Architecture: Your IDE (with Continue.dev extension) → Internet → Clore.ai GPU server (running Ollama / vLLM / TabbyML) → local model inference. No code ever touches a third-party API.

Overview

| Property | Details |
| --- | --- |
| License | Apache 2.0 |
| GitHub Stars | 25K+ |
| IDE Support | VS Code, JetBrains (IntelliJ, PyCharm, WebStorm, GoLand, etc.) |
| Config File | ~/.continue/config.json |
| Backend Options | Ollama, vLLM, TabbyML, LM Studio, llama.cpp, OpenAI-compatible APIs |
| Difficulty | Easy (extension install) / Medium (self-hosted backend) |
| GPU Required? | On the Clore.ai server (yes); on your laptop (no) |
| Key Features | Autocomplete, chat, edit mode, codebase context (RAG), custom slash commands |

Recommended Models

| Model | VRAM | Strength | Notes |
| --- | --- | --- | --- |
| codellama:7b | ~6 GB | Fast autocomplete | Good starting point |
| codellama:13b | ~10 GB | Balanced | Best quality/speed for autocomplete |
| codellama:34b | ~22 GB | Best CodeLlama quality | Needs RTX 3090 / A100 |
| deepseek-coder:6.7b | ~5 GB | Python/JS specialist | Excellent for web dev |
| deepseek-coder:33b | ~22 GB | Top-tier open source | Rivals GPT-4 on code |
| qwen2.5-coder:7b | ~6 GB | Multilingual code | Strong on 40+ languages |
| qwen2.5-coder:32b | ~22 GB | State-of-the-art | Best open coding model 2024 |
| starcoder2:15b | ~12 GB | Code completion specialist | FIM (fill-in-the-middle) support |

Requirements

Clore.ai Server Requirements

| Tier | GPU | VRAM | RAM | Disk | Price | Models |
| --- | --- | --- | --- | --- | --- | --- |
| Budget | RTX 3060 | 12 GB | 16 GB | 40 GB | ~$0.10/hr | CodeLlama 7B, DeepSeek 6.7B, Qwen2.5-Coder 7B |
| Recommended | RTX 3090 | 24 GB | 32 GB | 80 GB | ~$0.20/hr | CodeLlama 34B, DeepSeek 33B, Qwen2.5-Coder 32B |
| Performance | RTX 4090 | 24 GB | 32 GB | 80 GB | ~$0.35/hr | Same models as above, faster inference |
| Power | A100 40GB | 40 GB | 64 GB | 120 GB | ~$0.60/hr | Multiple 34B models concurrently |
| Maximum | A100 80GB | 80 GB | 80 GB | 200 GB | ~$1.10/hr | 70B models (CodeLlama 70B) |

Local Requirements (Your Machine)

  • VS Code or any JetBrains IDE

  • Continue.dev extension installed

  • Stable internet connection to your Clore.ai server

  • No local GPU needed — all inference happens on Clore.ai

Quick Start

Part 1: Set Up the Clore.ai Backend

Option A — Ollama Backend (Recommended)

Ollama is the easiest backend for Continue.dev — simple setup, excellent model management, and an OpenAI-compatible API.
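A minimal setup sketch, assuming your Clore.ai instance was rented with Docker and NVIDIA drivers available (adjust the model tag to your GPU tier — see the model table above):

```shell
# Start Ollama in a container with GPU access (official ollama/ollama image)
docker run -d --gpus all --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Pull a coding model sized for your VRAM
docker exec ollama ollama pull qwen2.5-coder:32b

# Sanity check: the API should answer locally
curl -s http://localhost:11434/api/tags
```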

To expose Ollama externally (so your local IDE can connect):
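With the Docker `-p 11434:11434` mapping above, Ollama already listens on all interfaces; a bare-metal install needs an explicit bind. Either way, Clore.ai assigns the public address and port mapping, so substitute your server's values for the placeholders below:

```shell
# Bare-metal install: make Ollama listen on all interfaces, not just loopback
export OLLAMA_HOST=0.0.0.0:11434
ollama serve &

# From your LOCAL machine, verify the server is reachable
curl -s http://<clore-server-ip>:<forwarded-port>/api/tags
```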


Option B — vLLM Backend (High-Throughput / OpenAI-Compatible)

vLLM offers faster inference and multi-user support. Ideal if multiple developers share one Clore.ai server.
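A sketch of launching vLLM's OpenAI-compatible server; the AWQ-quantized checkpoint name and memory settings here are assumptions — pick a model that fits your VRAM:

```shell
pip install vllm

# Serve an OpenAI-compatible endpoint on port 8000
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-Coder-32B-Instruct-AWQ \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.85
```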

Option C — TabbyML Backend (FIM Autocomplete Specialist)

TabbyML provides superior fill-in-the-middle (FIM) autocomplete — the inline ghost-text suggestions. See the TabbyML guide for full setup details.
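For reference, a minimal Tabby container might look like the sketch below — the model name and flags vary across Tabby versions, so treat this as an assumption and confirm against the linked guide:

```shell
docker run -d --gpus all --name tabby \
  -p 8080:8080 -v "$HOME/.tabby:/data" \
  tabbyml/tabby serve --model StarCoder-1B --device cuda
```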

Part 2: Install Continue.dev Extension

VS Code:

  1. Open the Extensions panel (Ctrl+Shift+X / Cmd+Shift+X)

  2. Search "Continue" — install the official extension by Continue (continuedev)

  3. Click the Continue icon in the sidebar (or Ctrl+Shift+I)

JetBrains (IntelliJ, PyCharm, WebStorm, GoLand):

  1. File → Settings → Plugins → Marketplace

  2. Search "Continue" and install

  3. Restart the IDE; the Continue panel appears on the right sidebar

Part 3: Configure Continue.dev to Use Clore.ai

Edit ~/.continue/config.json on your local machine:
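A minimal sketch, assuming you reach the server through the SSH tunnel described below (so the endpoint looks local) and have pulled the two models named here; the titles are arbitrary labels:

```json
{
  "models": [
    {
      "title": "Qwen2.5-Coder 32B (Clore.ai)",
      "provider": "ollama",
      "model": "qwen2.5-coder:32b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2 Autocomplete",
    "provider": "ollama",
    "model": "starcoder2:3b",
    "apiBase": "http://localhost:11434"
  }
}
```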

For vLLM backend instead of Ollama:
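Since vLLM speaks the OpenAI protocol, use the `openai` provider. The model name must match whatever vLLM was launched with (the checkpoint below is an assumption), and vLLM accepts any placeholder API key by default:

```json
{
  "models": [
    {
      "title": "Qwen2.5-Coder 32B (vLLM)",
      "provider": "openai",
      "model": "Qwen/Qwen2.5-Coder-32B-Instruct-AWQ",
      "apiBase": "http://localhost:8000/v1",
      "apiKey": "EMPTY"
    }
  ]
}
```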

For TabbyML backend (autocomplete only):

Configuration

SSH Tunnel Setup (Secure Remote Access)

Instead of exposing ports publicly, use an SSH tunnel from your local machine:
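For example, to make the remote Ollama port appear as localhost:11434 on your laptop (substitute the IP and SSH port shown in your Clore.ai dashboard):

```shell
# Forward local port 11434 to Ollama running on the Clore.ai server
# -N: no remote shell, just the tunnel
ssh -N -L 11434:localhost:11434 -p <ssh-port> root@<clore-server-ip>
```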

Persistent Tunnel with autossh
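The same tunnel, kept alive across network hiccups (placeholders as above):

```shell
# -M 0 disables autossh's separate monitor port; SSH keepalives
# detect dead connections and trigger a reconnect instead
autossh -M 0 -f -N \
  -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
  -L 11434:localhost:11434 -p <ssh-port> root@<clore-server-ip>
```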

Load Multiple Models for Different Tasks

For an RTX 3090 (24 GB), you can run a large chat model and a small autocomplete model simultaneously:
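For example (a sketch — Ollama loads models on demand and can keep both resident when VRAM allows):

```shell
# ~22 GB chat/edit model + ~2 GB autocomplete model fits a 24 GB card
docker exec ollama ollama pull qwen2.5-coder:32b
docker exec ollama ollama pull starcoder2:3b
```

Start the container with `-e OLLAMA_MAX_LOADED_MODELS=2` so the two models stay resident instead of swapping in and out of VRAM.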

Codebase Indexing (RAG for Your Repo)

Continue.dev can index your codebase for context-aware suggestions. Pull an embedding model:
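Pull the embedding model on the server with `docker exec ollama ollama pull nomic-embed-text`, then point Continue.dev at it in config.json (a sketch, using the same tunneled endpoint as the earlier examples):

```json
{
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://localhost:11434"
  }
}
```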

GPU Acceleration

Monitor Inference Performance
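A few useful checks while coding (container name `ollama` assumes the Docker setup from Part 1):

```shell
# On the Clore.ai server: live GPU utilization and VRAM use
watch -n 1 nvidia-smi

# Which models Ollama currently has loaded (also works locally through the tunnel)
curl -s http://localhost:11434/api/ps

# Follow inference logs, including per-request timings
docker logs -f ollama
```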

Expected Performance by GPU

| GPU | Model | Context | Tokens/sec (approx.) |
| --- | --- | --- | --- |
| RTX 3060 12GB | CodeLlama 7B | 8K | ~40–60 t/s |
| RTX 3060 12GB | DeepSeek-Coder 6.7B | 8K | ~45–65 t/s |
| RTX 3090 24GB | Qwen2.5-Coder 32B (Q4) | 16K | ~15–25 t/s |
| RTX 3090 24GB | DeepSeek-Coder 33B (Q4) | 16K | ~15–22 t/s |
| RTX 4090 24GB | Qwen2.5-Coder 32B (Q4) | 16K | ~25–40 t/s |
| A100 40GB | Qwen2.5-Coder 32B (FP16) | 32K | ~35–50 t/s |
| A100 80GB | CodeLlama 70B (Q4) | 32K | ~20–30 t/s |

For autocomplete (fill-in-the-middle), starcoder2:3b or codellama:7b achieve 50–100 t/s — fast enough to feel instant in the IDE.

Tune Ollama for Better Performance
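A sketch of the performance-related environment variables, set when starting the container (the values are reasonable starting points, not tuned numbers):

```shell
# OLLAMA_KEEP_ALIVE=24h       keep the model in VRAM between requests
# OLLAMA_NUM_PARALLEL=2       serve concurrent requests per model
# OLLAMA_MAX_LOADED_MODELS=2  chat + autocomplete resident together
# OLLAMA_FLASH_ATTENTION=1    faster attention on supported GPUs
docker run -d --gpus all --name ollama \
  -p 11434:11434 -v ollama:/root/.ollama \
  -e OLLAMA_KEEP_ALIVE=24h \
  -e OLLAMA_NUM_PARALLEL=2 \
  -e OLLAMA_MAX_LOADED_MODELS=2 \
  -e OLLAMA_FLASH_ATTENTION=1 \
  ollama/ollama
```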

Tips & Best Practices

Use Different Models for Different Tasks

Configure Continue.dev with specialized models per task type — the UI lets you switch models mid-conversation:
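For example, a config.json sketch pairing a large model for deep reasoning, a small one for quick answers, and a tiny one for autocomplete (titles are arbitrary; all three assume the tunneled Ollama endpoint):

```json
{
  "models": [
    {
      "title": "Deep reasoning — Qwen2.5-Coder 32B",
      "provider": "ollama",
      "model": "qwen2.5-coder:32b",
      "apiBase": "http://localhost:11434"
    },
    {
      "title": "Quick answers — DeepSeek-Coder 6.7B",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Autocomplete — StarCoder2 3B",
    "provider": "ollama",
    "model": "starcoder2:3b",
    "apiBase": "http://localhost:11434"
  }
}
```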

Cost Comparison

| Solution | Monthly Cost (8 hr/day usage) | Privacy | Model Quality |
| --- | --- | --- | --- |
| GitHub Copilot | $19/user/mo | ❌ Microsoft cloud | GPT-4o (closed) |
| Cursor Pro | $20/user/mo | ❌ Cursor cloud | Claude 3.5 (closed) |
| RTX 3060 on Clore.ai | ~$24/mo | ✅ Your server | CodeLlama 13B |
| RTX 3090 on Clore.ai | ~$48/mo | ✅ Your server | Qwen2.5-Coder 32B |
| RTX 4090 on Clore.ai | ~$84/mo | ✅ Your server | Qwen2.5-Coder 32B |
| A100 80GB on Clore.ai | ~$264/mo | ✅ Your server | CodeLlama 70B |

For a team of 3+ developers sharing one Clore.ai RTX 3090 (~$48/mo total), the per-user cost beats Copilot while providing a larger, private model.

Shut Down When Not Coding

Clore.ai charges per hour. Use a simple script to start/stop the server:
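A hypothetical sketch using the Clore.ai marketplace API — the endpoint names, auth header, and payload below are assumptions, so verify them against the official API docs before relying on this. Note that cancelling an order releases the machine and its disk, meaning models must be re-pulled on the next rental:

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch — confirm endpoints and payloads in Clore.ai's API docs.
CLORE_TOKEN="<your-api-token>"
ORDER_ID="<your-order-id>"

case "$1" in
  status)
    curl -s -H "auth: $CLORE_TOKEN" https://api.clore.ai/v1/my_orders
    ;;
  stop)
    # Cancelling the order stops billing AND releases the server + its disk
    curl -s -H "auth: $CLORE_TOKEN" -H "Content-Type: application/json" \
      -d "{\"id\": \"$ORDER_ID\"}" https://api.clore.ai/v1/cancel_order
    ;;
  *)
    echo "usage: $0 {status|stop}"
    ;;
esac
```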

Use Continue.dev Custom Commands

Add custom slash commands to config.json for common coding workflows:
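For example (the command names and prompts below are illustrative; `{{{ input }}}` is Continue.dev's placeholder for the selected code):

```json
{
  "customCommands": [
    {
      "name": "test",
      "prompt": "Write comprehensive unit tests for the following code, covering edge cases: {{{ input }}}",
      "description": "Generate unit tests for the selection"
    },
    {
      "name": "docs",
      "prompt": "Add clear docstrings and comments to the following code without changing its behavior: {{{ input }}}",
      "description": "Document the selection"
    }
  ]
}
```

With code selected, type /test or /docs in the Continue chat panel to run them.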

Troubleshooting

| Problem | Likely Cause | Solution |
| --- | --- | --- |
| Continue.dev shows "Connection refused" | Ollama not reachable | Check SSH tunnel is active; verify curl http://localhost:11434/ works |
| Autocomplete not triggering | Tab autocomplete model not set | Add tabAutocompleteModel to config.json; enable in Continue settings |
| Very slow responses (>30s first token) | Model loading from disk | First request loads model into VRAM — subsequent requests are fast |
| "Model not found" error | Model not pulled | Run docker exec ollama ollama pull <model-name> on the Clore.ai server |
| High latency between tokens | Network lag or model too large | Use SSH tunnel; switch to smaller model; check server GPU utilization |
| Codebase context not working | Embeddings model missing | Pull nomic-embed-text via Ollama; check embeddingsProvider in config.json |
| SSH tunnel drops frequently | Unstable connection | Use autossh for persistent reconnection; add ServerAliveInterval 30 |
| Context window exceeded | Long files/conversations | Reduce contextLength in config.json; use a model with longer context |
| JetBrains plugin not loading | IDE version incompatibility | Update JetBrains IDE to latest; check Continue.dev plugin compatibility matrix |
| vLLM OOM during loading | Not enough VRAM | Add --gpu-memory-utilization 0.85; use smaller model or quantized version |

Debug Commands
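A quick triage sequence (container name `ollama` assumes the Docker setup from Part 1):

```shell
# From your local machine (through the SSH tunnel):
curl -s http://localhost:11434/api/tags   # models pulled on the server
curl -s http://localhost:11434/api/ps     # models currently loaded in VRAM

# On the Clore.ai server:
docker logs --tail 50 ollama              # recent inference logs and errors
nvidia-smi                                # GPU utilization and VRAM headroom
```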

Continue.dev Config Validation
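To rule out JSON syntax errors after editing, a quick check with Python's standard library — shown here against a throwaway sample file; point it at ~/.continue/config.json in practice:

```shell
# Create a sample config to validate (use ~/.continue/config.json in practice)
cat > /tmp/sample-config.json <<'EOF'
{ "models": [ { "title": "Test", "provider": "ollama", "model": "codellama:7b" } ] }
EOF

python3 -m json.tool /tmp/sample-config.json > /dev/null && echo "valid JSON"
```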

Further Reading
