DeepSeek Coder

Best-in-class code generation with DeepSeek Coder on Clore.ai

Newer versions available! DeepSeek-R1 (reasoning + coding) and DeepSeek-V3 (general purpose) are significantly more capable. Also see Qwen2.5-Coder for a strong coding alternative.

All examples can be run on GPU servers rented through CLORE.AI Marketplace.

Renting on CLORE.AI

1. Visit CLORE.AI Marketplace
2. Filter by GPU type, VRAM, and price
3. Choose On-Demand (fixed rate) or Spot (bid price)
4. Configure your order:
   - Select Docker image
   - Set ports (TCP for SSH, HTTP for web UIs)
   - Add environment variables if needed
   - Enter startup command
5. Select payment: CLORE, BTC, or USDT/USDC
6. Create order and wait for deployment

Access Your Server

Find connection details in My Orders
Web interfaces: Use the HTTP port URL
SSH: ssh -p <port> root@<proxy-address>

What is DeepSeek Coder?

DeepSeek Coder offers:

State-of-the-art code generation
338 programming languages (DeepSeek-Coder-V2; the original V1 models cover fewer)
Fill-in-the-middle support
Repository-level understanding

Model Variants

| Model | Parameters | VRAM | Context |
|---|---|---|---|
| DeepSeek-Coder-1.3B | 1.3B | 3GB | 16K |
| DeepSeek-Coder-6.7B | 6.7B | 8GB | 16K |
| DeepSeek-Coder-33B | 33B | 40GB | 16K |
| DeepSeek-Coder-V2 | 16B/236B | 20GB+ | 128K |
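As a rough cross-check on VRAM requirements, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is a weights-only approximation (it ignores KV cache, activations, and framework overhead), and the function name and precision table are illustrative assumptions rather than part of any DeepSeek tooling. By this estimate, the 6.7B and 33B VRAM figures listed above are closer to 8-bit weights than to bf16.

```python
# Weights-only VRAM estimate: parameters x bytes per parameter.
# Ignores KV cache, activations, and runtime overhead, so treat results as a lower bound.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_weight_gb(params_billion: float, precision: str = "bf16") -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]

for name, size in [("1.3B", 1.3), ("6.7B", 6.7), ("33B", 33.0)]:
    row = ", ".join(
        f"{p}: {estimate_weight_gb(size, p):.1f} GB" for p in ("bf16", "int8", "int4")
    )
    print(f"DeepSeek-Coder-{name} -> {row}")
```

Leave headroom on top of these numbers: long contexts and batched requests grow the KV cache considerably.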

Quick Deploy

Docker Image:

pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime

Ports:

22/tcp
8000/http

Command:

pip install vllm && \
vllm serve deepseek-ai/deepseek-coder-6.7b-instruct --port 8000

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

Go to My Orders page
Click on your order
Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.
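A quick way to confirm the server is reachable is to query the OpenAI-compatible `/v1/models` endpoint that vLLM exposes. The sketch below uses only the standard library; `abc123.clorecloud.net` is the placeholder host from the example above, and the helper names are illustrative.

```python
import json
import urllib.request

def api_base(http_pub: str) -> str:
    """Turn an http_pub host (with or without a scheme) into an OpenAI-style base URL."""
    host = http_pub.strip().rstrip("/")
    if not host.startswith(("http://", "https://")):
        host = "https://" + host
    return host + "/v1"

def list_models(http_pub: str, timeout: float = 10.0) -> list[str]:
    """Return the model IDs served at the given host."""
    with urllib.request.urlopen(api_base(http_pub) + "/models", timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["id"] for m in payload.get("data", [])]

# Example (requires a running server; replace the placeholder host):
# print(list_models("abc123.clorecloud.net"))
```

The value returned by `api_base` can also be passed as `base_url` to the OpenAI client used later in this guide.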

Using Ollama


# Run DeepSeek Coder
ollama run deepseek-coder

# Specific sizes
ollama run deepseek-coder:1.3b
ollama run deepseek-coder:6.7b
ollama run deepseek-coder:33b

# V2 (latest)
ollama run deepseek-coder-v2
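
Ollama also serves a local REST API (port 11434 by default), which is handy for scripting. The following is a minimal sketch using the standard library; it assumes the Ollama server is running on the same machine and the model tag has already been pulled. The helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(prompt: str, model: str = "deepseek-coder:6.7b") -> dict:
    """Build a non-streaming generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "deepseek-coder:6.7b", timeout: float = 120.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    body = json.dumps(build_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]

# Example (requires a running Ollama server):
# print(generate("Write a Python function that reverses a linked list."))
```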

Installation

pip install transformers accelerate torch

Code Generation

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "user", "content": """
Write a Python class for a REST API client with:
- Authentication support
- Retry logic with exponential backoff
- Request/response logging
"""}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    temperature=0.2,
    do_sample=True
)

print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))

Fill-in-the-Middle (FIM)

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "deepseek-ai/deepseek-coder-6.7b-base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Fill-in-the-middle format
prefix = """def calculate_statistics(data):
    \"\"\"Calculate mean, median, and std of a list.\"\"\"
    import statistics

    mean = statistics.mean(data)
"""

suffix = """
    return {
        'mean': mean,
        'median': median,
        'std': std
    }
"""

# FIM tokens
prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)

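# Decode only the newly generated tokens (the filled-in middle), not the echoed prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))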
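
The prompt assembly above can be wrapped in a small helper so the sentinel tokens are written in one place. This is a sketch; the function names are illustrative, and the token strings are the ones used in the example above.

```python
# Sentinel tokens as used in the FIM example above.
FIM_BEGIN = "<｜fim▁begin｜>"
FIM_HOLE = "<｜fim▁hole｜>"
FIM_END = "<｜fim▁end｜>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the gap in FIM sentinel tokens."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"

def splice(prefix: str, middle: str, suffix: str) -> str:
    """Rebuild the full source once the model has produced the missing middle."""
    return prefix + middle + suffix

prefix = "def add(a, b):\n"
suffix = "\n    return result\n"
prompt = build_fim_prompt(prefix, suffix)

# A plausible completion for the gap, written by hand for illustration (not model output).
completed = splice(prefix, "    result = a + b", suffix)
print(completed)
```

Keeping `prefix` and `suffix` around makes it straightforward to write the completed file back to disk after generation.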

DeepSeek-Coder-V2

The newest and most capable release in the DeepSeek Coder family. The example below uses the Lite (16B) variant:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Implement a thread-safe LRU cache in Python"}
]

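inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model starts answering
    return_tensors="pt"
).to("cuda")
# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.2, do_sample=True)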
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))

vLLM Server

vllm serve deepseek-ai/deepseek-coder-6.7b-instruct \
    --port 8000 \
    --dtype bfloat16 \
    --max-model-len 16384 \
    --trust-remote-code

API Usage

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")

response = client.chat.completions.create(
    model="deepseek-ai/deepseek-coder-6.7b-instruct",
    messages=[
        {"role": "system", "content": "You are an expert programmer."},
        {"role": "user", "content": "Write a FastAPI websocket server"}
    ],
    temperature=0.2,
    max_tokens=1500
)

print(response.choices[0].message.content)
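
Instruct models usually return code wrapped in Markdown fences along with explanatory text. If only the code is needed (for example, to write it to a file), a small post-processing step helps. The helper below is a hypothetical utility, not part of the OpenAI client or vLLM.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid nesting literal fences

# Matches fenced blocks; the language tag after the opening fence is optional.
_BLOCK = re.compile(FENCE + r"[ \t]*([\w+-]*)[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)

def extract_code_blocks(reply: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for every fenced block in a model reply."""
    return [(lang, code.rstrip("\n")) for lang, code in _BLOCK.findall(reply)]

# Hand-written sample reply for illustration (not model output).
reply = f"Here is the server:\n{FENCE}python\nprint('hello')\n{FENCE}\nRun it with uvicorn."
for lang, code in extract_code_blocks(reply):
    print(f"[{lang or 'text'}]")
    print(code)
```

In practice, `reply` would be `response.choices[0].message.content` from the call above.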

Code Review

code_to_review = """
def process_data(data):
    result = []
    for i in range(len(data)):
        if data[i] > 0:
            result.append(data[i] * 2)
    return result
"""

messages = [
    {"role": "user", "content": f"""
Review this code and suggest improvements:

```python
{code_to_review}
```

Focus on:
- Performance
- Readability
- Best practices
"""}
]


Bug Fixing

buggy_code = """
def merge_sorted_lists(list1, list2):
    result = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
    return result
"""

messages = [
    {"role": "user", "content": f"""
Find and fix the bug in this code:

```python
{buggy_code}
```
"""}
]
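
For checking the model's answer, here is a hand-written corrected version (not model output). The buggy snippet has two defects: the `else` branch never advances `j`, so the loop cannot terminate once that branch is taken, and leftover elements are not appended after one list is exhausted.

```python
def merge_sorted_lists(list1, list2):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            result.append(list1[i])
            i += 1
        else:
            result.append(list2[j])
            j += 1  # missing in the buggy version: without it the loop never ends
    # Append whatever remains once one list is exhausted (also missing in the buggy version).
    result.extend(list1[i:])
    result.extend(list2[j:])
    return result

print(merge_sorted_lists([1, 4, 9], [2, 3, 10, 11]))
```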


Gradio Interface

import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

def generate_code(prompt, temperature, max_tokens):
    messages = [{"role": "user", "content": prompt}]
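    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")
    # Sliders may return floats; max_new_tokens must be an integer
    outputs = model.generate(
        inputs, max_new_tokens=int(max_tokens), temperature=temperature, do_sample=True
    )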
    return tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

demo = gr.Interface(
    fn=generate_code,
    inputs=[
        gr.Textbox(label="Prompt", lines=5, placeholder="Describe the code you need..."),
        gr.Slider(0.1, 1.0, value=0.2, label="Temperature"),
        gr.Slider(256, 2048, value=1024, step=128, label="Max Tokens")
    ],
    outputs=gr.Code(language="python", label="Generated Code"),
    title="DeepSeek Coder"
)

# Add 7860 as an HTTP port in your order so the UI is reachable from outside the container
demo.launch(server_name="0.0.0.0", server_port=7860)

Performance

| Model | GPU | Tokens/sec |
|---|---|---|
| DeepSeek-1.3B | RTX 3060 | ~120 |
| DeepSeek-6.7B | RTX 3090 | ~70 |
| DeepSeek-6.7B | RTX 4090 | ~100 |
| DeepSeek-33B | A100 | ~40 |
| DeepSeek-V2-Lite | RTX 4090 | ~50 |

Comparison

| Model | HumanEval | Code Quality |
|---|---|---|
| DeepSeek-Coder-33B | 79.3% | Excellent |
| CodeLlama-34B | 53.7% | Good |
| GPT-3.5-Turbo | 72.6% | Good |

Troubleshooting

Code completion not working

Ensure the prompt uses DeepSeek Coder's FIM tokens <｜fim▁begin｜>, <｜fim▁hole｜>, <｜fim▁end｜>, as in the Fill-in-the-Middle section (which uses the base model)
Set appropriate max_new_tokens for code generation

Model outputs garbage

Check model is fully downloaded
Verify CUDA is being used: model.device
Try lower temperature (0.2-0.5 for code)

Slow inference

Use vLLM for 5-10x speedup
Enable torch.compile() for transformers
Use quantized model for large variants
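
For the quantized route, one common option is 4-bit loading through `bitsandbytes`. A minimal sketch, assuming `bitsandbytes` is installed alongside `transformers` and a CUDA GPU is available; the wrapper function is illustrative, and the quantization settings are typical choices rather than tuned values.

```python
def load_4bit(model_id: str = "deepseek-ai/deepseek-coder-33b-instruct"):
    """Load a model with 4-bit NF4 weights to reduce weight memory versus bf16."""
    # Imported lazily so this module can be imported on machines without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    return tokenizer, model

# Example (downloads the weights and needs a GPU):
# tokenizer, model = load_4bit()
```

Quantization trades some output quality and speed for memory, so it is worth comparing results against the bf16 model on a few representative prompts.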

Import errors

Install dependencies: pip install transformers accelerate
Update PyTorch to 2.0+

Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU | Hourly Rate | Daily Rate | 4-Hour Session |
|---|---|---|---|
| RTX 3060 | ~$0.03 | ~$0.70 | ~$0.12 |
| RTX 3090 | ~$0.06 | ~$1.50 | ~$0.25 |
| RTX 4090 | ~$0.10 | ~$2.30 | ~$0.40 |
| A100 40GB | ~$0.17 | ~$4.00 | ~$0.70 |
| A100 80GB | ~$0.25 | ~$6.00 | ~$1.00 |

Prices vary by provider and demand. Check CLORE.AI Marketplace for current rates.
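
Combining these hourly rates with the throughput figures from the Performance section gives a rough cost per million generated tokens. This is back-of-the-envelope arithmetic for a single request stream at full utilization; both inputs are approximate, and the function name is illustrative.

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_sec: float) -> float:
    """USD per 1M generated tokens, assuming the GPU generates continuously."""
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# (label, approx. hourly rate in USD, approx. tokens/sec) taken from the figures in this guide.
examples = [
    ("RTX 3090, DeepSeek-6.7B", 0.06, 70),
    ("RTX 4090, DeepSeek-6.7B", 0.10, 100),
]
for label, rate, tps in examples:
    print(f"{label}: ~${cost_per_million_tokens(rate, tps):.2f} per 1M tokens")
```

Idle time is billed too, so real costs per token are higher unless the server is kept busy; batching requests through vLLM raises aggregate throughput and lowers the effective figure.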

Save money:

Use Spot market for flexible workloads (often 30-50% cheaper)
Pay with CLORE tokens
Compare prices across different providers

Next Steps

DeepSeek-V3 - Latest DeepSeek flagship model
CodeLlama - Alternative code model
Qwen2.5-Coder - Alibaba's code model
vLLM - Production deployment
