DeepSeek-V3

Run DeepSeek-V3, the state-of-the-art open-source LLM with exceptional reasoning capabilities, on CLORE.AI GPUs.

Why DeepSeek-V3?

  • State-of-the-art - Competes with GPT-4 and Claude 3.5

  • 671B MoE - 671B total params, 37B active per token

  • Reasoning - Excellent at math, code, and complex tasks

  • Efficient - MoE architecture reduces compute costs

  • Open source - Fully open weights under permissive license
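The efficiency bullet follows from sparse routing: a small gating network scores every expert for each token, but only the top-k experts actually run. A toy NumPy sketch of top-k gating (illustrative only, not DeepSeek's actual router):

```python
import numpy as np

def top_k_route(gate_logits: np.ndarray, k: int = 2):
    """Select the k highest-scoring experts and softmax-normalize their weights.
    Only these k experts execute for the token -- the source of MoE's compute
    savings (DeepSeek-V3 activates 37B of its 671B parameters this way)."""
    top = np.argsort(gate_logits)[-k:][::-1]              # expert indices, best first
    w = np.exp(gate_logits[top] - gate_logits[top].max()) # stable softmax over top-k
    return top, w / w.sum()

experts, weights = top_k_route(np.array([0.1, 2.0, -1.0, 1.5]), k=2)
# experts -> [1, 3]; weights sum to 1
```

The real router also balances load across experts, but the core idea is the same: per-token compute scales with k, not with the total expert count.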

Quick Deploy on CLORE.AI

Docker Image:

vllm/vllm-openai:latest

Ports:

22/tcp
8000/http

Command (Multi-GPU Required):

python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-V3 \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 8 \
    --trust-remote-code

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

  1. Go to My Orders page

  2. Click on your order

  3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.

Verify It's Working
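A quick smoke test for the deployment above, substituting the http_pub URL from My Orders:

```shell
# List the models the server is serving -- should include deepseek-ai/DeepSeek-V3
curl https://YOUR_HTTP_PUB_URL/v1/models

# Send a one-line chat completion to confirm inference works end to end
curl https://YOUR_HTTP_PUB_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-V3", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 16}'
```

Note that the first request after startup can be slow while vLLM finishes loading weights.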

Model Variants

| Model | Parameters | Active | VRAM Required |
| --- | --- | --- | --- |
| DeepSeek-V3 | 671B | 37B | 8x 80GB (A100/H100) |
| DeepSeek-V3-Base | 671B | 37B | 8x 80GB |
| DeepSeek-V2.5 | 236B | 21B | 4x 80GB |
| DeepSeek-V2-Lite | 16B | 2.4B | 16GB |
| DeepSeek-Coder-V2 | 236B | 21B | 4x 80GB |

Hardware Requirements

Full Precision

| Model | Minimum | Recommended |
| --- | --- | --- |
| DeepSeek-V3 | 8x A100 80GB | 8x H100 80GB |
| DeepSeek-V2.5 | 4x A100 80GB | 4x H100 80GB |
| DeepSeek-V2-Lite | RTX 4090 24GB | A100 40GB |

Quantized (AWQ/GPTQ)

| Model | Quantization | VRAM |
| --- | --- | --- |
| DeepSeek-V3 | INT4 | 4x 80GB |
| DeepSeek-V2.5 | INT4 | 2x 80GB |
| DeepSeek-V2-Lite | INT4 | 8GB |

Installation

Using Transformers
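A minimal Transformers sketch, shown for the single-GPU DeepSeek-V2-Lite-Chat variant since full DeepSeek-V3 needs a multi-GPU node even quantized. trust_remote_code is required because DeepSeek repositories ship custom modeling code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite-Chat"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # DeepSeek ships custom model code on the Hub
    torch_dtype="auto",
    device_map="auto",        # place weights on available GPUs automatically
)

messages = [{"role": "user", "content": "Write a haiku about GPUs."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Expect the first run to download the weights; for repeated runs the Hub cache makes loading much faster.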

Using Ollama
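A sketch using the deepseek-v2 tag from the Ollama model library (tag names are assumptions; check the Ollama library page for what is currently published):

```shell
# Install Ollama, then pull the lite MoE variant (16B, fits a single 24GB GPU)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull deepseek-v2:16b
ollama run deepseek-v2:16b "Explain mixture-of-experts in one paragraph."
```

Ollama handles quantization automatically, which is why this route is the easiest path for local testing.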

API Usage

OpenAI-Compatible API (vLLM)
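With the vLLM server from Quick Deploy running, any OpenAI client works by pointing base_url at it; a Python sketch (the api_key value is arbitrary unless you launched vLLM with --api-key):

```python
from openai import OpenAI

client = OpenAI(base_url="https://YOUR_HTTP_PUB_URL/v1", api_key="none")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=64,
    temperature=0.7,
)
print(resp.choices[0].message.content)
```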

Streaming
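The same endpoint supports token streaming; a sketch that prints deltas as they arrive instead of waiting for the full reply:

```python
from openai import OpenAI

client = OpenAI(base_url="https://YOUR_HTTP_PUB_URL/v1", api_key="none")

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Count from 1 to 10."}],
    stream=True,   # yields incremental chunks rather than one final message
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

Streaming matters most on large MoE models like V3, where full responses can take many seconds.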

cURL
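The equivalent raw request:

```shell
curl https://YOUR_HTTP_PUB_URL/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [{"role": "user", "content": "Explain tensor parallelism briefly."}],
    "max_tokens": 128
  }'
```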

DeepSeek-V2-Lite (Single GPU)

For users with limited hardware:
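A single-GPU launch sketch using the same vLLM image; the context cap is an illustrative value to keep the KV cache inside 24GB:

```shell
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-V2-Lite-Chat \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code \
    --max-model-len 8192
```

No --tensor-parallel-size flag is needed here because the 16B model fits on one card.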

Code Generation

DeepSeek-V3 excels at code:
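For example, a code-generation request against the deployed server (the prompt is hypothetical; temperature 0 is a common choice for deterministic code output):

```python
from openai import OpenAI

client = OpenAI(base_url="https://YOUR_HTTP_PUB_URL/v1", api_key="none")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{
        "role": "user",
        "content": "Write a Python function that merges two sorted lists "
                   "in O(n) time. Include a docstring and a short example.",
    }],
    max_tokens=512,
    temperature=0.0,   # deterministic decoding suits code generation
)
print(resp.choices[0].message.content)
</imports>
```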

Math & Reasoning
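A reasoning-style request sketch; the system prompt nudges the model toward step-by-step working before the final answer:

```python
from openai import OpenAI

client = OpenAI(base_url="https://YOUR_HTTP_PUB_URL/v1", api_key="none")

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[
        {"role": "system",
         "content": "Reason step by step before giving the final answer."},
        {"role": "user",
         "content": "A train travels 120 km in 90 minutes. "
                    "What is its average speed in km/h?"},
    ],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```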

Multi-GPU Configuration

8x GPU (Full Model)
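--tensor-parallel-size 8 shards the model across all eight GPUs; a launch sketch with common memory flags (the values are reasonable starting points, not tuned settings):

```shell
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-V3 \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --gpu-memory-utilization 0.95 \
    --max-model-len 32768
```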

4x GPU (V2.5)
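The same pattern for DeepSeek-V2.5 on four GPUs:

```shell
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-V2.5 \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 4 \
    --trust-remote-code \
    --max-model-len 16384
```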

Performance

Throughput (tokens/sec)

| Model | GPUs | Context | Tokens/sec |
| --- | --- | --- | --- |
| DeepSeek-V3 | 8x H100 | 32K | ~80 |
| DeepSeek-V3 | 8x A100 80GB | 32K | ~50 |
| DeepSeek-V3 INT4 | 4x A100 80GB | 16K | ~35 |
| DeepSeek-V2.5 | 4x A100 80GB | 16K | ~70 |
| DeepSeek-V2.5 | 2x A100 80GB | 8K | ~45 |
| DeepSeek-V2-Lite | RTX 4090 | 8K | ~40 |
| DeepSeek-V2-Lite | RTX 3090 | 4K | ~25 |

Time to First Token (TTFT)

| Model | Configuration | TTFT |
| --- | --- | --- |
| DeepSeek-V3 | 8x H100 | ~800ms |
| DeepSeek-V3 | 8x A100 | ~1200ms |
| DeepSeek-V2.5 | 4x A100 | ~500ms |
| DeepSeek-V2-Lite | RTX 4090 | ~150ms |

Memory Usage

| Model | Precision | VRAM Required |
| --- | --- | --- |
| DeepSeek-V3 | FP16 | 8x 80GB |
| DeepSeek-V3 | INT4 | 4x 80GB |
| DeepSeek-V2.5 | FP16 | 4x 80GB |
| DeepSeek-V2.5 | INT4 | 2x 80GB |
| DeepSeek-V2-Lite | FP16 | 20GB |
| DeepSeek-V2-Lite | INT4 | 10GB |

Benchmarks

| Benchmark | DeepSeek-V3 | GPT-4 | Claude 3.5 |
| --- | --- | --- | --- |
| MMLU | 87.1% | 86.4% | 88.7% |
| HumanEval | 82.6% | 67.0% | 92.0% |
| MATH | 61.6% | 52.9% | 71.1% |
| GSM8K | 89.3% | 92.0% | 96.4% |

Docker Compose
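A minimal compose sketch matching the Quick Deploy command above; the volume path and memory settings are illustrative choices, not requirements:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: >
      --model deepseek-ai/DeepSeek-V3
      --host 0.0.0.0
      --port 8000
      --tensor-parallel-size 8
      --trust-remote-code
    ports:
      - "8000:8000"
    volumes:
      - ./models:/root/.cache/huggingface   # persist downloaded weights across restarts
    ipc: host                               # vLLM workers need shared memory
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Run it with `docker compose up -d` on a host with the NVIDIA container toolkit installed.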

GPU Requirements Summary

| Use Case | Recommended Setup | Cost/Hour |
| --- | --- | --- |
| Full DeepSeek-V3 | 8x A100 80GB | ~$2.00 |
| DeepSeek-V2.5 | 4x A100 80GB | ~$1.00 |
| Development/Testing | RTX 4090 (V2-Lite) | ~$0.10 |
| Production API | 8x H100 80GB | ~$3.00 |

Cost Estimate

Typical CLORE.AI marketplace rates:

| GPU Configuration | Hourly Rate | Daily Rate |
| --- | --- | --- |
| RTX 4090 24GB | ~$0.10 | ~$2.30 |
| A100 40GB | ~$0.17 | ~$4.00 |
| A100 80GB | ~$0.25 | ~$6.00 |
| 4x A100 80GB | ~$1.00 | ~$24.00 |
| 8x A100 80GB | ~$2.00 | ~$48.00 |

Prices vary by provider. Check the CLORE.AI Marketplace for current rates.

Save money:

  • Use Spot market for development (often 30-50% cheaper)

  • Pay with CLORE tokens

  • Use DeepSeek-V2-Lite for testing before scaling up
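The arithmetic behind these estimates, as a small illustrative helper (the rates are the approximate figures from the table above, not live marketplace data):

```python
def rental_cost(hourly_rate: float, hours: float, spot_discount: float = 0.0) -> float:
    """Estimated rental cost in USD; spot_discount models the 30-50% Spot saving."""
    return round(hourly_rate * hours * (1.0 - spot_discount), 2)

print(rental_cost(2.00, 24))        # 8x A100 80GB, one day on-demand -> 48.0
print(rental_cost(2.00, 24, 0.40))  # the same day on Spot at 40% off -> 28.8
print(rental_cost(0.10, 24))        # RTX 4090 dev box for a day -> 2.4
```

The gap between the last two lines is why prototyping on V2-Lite before scaling to V3 saves so much.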

Troubleshooting

Out of Memory
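Common vLLM levers when the server runs out of memory at startup or under load, added to the launch command above (values are illustrative starting points):

```shell
# Shrink the KV cache by capping context length
--max-model-len 8192

# Leave more headroom for activation spikes (default is 0.90)
--gpu-memory-utilization 0.85

# Or switch to a smaller variant that actually fits your GPUs
--model deepseek-ai/DeepSeek-V2-Lite-Chat
```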

Model Download Slow
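The weights are hundreds of gigabytes, so downloads dominate first startup. Two standard Hugging Face remedies:

```shell
# Enable the accelerated parallel downloader
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1

# Pre-download once to a persistent directory so restarts don't re-fetch
huggingface-cli download deepseek-ai/DeepSeek-V3 --local-dir ./DeepSeek-V3
```

Mounting the download directory into the container (as in the compose sketch) keeps the cache across rentals.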

trust_remote_code Error
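DeepSeek models ship custom model code, which loaders refuse to execute unless you opt in. The fix is the flag the Quick Deploy command already includes:

```shell
python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-V3 \
    --trust-remote-code
```

In Transformers the equivalent is passing `trust_remote_code=True` to `from_pretrained`.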

Multi-GPU Not Working
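First confirm the GPUs are visible, then turn on NCCL diagnostics; a typical debugging sequence:

```shell
# All eight GPUs should be listed inside the container
nvidia-smi

# Verbose NCCL logs usually point at the failing interconnect
export NCCL_DEBUG=INFO

# If peer-to-peer transfers hang (common on consumer boards), disable P2P
export NCCL_P2P_DISABLE=1
```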

DeepSeek vs Others

| Feature | DeepSeek-V3 | Llama 3.1 405B | Mixtral 8x22B |
| --- | --- | --- | --- |
| Parameters | 671B (37B active) | 405B | 141B (39B active) |
| Context | 128K | 128K | 64K |
| Code | Excellent | Great | Good |
| Math | Excellent | Good | Good |
| Min VRAM | 8x 80GB | 8x 80GB | 2x 80GB |

Use DeepSeek-V3 when:

  • Best reasoning performance needed

  • Code generation is primary use

  • Math/logic tasks are important

  • Have multi-GPU setup available

Next Steps
