DeepSeek-R1 Reasoning Model
Run DeepSeek-R1 open-source reasoning model on Clore.ai GPUs
Overview
Key Features
Model Variants
Variant | Parameters | Architecture | FP16 VRAM | Q4 VRAM | Q4 Disk
Choosing a Variant
Use Case | Recommended Variant | GPU on Clore
HuggingFace Repositories
Variant | Repository
Requirements
Component | Minimum (7B Q4) | Recommended (32B Q4)
Ollama Quick Start
Install and run
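A minimal install-and-run sequence, assuming a fresh Linux instance with an NVIDIA driver already present; `deepseek-r1:7b` is the 7B distill tag in Ollama's model library.

```shell
# Install Ollama via the official install script
curl -fsSL https://ollama.com/install.sh | sh

# Pull and start the 7B distill; the first run downloads the quantized weights
ollama run deepseek-r1:7b
```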
Example interactive session
Use the OpenAI-compatible API
Python client (via OpenAI SDK)
vLLM Production Setup
Single GPU — 7B / 14B
Multi-GPU — 32B (recommended)
Multi-GPU — 70B
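The launch commands for the three setups above can be sketched as follows; the HuggingFace repo IDs are the official distill checkpoints, while the context lengths and memory-utilization values are illustrative assumptions to tune per GPU.

```shell
# Single GPU: 7B (or swap in the 14B repo) with most of VRAM given to the KV cache
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
  --max-model-len 32768 --gpu-memory-utilization 0.90

# 32B sharded across two GPUs with tensor parallelism
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --tensor-parallel-size 2 --max-model-len 32768

# 70B sharded across four GPUs; shorter context to fit the KV cache
vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --tensor-parallel-size 4 --max-model-len 16384
```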
Query the vLLM endpoint
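A dependency-free query sketch using only the standard library, assuming vLLM is serving on its default port 8000:

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # default vLLM port

def build_payload(prompt: str,
                  model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B") -> dict:
    """OpenAI-style chat payload; max_tokens must leave room for the <think> block."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 2048,
        "temperature": 0.6,
    }

def query(prompt: str) -> str:
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```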
Transformers / Python (with <think> Tag Parsing)
Basic generation
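A minimal generation sketch with Transformers, assuming `transformers`, `torch`, and `accelerate` (for `device_map="auto"`) are installed and a CUDA GPU is available:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

def generate(prompt: str, max_new_tokens: int = 2048) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # The chat template appends the assistant turn; R1 then emits
    # its chain of thought inside <think>...</think> before the answer.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(
        inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.6
    )
    # Decode only the newly generated tokens, not the prompt
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```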
Parsing <think> tags
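Separating the reasoning trace from the final answer is a small regex job; a sketch that also tolerates output with no `<think>` block:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw model output."""
    m = THINK_RE.search(text)
    if not m:
        # Some runs skip the think block entirely; treat everything as answer
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer
```

For example, `split_reasoning("<think>2+2=4</think>The answer is 4.")` returns `("2+2=4", "The answer is 4.")`.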
Streaming with <think> state tracking
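When streaming, the tags can be split across chunk boundaries, so a small state machine that buffers potential partial tags is safer than per-chunk regex. A sketch, assuming the tags arrive as the literal strings `<think>` and `</think>`:

```python
class ThinkStreamTracker:
    """Routes streamed text deltas into 'reasoning' or 'answer' buffers."""

    def __init__(self):
        self.in_think = False
        self.reasoning = []
        self.answer = []
        self._buf = ""

    def feed(self, delta: str) -> None:
        self._buf += delta
        while True:
            tag = "</think>" if self.in_think else "<think>"
            idx = self._buf.find(tag)
            if idx == -1:
                # Hold back any suffix that could be the start of the tag,
                # flush the rest to the current buffer, and wait for more.
                keep = 0
                for k in range(min(len(tag) - 1, len(self._buf)), 0, -1):
                    if tag.startswith(self._buf[-k:]):
                        keep = k
                        break
                cut = len(self._buf) - keep
                flush, self._buf = self._buf[:cut], self._buf[cut:]
                (self.reasoning if self.in_think else self.answer).append(flush)
                return
            # Complete tag found: flush text before it, then flip state
            (self.reasoning if self.in_think else self.answer).append(self._buf[:idx])
            self._buf = self._buf[idx + len(tag):]
            self.in_think = not self.in_think
```

Feed each streamed delta to `feed()`; `"".join(tracker.reasoning)` and `"".join(tracker.answer)` give the two streams.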
Docker Deployment on Clore.ai
Ollama Docker (simplest)
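A sketch of the standard GPU-enabled Ollama container, assuming the NVIDIA Container Toolkit is installed on the host; the named volume keeps downloaded models across restarts.

```shell
# Start Ollama with GPU access and a persistent model volume
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama

# Pull and run the model inside the running container
docker exec -it ollama ollama run deepseek-r1:7b
```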
vLLM Docker (production)
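The official `vllm/vllm-openai` image bundles the server; a sketch with the HuggingFace cache mounted so weights are not re-downloaded on every restart (context length and parallelism values are assumptions to tune):

```shell
docker run -d --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 --ipc=host \
  vllm/vllm-openai:latest \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --tensor-parallel-size 2 --max-model-len 32768
```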
Tips for Clore.ai Deployments
Choosing the right GPU
Budget | GPU | Daily Cost | Best Variant
Performance tuning
Context length considerations
Task Complexity | Typical Thinking Length | Total Context Needed
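Because R1 spends tokens inside `<think>` before answering, the context window must cover prompt, thinking, and answer together. A small budgeting helper; the example token counts and 10% template-overhead margin are illustrative assumptions:

```python
def context_budget(prompt_tokens: int, thinking_tokens: int,
                   answer_tokens: int, headroom: float = 0.10) -> int:
    """Context window needed, with a safety margin for chat-template overhead."""
    total = prompt_tokens + thinking_tokens + answer_tokens
    return int(total * (1 + headroom))

# e.g. a hard math problem: short prompt, long chain of thought
# context_budget(200, 8000, 800) -> 9900, so a 16K window is comfortable
```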
Troubleshooting
Out of memory (OOM)
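Before dropping to a smaller variant, it is often enough to shrink the KV cache and free some VRAM; a sketch using vLLM's CLI flags (the specific values are assumptions to tune):

```shell
# Shorter context = smaller KV cache; lower utilization leaves headroom;
# --enforce-eager skips CUDA-graph capture, trading some speed for memory.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.85 \
  --enforce-eager
```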
Model produces no <think> block
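DeepSeek's usage notes recommend forcing the response to begin with `<think>\n` when the model skips its reasoning block. A tiny helper, assuming you render the chat template to a string yourself before tokenizing (a hypothetical pipeline detail):

```python
def force_think(rendered_prompt: str) -> str:
    """Append an opening think tag so generation must start inside reasoning."""
    return rendered_prompt + "<think>\n"
```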
Repetitive or looping <think> output
Slow first token (high TTFT)
Download stalls on Clore instance
Further Reading