Ollama
Server Requirements
Parameter | Minimum | Recommended
Why Ollama?
Quick Deploy on CLORE.AI
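Assuming your CLORE.AI rental gives you SSH and Docker access with the NVIDIA container toolkit already installed, a minimal deploy is the official Docker image plus one model pull; `llama3` is used as an illustrative model throughout.

```bash
# Start the official Ollama image with GPU passthrough (documented Docker Hub command)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Download an example model inside the container
docker exec ollama ollama pull llama3
```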
Verify It's Working
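A healthy server answers on port 11434; the version and model-list endpoints are the quickest smoke tests:

```bash
curl http://localhost:11434/api/version   # returns {"version": "..."}
curl http://localhost:11434/api/tags      # lists locally available models
```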
Accessing Your Service
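If you expose the service through CLORE's http_pub mapping (mentioned under Troubleshooting below), the same API is reachable on the public URL. The hostname below is a placeholder; copy the real one from your dashboard.

```bash
# Substitute the http_pub URL from your CLORE.AI dashboard (placeholder shown)
curl http://YOUR_HTTP_PUB_URL/api/version
```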
Installation
Using Docker (Recommended)
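The container start command is the one shown under Quick Deploy; the sketch below covers the routine follow-ups. Models persist in the named `ollama` volume, so recreating the container does not re-download them.

```bash
docker logs -f ollama        # follow server logs

# Upgrade: pull the newer image and recreate the container
docker pull ollama/ollama
docker rm -f ollama
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```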
Manual Installation
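Outside Docker, the official Linux install script sets up the binary and a systemd unit:

```bash
curl -fsSL https://ollama.com/install.sh | sh

# If systemd isn't managing it, run the server in the foreground
ollama serve
```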
Running Models
Pull and Run
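`pull` downloads weights; `run` pulls on demand and opens an interactive session, or answers a one-shot prompt passed as an argument:

```bash
ollama pull llama3
ollama run llama3                       # interactive chat; /bye to exit
ollama run llama3 "Explain quantization in one paragraph."
```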
Popular Models
Model | Size | Use Case
Model Variants
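Variants are selected with tags that encode parameter count, tuning, and quantization. Exact tags differ per model, so check the library page; the ones below are illustrative:

```bash
ollama pull llama3:8b                   # parameter-count tag
ollama pull llama3:8b-instruct-q4_K_M   # tuning + quantization tag
ollama show llama3:8b                   # inspect parameters, template, license
```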
API Usage
Chat Completion
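A minimal non-streaming chat request against the native API:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"}
  ],
  "stream": false
}'
```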
OpenAI-Compatible Endpoint
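The same server also speaks the OpenAI wire format on `/v1`, so existing OpenAI clients can point at it. The API key is ignored, but some clients insist on sending one:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```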
Streaming
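Streaming is the default on `/api/chat`: the response arrives as one JSON object per line, with `"done": true` on the final chunk. `-N` stops curl from buffering:

```bash
curl -N http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Count to five."}]
}'
```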
Embeddings
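Embedding vectors come from `/api/embeddings`; `nomic-embed-text` is one commonly used embedding model (newer releases also offer `/api/embed` with an `"input"` field):

```bash
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox"
}'
```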
Text Generation (Non-Chat)
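For raw completion without chat templating, use `/api/generate`:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Write a haiku about GPUs.",
  "stream": false
}'
```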
Complete API Reference
Model Management
Endpoint | Method | Description
/api/tags | GET | List Models
/api/show | POST | Show Model Details
/api/pull | POST | Pull Model via API
/api/delete | DELETE | Delete Model
/api/ps | GET | List Running Models
/api/version | GET | Get Version
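The calls below exercise each row of the table. Recent API docs use a `"model"` field in request bodies; older releases used `"name"`:

```bash
curl http://localhost:11434/api/tags
curl http://localhost:11434/api/show -d '{"model": "llama3"}'
curl http://localhost:11434/api/pull -d '{"model": "llama3"}'
curl -X DELETE http://localhost:11434/api/delete -d '{"model": "llama3"}'
curl http://localhost:11434/api/ps
curl http://localhost:11434/api/version
```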
Inference Endpoints
Endpoint | Method | Description
/api/generate | POST | Text generation (non-chat)
/api/chat | POST | Chat completion
/api/embeddings | POST | Embeddings
/v1/chat/completions | POST | OpenAI-compatible chat completion
Custom Model Creation
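Models can also be built over the API. Note the payload shape changed across releases: the long-standing form below sends a Modelfile as a string, while newer servers use structured fields such as `"from"` and `"system"`, so check the version you run:

```bash
curl http://localhost:11434/api/create -d '{
  "name": "mario",
  "modelfile": "FROM llama3\nSYSTEM You are Mario from Super Mario Bros."
}'
```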
GPU Configuration
Check GPU Usage
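Two views are useful here: the driver's (`nvidia-smi`) and Ollama's own (`ollama ps`):

```bash
nvidia-smi              # per-GPU VRAM use and utilization
watch -n 1 nvidia-smi   # refresh every second during generation
ollama ps               # loaded models and the CPU/GPU split per model
```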
Multi-GPU
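When a model doesn't fit on one GPU, Ollama splits it across the visible devices. Both variables below are real settings, but exact placement behavior varies by version:

```bash
CUDA_VISIBLE_DEVICES=0,1 ollama serve   # restrict which GPUs Ollama may use
OLLAMA_SCHED_SPREAD=1 ollama serve      # spread layers across all GPUs even if one would suffice
```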
Memory Management
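Idle models stay in VRAM for a keep-alive window (about five minutes by default) so repeat requests skip the load. The knobs below tune that, server-wide or per request:

```bash
# Server-wide: longer residency, cap loaded models and request parallelism
OLLAMA_KEEP_ALIVE=30m OLLAMA_MAX_LOADED_MODELS=2 OLLAMA_NUM_PARALLEL=4 ollama serve

# Per request: keep_alive 0 with an empty prompt unloads the model immediately
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0}'
```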
Custom Models (Modelfile)
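A Modelfile layers a system prompt and sampling parameters on top of a base model. A minimal sketch, with `techbot` as an illustrative name:

```bash
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
SYSTEM You are a concise technical assistant.
EOF

ollama create techbot -f Modelfile
ollama run techbot
```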
Running as a Service
Systemd
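The Linux install script registers an `ollama` unit; environment overrides go in a drop-in, which is the usual place to set OLLAMA_HOST for remote access:

```bash
sudo systemctl edit ollama
# In the drop-in that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama
sudo systemctl status ollama
```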
Performance Tips
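Two optional knobs in recent releases trade little or no quality for speed and VRAM; verify support in your Ollama version before relying on them:

```bash
OLLAMA_FLASH_ATTENTION=1 ollama serve                             # enable flash attention
OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve   # quantized KV cache (needs flash attention)
```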
Benchmarks
Generation Speed (tokens/sec)
Model | RTX 3060 | RTX 3090 | RTX 4090 | A100 40GB
Time to First Token (ms)
Model | RTX 3090 | RTX 4090 | A100
Context Length vs VRAM (Q4)
Model | 2K ctx | 4K ctx | 8K ctx | 16K ctx
GPU Requirements
Model | Q4 VRAM | Q8 VRAM
Cost Estimate
GPU | CLORE/day | Approx. USD/hr | Good For
Troubleshooting
Model won't load
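Most load failures are VRAM exhaustion. Check headroom, read the server logs, and fall back to a smaller quantization:

```bash
nvidia-smi                               # free VRAM?
journalctl -u ollama -n 100 --no-pager   # systemd install logs
docker logs --tail 100 ollama            # Docker install logs
ollama pull llama3:8b-instruct-q4_K_M    # illustrative lower-VRAM variant
```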
Slow generation
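If tokens crawl, the usual culprit is layers spilling to the CPU. `ollama ps` shows the split; you want "100% GPU" in the PROCESSOR column:

```bash
ollama ps
```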
Connection refused
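By default the server binds to localhost only, so remote clients see connection refused. Bind to all interfaces and confirm something is listening:

```bash
OLLAMA_HOST=0.0.0.0:11434 ollama serve   # or set via the systemd drop-in above
ss -tlnp | grep 11434                    # is the port open?
```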
HTTP 502 on http_pub URL
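A 502 from the public URL usually means the proxy reaches the machine but nothing answers on the mapped port; since the proxy side is CLORE's, rule out Ollama itself first from a shell on the server:

```bash
curl http://localhost:11434/api/version   # does Ollama answer locally?
docker ps                                 # is the container up and mapping 11434?
```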
Next Steps