MLC-LLM
What is MLC-LLM?
Key Capabilities
Why Use MLC-LLM on Clore.ai?
Quick Start on Clore.ai
Step 1: Find a GPU Server
Step 2: Deploy MLC-LLM
Container Port | Purpose
Step 3: Connect via SSH
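Once the instance is running, connect to it over SSH. A minimal sketch, assuming the host, port, and key path shown on your Clore.ai dashboard:

```bash
# Replace host, port, and key path with the values from your Clore.ai dashboard
ssh -i ~/.ssh/id_rsa -p <ssh-port> root@<server-ip>
```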
Installation & Setup
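If the container image does not already ship MLC-LLM, it can be installed from the prebuilt wheel index. A sketch, assuming a CUDA 12.2 environment; the package suffixes track the CUDA version and may change between releases:

```bash
# Install MLC-LLM and its TVM runtime from the prebuilt wheels
python3 -m pip install --pre -U -f https://mlc.ai/wheels \
  mlc-llm-nightly-cu122 mlc-ai-nightly-cu122

# Verify the CLI is available
mlc_llm --help
```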
Option A: Use Pre-compiled Models (Fastest)
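The fastest path is to point the CLI at one of the pre-compiled model repos published under the mlc-ai HuggingFace organization; weights are downloaded and the model library is built on first use. A sketch, using Llama 3 8B Instruct at 4-bit as the example model ID:

```bash
# Downloads the quantized weights and prepares the model on first run
mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
```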
Option B: Compile Your Own Model
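Compiling your own model is a three-step flow: convert the weights, generate a chat config, and compile the model library. A sketch, assuming HuggingFace-format weights already downloaded under ./dist/models/ and the q4f16_1 quantization; paths and the conversation template are placeholders to adapt to your model:

```bash
MODEL=./dist/models/Meta-Llama-3-8B-Instruct
OUT=./dist/Llama-3-8B-Instruct-q4f16_1-MLC

# 1. Quantize and convert the weights to MLC format
mlc_llm convert_weight $MODEL --quantization q4f16_1 -o $OUT

# 2. Generate mlc-chat-config.json (pick the conversation template for your model)
mlc_llm gen_config $MODEL --quantization q4f16_1 --conv-template llama-3 -o $OUT

# 3. Compile the model library for the local GPU
mlc_llm compile $OUT/mlc-chat-config.json --device cuda -o $OUT/libs/model-cuda.so
```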
Running the API Server
Start the OpenAI-Compatible Server
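A sketch of launching the REST server, assuming the pre-compiled Llama 3 model from Option A and port 8000; bind to 0.0.0.0 so the endpoint is reachable through the port you exposed on Clore.ai:

```bash
mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC \
  --host 0.0.0.0 \
  --port 8000
```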
Server Startup Output
Available API Endpoints
Endpoint | Method | Description
API Usage Examples
Chat Completions (Python)
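Because the server speaks the OpenAI protocol, the official openai Python client works as-is; only the base_url changes. A sketch, assuming the server from the previous section is reachable at <server-ip>:8000 and that the model ID matches the one being served:

```python
from openai import OpenAI

# Point the client at the MLC-LLM server; the API key is unused but must be non-empty
client = OpenAI(base_url="http://<server-ip>:8000/v1", api_key="none")

response = client.chat.completions.create(
    model="HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what MLC-LLM is in two sentences."},
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```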
Streaming Response
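Set stream=True to receive tokens as they are generated; the client then yields chunks whose delta carries the incremental text. A sketch reusing the client from the previous example:

```python
stream = client.chat.completions.create(
    model="HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk may carry no content
        print(delta, end="", flush=True)
print()
```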
cURL Example
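The same request from the command line, assuming the server address and model ID used above:

```bash
curl -s http://<server-ip>:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```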
Available Pre-compiled Models
Llama 3 Series
Mistral / Mixtral
Gemma
Phi
Quantization Options
Quantization | Bits | Quality | VRAM (7B) | VRAM (13B)
Multi-GPU Deployment
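MLC-LLM shards a model across GPUs with tensor parallelism through the tensor_parallel_shards setting. A sketch, assuming a 2-GPU instance and that your MLC-LLM build accepts the value via --overrides at serve time (it can also be baked in when running gen_config):

```bash
# Split the model across the 2 GPUs on the instance
mlc_llm serve HF://mlc-ai/Llama-3-70B-Instruct-q4f16_1-MLC \
  --host 0.0.0.0 --port 8000 \
  --overrides "tensor_parallel_shards=2"
```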
Web Chat Interface
Performance Tuning
Optimize Batch Size
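The server's batching behaviour is governed by its engine mode: "local" and "interactive" favour low-latency single requests, while "server" raises the maximum batch size to maximize throughput under concurrent load. A sketch, assuming the --mode flag available in recent MLC-LLM releases:

```bash
# Prioritize aggregate throughput over single-request latency
mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC \
  --host 0.0.0.0 --port 8000 \
  --mode server
```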
Monitor GPU Utilization
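Standard NVIDIA tooling is enough to watch utilization and memory while the server handles requests:

```bash
# Refresh GPU utilization, memory, and power draw every second
watch -n 1 nvidia-smi

# Or log per-second utilization to a file for later inspection
nvidia-smi dmon -s u -d 1 > gpu_util.log
```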
Benchmark Throughput
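A rough way to measure generation throughput is to time a single long completion and divide the generated token count by the elapsed time (this mixes prefill and decode, but is good enough for comparing GPUs). A minimal sketch with the openai client, assuming the server reports usage in its responses:

```python
import time
from openai import OpenAI

client = OpenAI(base_url="http://<server-ip>:8000/v1", api_key="none")

start = time.time()
response = client.chat.completions.create(
    model="HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
    messages=[{"role": "user", "content": "Write a 300-word story about a GPU."}],
    max_tokens=512,
)
elapsed = time.time() - start

tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")
```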
Docker Compose Setup
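A sketch of a Compose service, assuming an image that already contains MLC-LLM and the CUDA runtime (the image name below is a placeholder; substitute whatever image you deploy on Clore.ai) and a host volume to persist downloaded weights between restarts:

```yaml
services:
  mlc-llm:
    image: your-registry/mlc-llm-serve:latest   # placeholder image with MLC-LLM installed
    command: >
      mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
      --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - ./models:/root/.cache/mlc_llm   # weight cache path may differ by MLC-LLM version
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```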
Troubleshooting
Model Download Fails
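HF:// downloads are fetched with git, so a missing git-lfs installation is a common cause. A sketch of fixing the tooling and, as a fallback, cloning the weights manually and serving from the local path:

```bash
# git and git-lfs must be available for HF:// model downloads
git lfs install

# Fallback: clone the weights manually, then point mlc_llm at the local directory
git clone https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
mlc_llm serve ./Llama-3-8B-Instruct-q4f16_1-MLC --host 0.0.0.0 --port 8000
```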
Out of Memory (OOM)
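If the model plus KV cache does not fit in VRAM, switch to a lower-bit quantization or shrink the context window so the KV cache reserves less memory. A sketch, assuming context_window_size is accepted through --overrides on your build:

```bash
# Reduce KV-cache memory by capping the context length
mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC \
  --host 0.0.0.0 --port 8000 \
  --overrides "context_window_size=4096"
```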
CUDA Version Mismatch
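The mlc-ai and mlc-llm wheels are built per CUDA version, so the wheel suffix must match the CUDA release your driver supports. A sketch of checking the driver and reinstalling matching wheels (cu122 is an example suffix):

```bash
# The "CUDA Version" field shows the highest CUDA runtime the driver supports
nvidia-smi

# Reinstall wheels built for that CUDA version (e.g. cu121, cu122, cu123)
python3 -m pip install --pre -U -f https://mlc.ai/wheels \
  mlc-llm-nightly-cu122 mlc-ai-nightly-cu122
```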
Server Not Accessible
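First verify the server answers locally, then confirm it is bound to 0.0.0.0 and that the port is actually mapped on the Clore.ai side. A quick check from inside the instance:

```bash
# Should list the served model if the server is up
curl -s http://127.0.0.1:8000/v1/models

# Confirm the process is listening on all interfaces, not just 127.0.0.1
ss -tlnp | grep 8000
```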
Clore.ai GPU Recommendations
GPU | VRAM | Clore.ai Price | Best For | Throughput (Llama 3 8B Q4)
Resources