# PowerInfer

## What is PowerInfer?

## Key Capabilities

## Why Use PowerInfer on Clore.ai?
## Hardware Requirements

| Model Size | Min VRAM | Recommended RAM | Performance |
| --- | --- | --- | --- |
## Quick Start on Clore.ai

### Step 1: Choose Your Server
### Step 2: Create a Custom Docker Image
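As a sketch, an image can be built on an NVIDIA CUDA development base so `nvcc` is available at compile time. The base image tag and package list below are illustrative assumptions, not tested values; pin a CUDA version that matches the driver on your rented Clore.ai server.

```dockerfile
# Hypothetical Dockerfile sketch; CUDA version and paths are assumptions
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        git cmake build-essential python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Build PowerInfer with CUDA support (see "Building PowerInfer from Source" below)
RUN git clone https://github.com/SJTU-IPADS/PowerInfer /opt/PowerInfer && \
    cmake -S /opt/PowerInfer -B /opt/PowerInfer/build -DLLAMA_CUBLAS=ON && \
    cmake --build /opt/PowerInfer/build --config Release

WORKDIR /opt/PowerInfer
```

Push the resulting image to a registry your Clore.ai instance can pull from.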
### Step 3: Deploy on Clore.ai
## Building PowerInfer from Source
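PowerInfer builds with CMake in the same way as upstream llama.cpp; the commands below follow the project README, with `-DLLAMA_CUBLAS=ON` enabling the CUDA (cuBLAS) backend.

```shell
git clone https://github.com/SJTU-IPADS/PowerInfer
cd PowerInfer

# Configure with CUDA support, then compile in Release mode on all cores
cmake -S . -B build -DLLAMA_CUBLAS=ON
cmake --build build --config Release -j"$(nproc)"
```

For a CPU-only build, simply drop the `-DLLAMA_CUBLAS=ON` flag.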
### Verify Build
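A quick sanity check is that the main binary runs and prints its usage text; the path below assumes the CMake layout from the build step above.

```shell
# Should print the usage text and supported flags if the build succeeded
./build/bin/main --help
```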
## Getting Models

### Download GGUF Models
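PowerInfer expects GGUF files that bundle the sparsity predictor. The repo id below refers to the ReluLLaMA family from the project's Hugging Face model zoo; treat the exact name as an assumption and verify it on Hugging Face before downloading.

```shell
pip install -U "huggingface_hub[cli]"

# Download a PowerInfer-ready GGUF (repo id is an assumption; check the model zoo)
huggingface-cli download PowerInfer/ReluLLaMA-7B-PowerInfer-GGUF \
    --local-dir ./models/ReluLLaMA-7B
```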
### Generate Neuron Predictor (Required for PowerInfer)
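If you start from original (non-PowerInfer) weights, the repository ships a conversion script that merges the neuron predictor into a `.powerinfer.gguf` file. The invocation below paraphrases the README and every path is a placeholder; run `python convert.py --help` in the repo to confirm the arguments for your checkout.

```shell
# Paths are placeholders; predictor weights must be available alongside the model
python convert.py \
    --outfile ./models/llama-7b-relu.powerinfer.gguf \
    ./models/ReluLLaMA-7B
```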
## Running Inference

### Basic Inference (No Predictor)
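A plain generation run mirrors llama.cpp's `main` flags: `-m` for the model, `-p` for the prompt, `-n` for the number of tokens to generate, and `-t` for CPU threads. Model path and prompt here are placeholders.

```shell
./build/bin/main \
  -m ./models/llama-7b-relu.powerinfer.gguf \
  -p "Explain activation sparsity in one paragraph." \
  -n 128 -t 8
```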
### PowerInfer Mode (With Predictor)
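PowerInfer's hybrid CPU/GPU split is driven by a VRAM budget rather than a fixed layer count: hot neurons are placed on the GPU up to the budget and the cold remainder stays on the CPU. The `--vram-budget` flag (in GiB) is the PowerInfer-specific control per its README; confirm it with `--help` on your build.

```shell
# Cap GPU memory at roughly 8 GiB; PowerInfer fills the budget with hot neurons
./build/bin/main \
  -m ./models/llama-7b-relu.powerinfer.gguf \
  -p "Hello" -n 256 -t 8 \
  --vram-budget 8
```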
### Interactive Chat Mode
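Interactive mode reuses llama.cpp's `-i` flag together with a reverse prompt (`-r`) so generation pauses and hands control back to you at each turn. The prompt text is only an example.

```shell
./build/bin/main \
  -m ./models/llama-7b-relu.powerinfer.gguf \
  -i -r "User:" \
  -p "Transcript of a chat between a User and an Assistant. User:"
```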
### Server Mode (OpenAI-compatible API)
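Assuming the llama.cpp-style `server` binary is produced by the build (check `build/bin/`), it exposes an HTTP API on the given host and port. Binding to `0.0.0.0` makes it reachable through the port mapping of your Clore.ai instance.

```shell
./build/bin/server \
  -m ./models/llama-7b-relu.powerinfer.gguf \
  --host 0.0.0.0 --port 8080
```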
## Optimizing GPU Layer Split

| GPU VRAM | 7B Model | 13B Model | 34B Model | 70B Model |
| --- | --- | --- | --- | --- |
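In practice the split is tuned empirically: sweep the VRAM budget and watch the reported throughput. A hypothetical sweep loop (the `--vram-budget` flag is PowerInfer's, per its README; paths and values are placeholders):

```shell
# Try several VRAM budgets and inspect the timing summary at the end of each run
for budget in 4 6 8 10; do
  echo "=== vram-budget ${budget} GiB ==="
  ./build/bin/main -m ./models/llama-7b-relu.powerinfer.gguf \
    -p "benchmark prompt" -n 64 -t 8 --vram-budget "$budget" 2>&1 | tail -n 5
done
```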
## Performance Benchmarks

### Throughput Comparison (Llama 2 70B, RTX 3090)

| Engine | GPU Layers | Tokens/sec |
| --- | --- | --- |
## Running as a Service
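One way to keep the server alive across crashes and reboots is a systemd unit. Every path, port, and name below is a placeholder to adapt to your install.

```ini
# /etc/systemd/system/powerinfer.service  (all paths and the port are placeholders)
[Unit]
Description=PowerInfer inference server
After=network.target

[Service]
WorkingDirectory=/opt/PowerInfer
ExecStart=/opt/PowerInfer/build/bin/server -m /opt/models/model.powerinfer.gguf --host 0.0.0.0 --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl daemon-reload` followed by `systemctl enable --now powerinfer`.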
## API Usage
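With the server running, the llama.cpp-style `/completion` endpoint accepts a JSON body with a prompt and a token count. Whether your build also exposes OpenAI-style `/v1` routes depends on how recent the bundled server code is, so treat the route below as the baseline and check the server's output for the routes it registers.

```shell
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is activation sparsity?", "n_predict": 64}'
```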
## Troubleshooting
### CUDA Out of Memory
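The usual fix is shrinking the GPU share of the model; with PowerInfer that means a smaller `--vram-budget`. The value below is illustrative; step it down until the run fits.

```shell
./build/bin/main -m ./models/model.powerinfer.gguf -p "Hello" -n 64 --vram-budget 4
```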
### Slow CPU Inference
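CPU-side throughput is dominated by thread count; matching `-t` to the number of cores is a reasonable starting point, tuned down if the machine is shared.

```shell
# Use all cores reported by the OS
./build/bin/main -m ./models/model.powerinfer.gguf -p "Hello" -n 64 -t "$(nproc)"
```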
### Build Fails
## Clore.ai GPU Recommendations

| GPU | VRAM | Clore.ai Price | Max Model (Q4) | Throughput (Llama 2 70B Q4) |
| --- | --- | --- | --- | --- |
## Resources