MiMo-V2-Flash
Deploy MiMo-V2-Flash (309B MoE) with speculative decoding on Clore.ai — ultra-fast inference with 150+ tok/s
At a Glance
Why MiMo-V2-Flash?
GPU Recommendations
Setup
VRAM
Performance
Daily Cost*
Deploy with SGLang (Recommended)
Install SGLang
Multi-GPU Setup with MTP
Query with OpenAI API
Deploy with vLLM
Docker Template
Advanced Configuration
Optimizing Speculative Decoding
Memory Optimization
Benchmarking Example
Tips for Clore.ai Users
Troubleshooting
Issue
Solution
Performance Comparison
Model
Size
Speed (8×H100)
Quality
Resources
Last updated
Was this helpful?