Mistral Large 3 (675B MoE)

Run Mistral Large 3 — a 675B MoE frontier model with 41B active parameters on Clore.ai GPUs

Mistral Large 3 is Mistral AI's most powerful open-weight model, released in December 2025 under the Apache 2.0 license. It's a Mixture-of-Experts (MoE) model with 675B total parameters but only 41B active per token — delivering frontier-class performance at a fraction of the compute of a dense 675B model. With native multimodal support (text + images), a 256K context window, and best-in-class agentic capabilities, it competes directly with GPT-4o and Claude-class models while being fully self-hostable.

HuggingFace: mistralai/Mistral-Large-3-675B-Instruct-2512 · Ollama: mistral-large-3:675b · License: Apache 2.0

Key Features

  • 675B total / 41B active parameters — MoE efficiency means you get frontier performance without activating every parameter

  • Apache 2.0 license — fully open for commercial and personal use, no restrictions

  • Natively multimodal — understands both text and images via a 2.5B vision encoder

  • 256K context window — handles massive documents, codebases, and long conversations

  • Best-in-class agentic capabilities — native function calling, JSON mode, tool use

  • Multiple deployment options — FP8 on H200/B200, NVFP4 on H100/A100, GGUF quantized for consumer GPUs

Model Architecture

| Component | Details |
| --- | --- |
| Architecture | Granular Mixture-of-Experts (MoE) |
| Total Parameters | 675B |
| Active Parameters | 41B (per token) |
| Vision Encoder | 2.5B parameters |
| Context Window | 256K tokens |
| Training | 3,000× H200 GPUs |
| Release | December 2025 |

Requirements

| Configuration | Budget (Q4 GGUF) | Standard (NVFP4) | Full (FP8) |
| --- | --- | --- | --- |
| GPU | 4× RTX 4090 | 8× A100 80GB | 8× H100/H200 |
| VRAM | 4× 24GB (96GB) | 8× 80GB (640GB) | 8× 80GB (640GB) |
| RAM | 128GB | 256GB | 256GB |
| Disk | 400GB | 700GB | 1.4TB |
| CUDA | 12.0+ | 12.0+ | 12.0+ |

Recommended Clore.ai setup:

  • Best value: 4× RTX 4090 (~$2–8/day) — run Q4 GGUF quantization via llama.cpp or Ollama

  • Production quality: 8× A100 80GB (~$16–32/day) — NVFP4 with full context via vLLM

  • Maximum performance: 8× H100 (~$24–48/day) — FP8, full 256K context

Quick Start with Ollama

The fastest way to run Mistral Large 3 on a multi-GPU Clore.ai instance:
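A minimal sketch, assuming the `mistral-large-3:675b` tag listed above and a fresh Linux instance with all GPUs visible to the driver:

```shell
# Install Ollama via the official install script
curl -fsSL https://ollama.com/install.sh | sh

# Pull the quantized model (large download — see the Disk row in Requirements)
ollama pull mistral-large-3:675b

# Interactive chat; Ollama spreads the GGUF weights across all visible GPUs
ollama run mistral-large-3:675b

# Or query the local REST API that the Ollama server exposes on port 11434
curl http://localhost:11434/api/generate -d '{
  "model": "mistral-large-3:675b",
  "prompt": "Explain Mixture-of-Experts routing in two sentences.",
  "stream": false
}'
```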

Quick Start with vLLM (Production)

For production-grade serving with OpenAI-compatible API:
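A hedged launch sketch, assuming an 8-GPU node and the NVFP4 checkpoint named in the tips below; the three `mistral`-format flags are required (see Troubleshooting), and `--max-model-len` is capped here to keep KV-cache memory in check:

```shell
pip install -U vllm

# Serve an OpenAI-compatible API on port 8000, sharded across 8 GPUs
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4 \
  --tensor-parallel-size 8 \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral \
  --max-model-len 65536 \
  --port 8000
```

Raise `--max-model-len` toward 262144 only if your node has VRAM headroom for the larger KV cache.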

Usage Examples

1. Chat Completion (OpenAI-Compatible API)

Once vLLM is running, use any OpenAI-compatible client:
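For example, with plain `curl` against the server started above (the model name below assumes the NVFP4 checkpoint; query `GET /v1/models` to confirm what your server registered):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Summarize the benefits of MoE models in three bullets."}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
```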

2. Function Calling / Tool Use

Mistral Large 3 excels at structured tool calling:
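A sketch using the standard OpenAI `tools` schema; `get_weather` is a hypothetical tool defined only for this example, and the model name assumes the NVFP4 checkpoint:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4",
    "messages": [{"role": "user", "content": "What is the weather in Paris right now?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
```

When the model decides to call the tool, the response carries a `tool_calls` array with the function name and JSON arguments instead of plain text; your application executes the call and sends the result back as a `tool` role message.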

3. Vision — Image Analysis

Mistral Large 3 natively understands images:
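A sketch using the OpenAI-style multimodal message format; the image URL is a placeholder (base64 `data:` URLs also work with this schema):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image in one paragraph."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```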

Tips for Clore.ai Users

  1. Start with NVFP4 on A100s — The Mistral-Large-3-675B-Instruct-2512-NVFP4 checkpoint is specifically designed for A100/H100 nodes and offers near-lossless quality at half the memory footprint of FP8.

  2. Use Ollama for quick experiments — If you have a 4× RTX 4090 instance, Ollama handles GGUF quantization automatically. Perfect for testing before committing to a vLLM production setup.

  3. Expose the API securely — When running vLLM on a Clore.ai instance, use SSH tunneling (ssh -L 8000:localhost:8000 root@<ip>) rather than exposing port 8000 directly.

  4. Lower max-model-len to save VRAM — If you don't need the full 256K context, set --max-model-len 32768 or 65536 to significantly reduce KV-cache memory usage.

  5. Consider the dense alternatives — For single-GPU setups, Mistral 3 14B (mistral3:14b in Ollama) delivers excellent performance on a single RTX 4090 and is from the same model family.
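Tips 3 and 4 in practice, as a hedged sketch (the instance IP is a placeholder you get from your Clore.ai dashboard):

```shell
# From your local machine: forward the vLLM port over SSH instead of
# exposing port 8000 publicly (tip 3)
ssh -L 8000:localhost:8000 root@<instance-ip>

# In a second local terminal: smoke-test the tunneled API
curl http://localhost:8000/v1/models
```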

Troubleshooting

| Issue | Solution |
| --- | --- |
| CUDA out of memory on vLLM | Reduce `--max-model-len` (try 32768), increase `--tensor-parallel-size`, or use the NVFP4 checkpoint |
| Slow generation speed | Ensure `--tensor-parallel-size` matches your GPU count; enable speculative decoding with the Eagle checkpoint |
| Ollama fails to load 675B | Ensure you have 96GB+ VRAM across GPUs; Ollama needs `OLLAMA_NUM_PARALLEL=1` for large models |
| `tokenizer_mode mistral` errors | Pass all three flags: `--tokenizer-mode mistral --config-format mistral --load-format mistral` |
| Vision not working | Keep images close to a 1:1 aspect ratio; avoid very wide or thin images |
| Download too slow | Use `huggingface-cli download mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4` with `HF_TOKEN` set |

