Mistral & Mixtral

Run Mistral and Mixtral models for high-quality text generation.

Renting on CLORE.AI

  1. Filter by GPU type, VRAM, and price

  2. Choose On-Demand (fixed rate) or Spot (bid price)

  3. Configure your order:

    • Select Docker image

    • Set ports (TCP for SSH, HTTP for web UIs)

    • Add environment variables if needed

    • Enter startup command

  4. Select payment: CLORE, BTC, or USDT/USDC

  5. Create order and wait for deployment

Access Your Server

  • Find connection details in My Orders

  • Web interfaces: Use the HTTP port URL

  • SSH: ssh -p <port> root@<proxy-address>

Model Overview

| Model | Parameters | Min. VRAM | Specialty |
| --- | --- | --- | --- |
| Mistral-7B | 7B | 8GB | General purpose |
| Mistral-7B-Instruct | 7B | 8GB | Chat/instruction |
| Mixtral-8x7B | 46.7B (12.9B active) | 24GB | MoE, best quality |
| Mixtral-8x22B | 141B | 80GB+ | Largest MoE |

Quick Deploy

Docker Image: e.g., `vllm/vllm-openai:latest`, or `ollama/ollama` for the Ollama route

Ports: e.g., 22 (TCP, for SSH) and 8000 (HTTP, for the API/web UI)

Command: the launch command for your chosen stack (see Installation Options below)

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

  1. Go to My Orders page

  2. Click on your order

  3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.

Installation Options

Using Ollama (Easiest)
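
Ollama exposes a small HTTP API on port 11434, so once the model is pulled on the server (`ollama pull mistral` or `ollama pull mixtral`) it can be called with nothing but the standard library. A minimal sketch — swap the localhost URL for your http_pub URL:

```python
import json
import urllib.request

# Replace with your http_pub URL from My Orders; Ollama listens on 11434 by default.
OLLAMA_URL = "http://localhost:11434"

def build_payload(model, prompt):
    """JSON body for Ollama's /api/generate endpoint (stream=False -> one reply)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="mistral"):
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Live call (needs a running Ollama server with the model pulled):
# print(generate("Explain mixture-of-experts in two sentences."))
```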

Using vLLM
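
With vLLM installed on the server (`pip install vllm`), the same models run in-process through the `LLM` class. A hedged sketch, assuming the Hugging Face checkpoint `mistralai/Mistral-7B-Instruct-v0.2`; the import is deferred so the prompt helper can be inspected without vLLM installed:

```python
def format_instruct(prompt):
    """Mistral-Instruct chat template for a single user turn."""
    return f"<s>[INST] {prompt} [/INST]"

def run_vllm(prompts, model="mistralai/Mistral-7B-Instruct-v0.2"):
    # Deferred import: needs a GPU with enough VRAM (see VRAM Requirements).
    from vllm import LLM, SamplingParams

    llm = LLM(model=model, max_model_len=8192)  # lower max_model_len to fit smaller GPUs
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate([format_instruct(p) for p in prompts], params)
    return [o.outputs[0].text for o in outputs]

# run_vllm(["Why are MoE layers cheaper to run than dense ones?"])  # run on the rented GPU
```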

Using Transformers

Mistral-7B with Transformers
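
A hedged sketch of single-prompt generation with Transformers, assuming `pip install transformers accelerate torch` and the `mistralai/Mistral-7B-Instruct-v0.2` checkpoint; the heavy imports are deferred so the message helper works without a GPU:

```python
def chat_messages(prompt, history=None):
    """Build the messages list consumed by the tokenizer's chat template."""
    msgs = list(history or [])
    msgs.append({"role": "user", "content": prompt})
    return msgs

def generate_mistral(prompt, max_new_tokens=256):
    # Deferred imports: FP16 needs ~14GB VRAM (see VRAM Requirements).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mistral-7B-Instruct-v0.2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        chat_messages(prompt), return_tensors="pt"
    ).to(model.device)
    output = model.generate(
        inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# generate_mistral("Summarize what a KV cache does.")  # run on the rented GPU
```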

Mixtral-8x7B
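
Mixtral-8x7B in FP16 needs roughly 90GB of weights alone (2 bytes per parameter), so it only fits by sharding across several GPUs with `device_map="auto"` or by quantizing (next section). A hedged sketch, assuming the `mistralai/Mixtral-8x7B-Instruct-v0.1` checkpoint:

```python
def fp16_vram_gb(params_billion):
    """Weights-only estimate: FP16 stores 2 bytes per parameter."""
    return params_billion * 2.0  # 46.7B params -> ~93GB, matching the ~90GB in the table

def generate_mixtral(prompt, max_new_tokens=256):
    # Deferred imports; device_map="auto" shards the experts across every visible GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```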

Quantized Models (Lower VRAM)

4-bit Quantization
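
With bitsandbytes (`pip install transformers bitsandbytes accelerate`), 4-bit loading brings Mistral-7B down to ~5GB and Mixtral-8x7B to ~24GB (see VRAM Requirements). A hedged sketch:

```python
def load_model_4bit(model_id="mistralai/Mistral-7B-Instruct-v0.2"):
    # Deferred imports; NF4 with FP16 compute is a common default, not the only option.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    return model, tokenizer

# model, tok = load_model_4bit("mistralai/Mixtral-8x7B-Instruct-v0.1")  # fits in ~24GB
```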

GGUF with llama.cpp
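
For GGUF files, the `llama-cpp-python` bindings (`pip install llama-cpp-python`, built with CUDA support) load a quantized file directly; the model path below is whatever GGUF conversion you download, e.g. a community Q4 quant from Hugging Face. A hedged sketch:

```python
def run_gguf(prompt, model_path):
    # Deferred import; n_gpu_layers=-1 offloads every layer to the GPU.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=8192, n_gpu_layers=-1)
    out = llm(f"[INST] {prompt} [/INST]", max_tokens=256, temperature=0.7)
    return out["choices"][0]["text"]

# run_gguf("Explain quantization briefly.", "mistral-7b-instruct.Q4_K_M.gguf")
```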

vLLM Server (Production)
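
vLLM's OpenAI-compatible server is started from the command line; newer releases also ship a `vllm serve` CLI. A sketch that assembles the launch command, with `--tensor-parallel-size` for splitting Mixtral across GPUs:

```python
def serve_command(model, port=8000, max_model_len=8192, tensor_parallel=1):
    """Argument list for vLLM's OpenAI-compatible API server."""
    cmd = [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--port", str(port),
        "--max-model-len", str(max_model_len),  # reduce this if you hit OOM
    ]
    if tensor_parallel > 1:
        cmd += ["--tensor-parallel-size", str(tensor_parallel)]
    return cmd

# On the rented server (blocks until killed):
#   import subprocess
#   subprocess.run(serve_command("mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel=2))
```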

OpenAI-Compatible API
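
A running vLLM server speaks the OpenAI chat-completions protocol, so the official `openai` client works against it unchanged — point `base_url` at your http_pub URL. A hedged sketch:

```python
def ask(prompt, base_url="https://YOUR_HTTP_PUB_URL/v1"):
    # Deferred import: `pip install openai`. vLLM ignores the API key by default.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="not-needed")
    resp = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",  # must match the served model
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=256,
    )
    return resp.choices[0].message.content

# print(ask("What makes Mixtral a mixture-of-experts model?"))
```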

Streaming
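
Streaming uses the same client with `stream=True`, yielding tokens as they are generated — useful for chat UIs where time-to-first-token matters more than total latency. A hedged sketch:

```python
def stream(prompt, base_url="https://YOUR_HTTP_PUB_URL/v1"):
    # Deferred import: `pip install openai`.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="not-needed")
    for chunk in client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.2",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    ):
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry only role/finish metadata
            print(delta, end="", flush=True)

# stream("Write a limerick about VRAM.")  # needs the server running
```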

Function Calling

Mistral supports function calling:
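
Instruct checkpoints from `Mistral-7B-Instruct-v0.3` onward emit tool calls in the OpenAI tools format; whether the server parses them depends on your vLLM version and flags. A hedged sketch with a hypothetical `get_weather` tool (the tool name and schema are illustration only):

```python
import json

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def call_with_tools(prompt, base_url="https://YOUR_HTTP_PUB_URL/v1"):
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="not-needed")
    resp = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.3",
        messages=[{"role": "user", "content": prompt}],
        tools=[WEATHER_TOOL],
    )
    call = resp.choices[0].message.tool_calls[0]
    return call.function.name, json.loads(call.function.arguments)

# call_with_tools("What's the weather in Berlin?")  # -> ("get_weather", {"city": ...})
```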

Gradio Interface
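
A minimal web chat UI with Gradio (`pip install gradio openai`), backed by the OpenAI-compatible server; launch it on an HTTP port from your order so the http_pub URL serves it. A hedged sketch:

```python
def build_demo(base_url="https://YOUR_HTTP_PUB_URL/v1"):
    # Deferred imports; assumes a running OpenAI-compatible server.
    import gradio as gr
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="not-needed")

    def respond(message, history):
        # History is ignored here for brevity; fold it into `messages` for real chats.
        resp = client.chat.completions.create(
            model="mistralai/Mistral-7B-Instruct-v0.2",
            messages=[{"role": "user", "content": message}],
        )
        return resp.choices[0].message.content

    return gr.ChatInterface(respond, title="Mistral on CLORE.AI")

# build_demo().launch(server_name="0.0.0.0", server_port=8080)  # use your opened HTTP port
```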

Performance Comparison

Throughput (tokens/sec)

| Model | RTX 3060 | RTX 3090 | RTX 4090 | A100 40GB |
| --- | --- | --- | --- | --- |
| Mistral-7B FP16 | 45 | 80 | 120 | 150 |
| Mistral-7B Q4 | 70 | 110 | 160 | 200 |
| Mixtral-8x7B FP16 | - | - | 30 | 60 |
| Mixtral-8x7B Q4 | - | 25 | 50 | 80 |
| Mixtral-8x22B Q4 | - | - | - | 25 |

Time to First Token (TTFT)

| Model | RTX 3090 | RTX 4090 | A100 |
| --- | --- | --- | --- |
| Mistral-7B | 80ms | 50ms | 35ms |
| Mixtral-8x7B | - | 150ms | 90ms |
| Mixtral-8x22B | - | - | 200ms |

Context Length vs VRAM (Mistral-7B)

| Context | FP16 | Q8 | Q4 |
| --- | --- | --- | --- |
| 4K | 15GB | 9GB | 5GB |
| 8K | 18GB | 11GB | 7GB |
| 16K | 24GB | 15GB | 9GB |
| 32K | 36GB | 22GB | 14GB |

VRAM Requirements

| Model | FP16 | 8-bit | 4-bit |
| --- | --- | --- | --- |
| Mistral-7B | 14GB | 8GB | 5GB |
| Mixtral-8x7B | 90GB | 45GB | 24GB |
| Mixtral-8x22B | 180GB | 90GB | 48GB |

Use Cases

Code Generation

Data Analysis

Creative Writing

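
The three use cases differ mainly in sampling settings. Hedged starting points — not official recommendations — staying inside the 0.1–0.9 temperature range this guide suggests under Troubleshooting:

```python
# Assumed presets for illustration; tune per task.
PRESETS = {
    "code": {"temperature": 0.2, "max_tokens": 512},       # near-deterministic output
    "data_analysis": {"temperature": 0.3, "max_tokens": 512},
    "creative": {"temperature": 0.9, "max_tokens": 1024},  # more diverse sampling
}

def sampling_for(use_case):
    """Return a copy of the kwargs to pass to a chat-completions call."""
    return dict(PRESETS[use_case])
```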

Troubleshooting

Out of Memory

  • Use 4-bit quantization

  • Use Mistral-7B instead of Mixtral

  • Reduce max_model_len

Slow Generation

  • Use vLLM for production

  • Enable flash attention

  • Use tensor parallelism for multi-GPU

Poor Output Quality

  • Adjust temperature (0.1-0.9)

  • Use the instruct variant

  • Write a better system prompt

Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU | Hourly Rate | Daily Rate | 4-Hour Session |
| --- | --- | --- | --- |
| RTX 3060 | ~$0.03 | ~$0.70 | ~$0.12 |
| RTX 3090 | ~$0.06 | ~$1.50 | ~$0.25 |
| RTX 4090 | ~$0.10 | ~$2.30 | ~$0.40 |
| A100 40GB | ~$0.17 | ~$4.00 | ~$0.70 |
| A100 80GB | ~$0.25 | ~$6.00 | ~$1.00 |

Prices vary by provider and demand. Check the CLORE.AI Marketplace for current rates.

Save money:

  • Use Spot market for flexible workloads (often 30-50% cheaper)

  • Pay with CLORE tokens

  • Compare prices across different providers
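
Back-of-the-envelope session math, consistent with the rate table above:

```python
def session_cost(hourly_rate, hours):
    """Estimated marketplace cost: rate x duration (excludes any network fees)."""
    return round(hourly_rate * hours, 2)

# A 4-hour RTX 4090 session at ~$0.10/h costs about $0.40,
# and a full day on an A100 80GB at ~$0.25/h about $6.00.
```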

Next Steps
