Overview

Run large language models (LLMs) on CLORE.AI GPUs for inference and chat applications.

| Tool | Use Case | Difficulty |
| --- | --- | --- |
|  | Easiest LLM setup | Beginner |
|  | ChatGPT-like interface | Beginner |
|  | High-throughput production serving | Medium |
|  | Efficient GGUF inference | Easy |
|  | Full-featured chat UI | Easy |
|  | Fastest EXL2 inference | Medium |
|  | OpenAI-compatible API | Medium |
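Tools that expose an OpenAI-compatible API accept the standard chat-completions request format, so any OpenAI client can talk to a model running on a rented GPU. A minimal sketch of building such a request with the standard library; the server address, port, and model id below are placeholders for illustration, not values from this guide:

```python
import json
import urllib.request

# Assumed values: substitute the address of your own rented instance
# and the model id your server reports.
url = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "local-model",  # placeholder model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128,
    "temperature": 0.7,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would return the completion once a server
# is running; it is deliberately not called here so the sketch runs
# standalone.
```

The same payload works unchanged against any server implementing the OpenAI chat-completions schema, which is why the API format matters more than the specific serving tool.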

Model Guides

| Model | Parameters | Best For |
| --- | --- | --- |
|  | 671B MoE | Reasoning, code, math |
|  | 0.5B-72B | Multilingual, code |
|  | 7B / 8x7B | General purpose |
|  | 6.7B-33B | Code generation |
|  | 7B-34B | Code completion |
|  | 2B-27B | Efficient inference |
|  | 14B | Small but capable |

GPU Recommendations

| Model Size | Minimum GPU | Recommended |
| --- | --- | --- |
| 7B (Q4) | RTX 3060 12GB | RTX 3090 |
| 13B (Q4) | RTX 3090 24GB | RTX 4090 |
| 34B (Q4) | 2x RTX 3090 | A100 40GB |
| 70B (Q4) | A100 80GB | 2x A100 |
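The recommendations above follow from a rule-of-thumb calculation: a Q4-quantized model needs roughly 4.5 bits per weight, plus around 20% overhead for the KV cache and activations. Both figures are rough assumptions rather than exact values, and real usage also depends on context length and backend, but a sketch makes the sizing logic concrete:

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.5,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for a quantized LLM.

    bits_per_weight ~4.5 approximates a Q4 quantization; the 1.2
    overhead factor (KV cache, activations) is a rule of thumb.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# 7B at Q4 comes out around 4.7 GB, comfortably inside a 12GB card;
# 70B at Q4 comes out around 47 GB, hence the A100 80GB minimum.
print(round(estimate_vram_gb(7), 1))
print(round(estimate_vram_gb(70), 1))
```

Longer contexts grow the KV cache, so treat the 20% overhead as a floor rather than a ceiling.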

Quantization Guide

| Format | VRAM Usage | Quality | Speed |
| --- | --- | --- | --- |
| Q2_K | Lowest | Poor | Fastest |
| Q4_K_M | Low | Good | Fast |
| Q5_K_M | Medium | Great | Medium |
| Q8_0 | High | Excellent | Slower |
| FP16 | Highest | Best | Slowest |
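The formats above trade file size for fidelity. A sketch of the resulting on-disk sizes, using ballpark effective bits-per-weight figures (assumed approximations, not exact llama.cpp numbers):

```python
# Approximate effective bits per weight per GGUF format; treat these
# as ballpark figures rather than exact llama.cpp values.
BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "FP16": 16.0,
}

def model_file_gb(params_billion: float, fmt: str) -> float:
    """Approximate model file size in GB for a given format."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[fmt] / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"7B {fmt}: ~{model_file_gb(7, fmt):.1f} GB")
```

A 7B model at Q4_K_M lands near 4 GB versus 14 GB at FP16, which is why Q4_K_M is the usual default when quality and VRAM both matter.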
