Llama 3.2 Vision

Run Meta's multimodal Llama 3.2 Vision models for image understanding on CLORE.AI GPUs.


Why Llama 3.2 Vision?

  • Multimodal - Understands both text and images

  • Multiple sizes - 11B and 90B parameter versions

  • Versatile - OCR, visual QA, image captioning, document analysis

  • Open weights - Weights freely downloadable from Meta under the Llama 3.2 Community License

  • Llama ecosystem - Compatible with Ollama, vLLM, transformers

Model Variants

| Model | Parameters | VRAM (FP16) | Context | Best For |
|---|---|---|---|---|
| Llama-3.2-11B-Vision | 11B | 24GB | 128K | General use, single GPU |
| Llama-3.2-90B-Vision | 90B | 180GB | 128K | Maximum quality |
| Llama-3.2-11B-Vision-Instruct | 11B | 24GB | 128K | Chat/assistant |
| Llama-3.2-90B-Vision-Instruct | 90B | 180GB | 128K | Production |

Quick Deploy on CLORE.AI

Docker Image:

Ports:

Command:

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

  1. Go to My Orders page

  2. Click on your order

  3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.
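
A quick way to verify the endpoint is reachable (a minimal sketch, assuming Ollama is the backend serving the model behind your http_pub URL):

```python
# pip install requests
import requests

# Replace with the http_pub URL from My Orders.
BASE_URL = "https://abc123.clorecloud.net"

resp = requests.post(
    f"{BASE_URL}/api/generate",
    json={"model": "llama3.2-vision", "prompt": "Hello!", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```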

Hardware Requirements

| Model | Minimum GPU | Recommended | Optimal |
|---|---|---|---|
| 11B Vision | RTX 4090 24GB | A100 40GB | A100 80GB |
| 90B Vision | 4x A100 40GB | 4x A100 80GB | 8x H100 |

Installation

Using Ollama (Easiest)
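
If Ollama is already installed and running on the instance, the official `ollama` Python client can pull and test the model (a minimal sketch; `llama3.2-vision` is the 11B tag, `llama3.2-vision:90b` the 90B tag):

```python
# pip install ollama
import ollama

# Download the 11B vision model (use "llama3.2-vision:90b" for the 90B variant).
ollama.pull("llama3.2-vision")

# Text-only smoke test to confirm the model loads.
resp = ollama.chat(
    model="llama3.2-vision",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp["message"]["content"])
```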

Using vLLM
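
A sketch of offline inference with vLLM, assuming a vLLM build with Llama 3.2 Vision (mllama) support; the `<|image|>` placeholder marks where the image is injected into the prompt:

```python
# pip install vllm pillow
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    max_model_len=8192,   # keep the KV cache small enough for 24GB GPUs
    enforce_eager=True,   # some vLLM versions require eager mode for mllama
)

image = Image.open("photo.jpg")
prompt = "<|image|><|begin_of_text|>Describe this image in detail."

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=256, temperature=0.2),
)
print(outputs[0].outputs[0].text)
```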

Using Transformers
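
A sketch with Hugging Face Transformers (assumes a recent version that ships `MllamaForConditionalGeneration`):

```python
# pip install "transformers>=4.45" accelerate pillow torch
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```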

Basic Usage

Image Understanding

With Ollama
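
A minimal sketch with the `ollama` Python client; images are passed as local file paths (base64 data also works):

```python
import ollama

resp = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "What is in this image?",
        "images": ["photo.jpg"],   # local file path
    }],
)
print(resp["message"]["content"])
```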

With vLLM API
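
If the model is served through vLLM's OpenAI-compatible server, images can be sent as base64 data URLs. A sketch with the `openai` client; replace `localhost` with your http_pub URL on CLORE.AI:

```python
# pip install openai
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```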

Use Cases

OCR / Text Extraction
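
A sketch using the Ollama client; the prompt asks for a verbatim transcription (the filename is illustrative):

```python
import ollama

resp = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "Extract all text from this image verbatim. "
                   "Preserve the layout and do not add commentary.",
        "images": ["scanned_page.png"],
    }],
)
print(resp["message"]["content"])
```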

Document Analysis
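
For documents such as invoices or forms, asking for structured JSON makes the output easy to post-process (a sketch; the field names are illustrative):

```python
import json

import ollama

prompt = (
    "This is an invoice. Return a JSON object with the fields "
    "vendor, date, total, and line_items (description, quantity, price)."
)
resp = ollama.chat(
    model="llama3.2-vision",
    messages=[{"role": "user", "content": prompt, "images": ["invoice.png"]}],
    format="json",   # ask Ollama to constrain the output to valid JSON
)
print(json.loads(resp["message"]["content"]))
```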

Visual Question Answering

Image Captioning

Code from Screenshots

Multiple Images
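
Multi-image prompts are possible, though how well they work depends on the backend and model build. A sketch attaching two images to one turn via Ollama:

```python
import ollama

resp = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "Compare these two images. What changed between them?",
        "images": ["before.png", "after.png"],
    }],
)
print(resp["message"]["content"])
```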

Batch Processing
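
A simple sequential batch over a folder of images (a sketch; for large batches vLLM's batched generation gives better throughput):

```python
from pathlib import Path

import ollama

results = {}
for path in sorted(Path("images").glob("*.jpg")):
    resp = ollama.chat(
        model="llama3.2-vision",
        messages=[{
            "role": "user",
            "content": "Describe this image in one sentence.",
            "images": [str(path)],
        }],
    )
    results[path.name] = resp["message"]["content"]

for name, caption in results.items():
    print(f"{name}: {caption}")
```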

Gradio Interface
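
A minimal Gradio front end over the Ollama backend (a sketch; expose port 7860 in your order so it gets its own http_pub URL):

```python
# pip install gradio ollama
import gradio as gr
import ollama

def ask(image_path: str, question: str) -> str:
    resp = ollama.chat(
        model="llama3.2-vision",
        messages=[{"role": "user", "content": question, "images": [image_path]}],
    )
    return resp["message"]["content"]

demo = gr.Interface(
    fn=ask,
    inputs=[gr.Image(type="filepath"), gr.Textbox(value="Describe this image.")],
    outputs=gr.Textbox(label="Answer"),
    title="Llama 3.2 Vision",
)
demo.launch(server_name="0.0.0.0", server_port=7860)
```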

Performance

| Task | Model | GPU | Time |
|---|---|---|---|
| Single image description | 11B | RTX 4090 | ~3s |
| Single image description | 11B | A100 40GB | ~2s |
| OCR (1 page) | 11B | RTX 4090 | ~5s |
| Document analysis | 11B | A100 40GB | ~8s |
| Batch (10 images) | 11B | A100 40GB | ~25s |

Quantization

4-bit with bitsandbytes
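
Loading the 11B model in 4-bit roughly cuts the weight footprint to a third of FP16, at a small quality cost. A sketch with Transformers and bitsandbytes:

```python
# pip install bitsandbytes
import torch
from transformers import (
    AutoProcessor,
    BitsAndBytesConfig,
    MllamaForConditionalGeneration,
)

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
```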

GGUF with Ollama

Cost Estimate

Typical CLORE.AI marketplace rates:

| GPU | Hourly Rate | Best For |
|---|---|---|
| RTX 4090 24GB | ~$0.10 | 11B model |
| A100 40GB | ~$0.17 | 11B with long context |
| A100 80GB | ~$0.25 | 11B optimal |
| 4x A100 80GB | ~$1.00 | 90B model |

Prices vary. Check the CLORE.AI Marketplace for current rates.

Save money:

  • Use Spot orders for batch processing

  • Pay with CLORE tokens

  • Use quantized models (4-bit) for development

Troubleshooting

Out of Memory

Slow Generation

  • Ensure GPU is being used (check nvidia-smi)

  • Use bfloat16 instead of float32

  • Reduce image resolution before processing

  • Use vLLM for better throughput

Image Not Loading

HuggingFace Token Required
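
The Llama 3.2 weights on Hugging Face are gated: accept Meta's license on the model page, then authenticate before downloading. A sketch with `huggingface_hub`:

```python
# pip install huggingface_hub
from huggingface_hub import login

# Create a token at https://huggingface.co/settings/tokens
# after accepting the Llama 3.2 license on the model page.
login(token="hf_...")   # or run `huggingface-cli login`, or set HF_TOKEN
```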

Llama Vision vs Others

| Feature | Llama 3.2 Vision | LLaVA 1.6 | GPT-4V |
|---|---|---|---|
| Parameters | 11B / 90B | 7B / 34B | Unknown |
| Open Source | Yes | Yes | No |
| OCR Quality | Excellent | Good | Excellent |
| Context | 128K | 32K | 128K |
| Multi-image | Yes | Limited | Yes |
| License | Llama 3.2 Community | Apache 2.0 | Proprietary |

Use Llama 3.2 Vision when:

  • You need an open-weights multimodal model

  • OCR and document analysis are core tasks

  • You want integration with the Llama ecosystem

  • You need long-context understanding of mixed text and images

Next Steps
