Docker Images

Ready-to-deploy Docker images for AI workloads on CLORE.AI.


Quick Deploy Reference

| Task | Image | Ports |
| --- | --- | --- |
| Chat with AI | `ollama/ollama` | 22, 11434 |
| ChatGPT-like UI | `ghcr.io/open-webui/open-webui` | 22, 8080 |
| Image Generation | `universonic/stable-diffusion-webui` | 22, 7860 |
| Node-based Image Gen | `yanwk/comfyui-boot` | 22, 8188 |
| LLM API Server | `vllm/vllm-openai` | 22, 8000 |


Language Models

Ollama

Universal LLM runner - easiest way to run any model.

Image: ollama/ollama
Ports: 22/tcp, 11434/http
Command: ollama serve

After deploy:
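A minimal sketch of pulling and chatting with a model over SSH (the model name is an example; any model from the Ollama library works):

```bash
# Download a model, then start an interactive chat session
ollama pull llama3
ollama run llama3
```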

Environment variables:
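For example, to expose the API on all interfaces and allow more concurrent requests (values are illustrative; see the Ollama Variables table below):

```bash
OLLAMA_HOST=0.0.0.0        # bind the API to all interfaces
OLLAMA_NUM_PARALLEL=4      # handle four requests in parallel
```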


Open WebUI

ChatGPT-like interface for Ollama.

Includes Ollama built-in. Access via HTTP port.

Standalone (connect to existing Ollama):
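A sketch assuming a separate Ollama instance; `OLLAMA_BASE_URL` is the Open WebUI setting for pointing at an external Ollama server (replace host and port with your own):

```bash
OLLAMA_BASE_URL=http://<ollama-host>:11434
```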


vLLM

High-performance LLM serving with OpenAI-compatible API.
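A minimal single-GPU launch sketch (the model name is an example; with the `vllm/vllm-openai` image these flags go in the container command):

```bash
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --host 0.0.0.0 --port 8000
```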

For larger models (multi-GPU):
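`--tensor-parallel-size` shards the model across GPUs (the model and GPU count are illustrative):

```bash
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 4
```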

Environment variables:
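Typical settings (token value is a placeholder; see the vLLM Variables table below):

```bash
HUGGING_FACE_HUB_TOKEN=hf_xxx        # required for gated models
VLLM_ATTENTION_BACKEND=FLASH_ATTN    # pick the attention implementation
```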


Text Generation Inference (TGI)

HuggingFace's production LLM server.
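A launch sketch, assuming the `ghcr.io/huggingface/text-generation-inference` image whose entrypoint is `text-generation-launcher` (model is an example; port matched to the table below):

```bash
text-generation-launcher \
  --model-id mistralai/Mistral-7B-Instruct-v0.2 \
  --port 8080
```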

Environment variables:
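TGI also reads its launcher flags from the environment; common ones (token is a placeholder):

```bash
MODEL_ID=mistralai/Mistral-7B-Instruct-v0.2
HUGGING_FACE_HUB_TOKEN=hf_xxx
```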


Image Generation

Stable Diffusion WebUI (AUTOMATIC1111)

Most popular SD interface with extensions.

For low VRAM (8GB or less):
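These are standard AUTOMATIC1111 launch flags, appended to the container command:

```bash
# ~8 GB cards
python launch.py --listen --port 7860 --medvram
# 4-6 GB cards (slower)
python launch.py --listen --port 7860 --lowvram
```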

For API access:
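The `--api` flag enables the REST API alongside the UI; a quick check from outside (replace host with your mapped HTTP endpoint):

```bash
python launch.py --listen --port 7860 --api
curl http://<host>:7860/sdapi/v1/sd-models
```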


ComfyUI

Node-based workflow for advanced users.

Alternative images:

Manual setup command:
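A sketch of running ComfyUI from source inside a CUDA/PyTorch base container (the repo URL is the official one; the flags expose the UI on the mapped port):

```bash
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
python main.py --listen 0.0.0.0 --port 8188
```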


Fooocus

Simplified SD interface, Midjourney-like.
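A from-source sketch inside a PyTorch container (the launch script and flags are from the lllyasviel/Fooocus repo and may change between releases; port matches the table below):

```bash
git clone https://github.com/lllyasviel/Fooocus
cd Fooocus
pip install -r requirements_versions.txt
python launch.py --listen --port 7865
```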


FLUX

Latest high-quality image generation.

Use ComfyUI with FLUX nodes:
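A sketch of fetching the schnell weights into a ComfyUI install (the HF repo is the official, ungated one; file placement follows common FLUX workflows):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download black-forest-labs/FLUX.1-schnell \
  flux1-schnell.safetensors --local-dir ComfyUI/models/unet
```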

Or via Diffusers:
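A minimal Python sketch run as a heredoc (`FluxPipeline` ships in recent diffusers releases; the model id is the official schnell checkpoint):

```bash
pip install -U diffusers transformers accelerate sentencepiece
python - <<'EOF'
import torch
from diffusers import FluxPipeline

# Load the official schnell checkpoint and render one image
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")
image = pipe("a cat in a space suit", num_inference_steps=4).images[0]
image.save("flux.png")
EOF
```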


Video Generation

Stable Video Diffusion
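One way to run it inside a PyTorch container is the Diffusers pipeline (a minimal sketch; pipeline class and model id are from the diffusers library):

```bash
python - <<'EOF'
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")
image = load_image("input.png")            # a 1024x576 conditioning image
frames = pipe(image, num_frames=25).frames[0]
export_to_video(frames, "output.mp4", fps=7)
EOF
```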


AnimateDiff

Use with ComfyUI:

Install AnimateDiff nodes via ComfyUI Manager.
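One way to get the nodes without the Manager UI is cloning a widely used community implementation into `custom_nodes` (the repo name is one popular example):

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved
```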


Audio & Voice

Whisper (Transcription)

API usage:
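A sketch assuming a Whisper ASR webservice container listening on port 9000 (the `/asr` endpoint and `audio_file` field follow the common openai-whisper-asr-webservice image; adjust to your image):

```bash
curl -F "audio_file=@speech.mp3" \
  "http://<host>:9000/asr?task=transcribe&output=json"
```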


Bark (Text-to-Speech)
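A heredoc sketch using the suno-ai/bark Python API (`generate_audio` and `preload_models` are from the bark package):

```bash
pip install git+https://github.com/suno-ai/bark.git scipy
python - <<'EOF'
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()                       # downloads model weights on first run
audio = generate_audio("Hello from CLORE!")
write_wav("bark.wav", SAMPLE_RATE, audio)
EOF
```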


Stable Audio
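A heredoc sketch via Diffusers (`StableAudioPipeline` and the stabilityai/stable-audio-open-1.0 id exist in recent releases; the model is gated, so an HF token is needed):

```bash
pip install -U diffusers soundfile
python - <<'EOF'
import torch, soundfile as sf
from diffusers import StableAudioPipeline

pipe = StableAudioPipeline.from_pretrained(
    "stabilityai/stable-audio-open-1.0", torch_dtype=torch.float16
).to("cuda")
audio = pipe("gentle rain on a tin roof", num_inference_steps=100).audios[0]
sf.write("audio.wav", audio.T.float().cpu().numpy(), pipe.vae.sampling_rate)
EOF
```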


Vision Models

LLaVA
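The quickest route is Ollama's packaged build (`llava` is a model tag in the Ollama library; the CLI treats an image path in the prompt as multimodal input):

```bash
ollama pull llava
ollama run llava "Describe this image: ./photo.jpg"
```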


Llama 3.2 Vision

Use Ollama:
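A sketch using the Ollama model tag (an image path in the prompt is passed to the vision model):

```bash
ollama pull llama3.2-vision
ollama run llama3.2-vision "What is in this picture? ./photo.jpg"
```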


Development & Training

PyTorch Base

For custom setups and training.

Includes: CUDA 12.1, cuDNN 8, PyTorch 2.1
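One published tag consistent with that stack, plus a quick GPU check (the exact tag is an assumption based on the listed versions):

```bash
# Image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel
# Verify GPU visibility inside the container
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```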


Jupyter Lab

Interactive notebooks for ML.

Or use PyTorch base with Jupyter:
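A sketch of adding JupyterLab to the PyTorch base (standard JupyterLab flags; set a token or password in practice):

```bash
pip install jupyterlab
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser
```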


Kohya Training

For LoRA and model fine-tuning.
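A from-source sketch of the widely used kohya_ss GUI (repo and scripts are from the bmaltais project; details change between releases):

```bash
git clone https://github.com/bmaltais/kohya_ss
cd kohya_ss
./setup.sh
./gui.sh --listen 0.0.0.0 --server_port 7860 --headless
```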


Base Images Reference

NVIDIA Official

Image
CUDA
Use Case

nvidia/cuda:12.1.0-devel-ubuntu22.04

12.1

CUDA development

nvidia/cuda:12.1.0-runtime-ubuntu22.04

12.1

CUDA runtime only

nvidia/cuda:11.8.0-devel-ubuntu22.04

11.8

Legacy compatibility

PyTorch Official

| Image | PyTorch | CUDA |
| --- | --- | --- |
| `pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel` | 2.5 | 12.4 |
| `pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel` | 2.0 | 11.7 |
| `pytorch/pytorch:1.13.1-cuda11.6-cudnn8-devel` | 1.13 | 11.6 |

HuggingFace

| Image | Purpose |
| --- | --- |
| `huggingface/transformers-pytorch-gpu` | Transformers + PyTorch |
| `ghcr.io/huggingface/text-generation-inference` | TGI server |


Environment Variables

Common Variables

| Variable | Description | Example |
| --- | --- | --- |
| `HUGGING_FACE_HUB_TOKEN` | HF API token for gated models | `hf_xxx` |
| `CUDA_VISIBLE_DEVICES` | GPU selection | `0,1` |
| `TRANSFORMERS_CACHE` | Model cache directory | `/root/.cache` |

Ollama Variables

| Variable | Description | Default |
| --- | --- | --- |
| `OLLAMA_HOST` | Bind address | `127.0.0.1` |
| `OLLAMA_MODELS` | Models directory | `~/.ollama/models` |
| `OLLAMA_NUM_PARALLEL` | Parallel requests | `1` |

vLLM Variables

| Variable | Description |
| --- | --- |
| `VLLM_ATTENTION_BACKEND` | Attention implementation |
| `VLLM_USE_MODELSCOPE` | Use ModelScope instead of HF |


Port Reference

| Port | Protocol | Service |
| --- | --- | --- |
| 22 | TCP | SSH |
| 7860 | HTTP | Gradio (SD WebUI, Fooocus) |
| 7865 | HTTP | Fooocus alternative |
| 8000 | HTTP | vLLM API |
| 8080 | HTTP | Open WebUI, TGI |
| 8188 | HTTP | ComfyUI |
| 8888 | HTTP | Jupyter |
| 9000 | HTTP | Whisper API |
| 11434 | HTTP | Ollama API |


Tips

Persistent Storage

Mount volumes to keep data between restarts:
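Illustrated with plain Docker (on CLORE, set the equivalent host path in the deployment form; paths are examples):

```bash
# Keep Ollama models outside the container
docker run -d -v /data/ollama:/root/.ollama -p 11434:11434 ollama/ollama
```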

GPU Selection

For multi-GPU systems:
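Set the standard CUDA variable in the container environment (indices are illustrative):

```bash
CUDA_VISIBLE_DEVICES=0,1   # expose only the first two GPUs
```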

Memory Management

If you run out of VRAM:

1. Use smaller models
2. Enable CPU offload
3. Reduce batch size
4. Use quantized models (GGUF Q4), as sketched below
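For example, pulling a 4-bit quantized build through Ollama (exact quantization tags vary by model):

```bash
ollama pull llama3:8b-instruct-q4_0
```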

