Open WebUI

Beautiful ChatGPT-like interface for running LLMs on CLORE.AI GPUs.

Why Open WebUI?

  • ChatGPT-like UI - Familiar, polished interface

  • Multi-model - Switch between models easily

  • RAG built-in - Upload documents for context

  • User management - Multi-user support

  • History - Conversation persistence

  • Ollama integration - Works out of the box

Quick Deploy on CLORE.AI

Docker Image:

ghcr.io/open-webui/open-webui:cuda

Ports:

22/tcp
8080/http

Command:
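If you deploy through the CLORE template, the image's default entrypoint is normally enough and no extra command is needed. For reference, running the same image manually with Docker looks roughly like this (a sketch, not the exact template command; the volume name and port mapping are assumptions):

```bash
# Standalone Open WebUI with CUDA support; expects Ollama to be reachable separately
docker run -d \
  --name open-webui \
  --gpus all \
  -p 8080:8080 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:cuda
```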

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

  1. Go to My Orders page

  2. Click on your order

  3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.

Verify It's Working
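Once the order is running, a quick request to the health endpoint (using the http_pub URL from the section above) confirms the UI is reachable:

```bash
# Replace YOUR_HTTP_PUB_URL with the URL from My Orders
curl https://YOUR_HTTP_PUB_URL/health
```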

Response: true

Installation

All-in-One (Bundled Ollama)
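The :ollama image bundles Ollama together with the UI, so a single container serves both. A typical run command looks like this (a sketch; the volume names are assumptions):

```bash
# Bundled Ollama + Open WebUI in one container
docker run -d \
  --name open-webui \
  --gpus all \
  -p 8080:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:ollama
```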

First Setup

  1. Open https://YOUR_HTTP_PUB_URL (or http://your-server:8080 if connecting directly)

  2. Create admin account (first user becomes admin)

  3. Go to Settings → Models → Pull a model

  4. Start chatting!

Features

Chat Interface

  • Markdown rendering

  • Code highlighting

  • Image generation (with compatible models)

  • Voice input/output

  • File attachments

Model Management

  • Pull models directly from UI

  • Create custom models

  • Set default model

  • Model-specific settings

RAG (Document Chat)

  1. Click "+" in chat

  2. Upload PDF, TXT, or other documents

  3. Ask questions about the content

User Management

  • Multiple users

  • Role-based access

  • API key management

  • Usage tracking

Configuration

Environment Variables

Key Settings

| Variable | Description | Default |
|---|---|---|
| OLLAMA_BASE_URL | Ollama API URL | http://localhost:11434 |
| WEBUI_AUTH | Enable authentication | True |
| WEBUI_NAME | Instance name | Open WebUI |
| DEFAULT_MODELS | Default model | - |
| ENABLE_RAG_WEB_SEARCH | Web search in RAG | False |
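
As an illustration, a few of these variables passed to the container at startup (the Ollama host, instance name, and default model below are placeholders, not required values):

```bash
docker run -d \
  --name open-webui \
  -p 8080:8080 \
  -e OLLAMA_BASE_URL=http://YOUR_OLLAMA_HOST:11434 \
  -e WEBUI_NAME="CLORE Chat" \
  -e DEFAULT_MODELS="llama3.1:8b" \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:cuda
```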

Connect to Remote Ollama
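To point the UI at an Ollama instance running elsewhere (for example on another CLORE order), set OLLAMA_BASE_URL to that server's address. A sketch, with REMOTE_OLLAMA_HOST as a placeholder:

```bash
docker run -d \
  --name open-webui \
  -p 8080:8080 \
  -e OLLAMA_BASE_URL=http://REMOTE_OLLAMA_HOST:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:cuda
```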

Docker Compose
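A minimal compose file that runs Ollama and Open WebUI side by side might look like this (a sketch; service and volume names are assumptions):

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:cuda
    ports:
      - "8080:8080"
    environment:
      # The service name "ollama" resolves inside the compose network
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:
```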

API Reference

Open WebUI provides several API endpoints:

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /api/version | GET | Get Open WebUI version |
| /api/config | GET | Get configuration |
| /ollama/api/tags | GET | List Ollama models (proxied) |
| /ollama/api/chat | POST | Chat with Ollama (proxied) |

Check Health
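```bash
curl https://YOUR_HTTP_PUB_URL/health
```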

Response: true

Get Version
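```bash
curl https://YOUR_HTTP_PUB_URL/api/version
```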

Response: a JSON object with the installed version, e.g. {"version": "..."}

List Models (via Ollama proxy)
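The proxied Ollama endpoints require an API key (see the note below); YOUR_API_KEY is a placeholder for a key created in the web UI:

```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://YOUR_HTTP_PUB_URL/ollama/api/tags
```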

Note: Most API operations require authentication. Use the web UI to create an account and manage API keys.

Tips

Faster Responses

  1. Use quantized models (Q4_K_M)

  2. Enable streaming in settings

  3. Reduce context length if needed

Better Quality

  1. Use larger models (13B+)

  2. Use Q8 quantization

  3. Adjust temperature in model settings

Save Resources

  1. Set OLLAMA_KEEP_ALIVE=5m

  2. Unload unused models

  3. Use smaller models for testing

GPU Requirements

Same as Ollama.

Open WebUI itself uses minimal resources (~500MB RAM).

Troubleshooting

Can't connect to Ollama
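A few quick checks help narrow this down (a sketch; the container name and the Ollama URL depend on how you deployed):

```bash
# Can the host reach Ollama at all?
curl http://localhost:11434/api/version

# Look for connection errors in the Open WebUI container logs
docker logs open-webui --tail 50

# Confirm which Ollama URL the container was actually configured with
docker exec open-webui printenv OLLAMA_BASE_URL
```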

Models not showing

  1. Check Ollama connection in Settings

  2. Refresh model list

  3. Pull models via CLI: ollama pull modelname

Slow performance

  1. Check GPU is being used: nvidia-smi

  2. Try smaller/quantized models

  3. Reduce concurrent users

Cost Estimate

| Setup | GPU | Hourly |
|---|---|---|
| Basic (7B) | RTX 3060 | ~$0.03 |
| Standard (13B) | RTX 3090 | ~$0.06 |
| Advanced (34B) | RTX 4090 | ~$0.10 |
| Enterprise (70B) | A100 | ~$0.17 |

Next Steps
