# Open WebUI

Beautiful ChatGPT-like interface for running LLMs on CLORE.AI GPUs.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Why Open WebUI?

* **ChatGPT-like UI** - Familiar, polished interface
* **Multi-model** - Switch between models easily
* **RAG built-in** - Upload documents for context
* **User management** - Multi-user support
* **History** - Conversation persistence
* **Ollama integration** - Works out of the box

## Quick Deploy on CLORE.AI

**Docker Image:**

```
ghcr.io/open-webui/open-webui:cuda
```

**Ports:**

```
22/tcp
8080/http
```

**Command:**

```bash
# Start Ollama in background
ollama serve &
sleep 5
ollama pull llama3.2

# Start Open WebUI (connects to Ollama automatically)
# Note: The Docker image handles this
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

### Verify It's Working

```bash
# Check health
curl https://your-http-pub.clorecloud.net/health

# Get version
curl https://your-http-pub.clorecloud.net/api/version
```

Response:

```json
{"version": "0.7.2"}
```

{% hint style="warning" %}
If you get HTTP 502, wait 1-2 minutes - the service is still starting.
{% endhint %}

## Installation

### With Ollama (Recommended)

```bash
# Start Ollama first
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull a model
docker exec -it ollama ollama pull llama3.2

# Start Open WebUI
docker run -d -p 8080:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

### All-in-One (Bundled Ollama)

```bash
docker run -d -p 8080:8080 \
  --gpus all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:ollama
```

## First Setup

1. Open `http://your-server:8080`
2. Create admin account (first user becomes admin)
3. Go to Settings → Models → Pull a model
4. Start chatting!

## Features

### Chat Interface

* Markdown rendering
* Code highlighting
* Image generation (with compatible models)
* Voice input/output
* File attachments

### Model Management

* Pull models directly from UI
* Create custom models
* Set default model
* Model-specific settings

### RAG (Document Chat)

1. Click "+" in chat
2. Upload PDF, TXT, or other documents
3. Ask questions about the content

### User Management

* Multiple users
* Role-based access
* API key management
* Usage tracking

## Configuration

### Environment Variables

```bash
docker run -d \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  -e WEBUI_AUTH=True \
  -e WEBUI_NAME="My AI Chat" \
  -e DEFAULT_MODELS="llama3.2" \
  ghcr.io/open-webui/open-webui:main
```

### Key Settings

| Variable                | Description           | Default                  |
| ----------------------- | --------------------- | ------------------------ |
| `OLLAMA_BASE_URL`       | Ollama API URL        | `http://localhost:11434` |
| `WEBUI_AUTH`            | Enable authentication | `True`                   |
| `WEBUI_NAME`            | Instance name         | `Open WebUI`             |
| `DEFAULT_MODELS`        | Default model         | -                        |
| `ENABLE_RAG_WEB_SEARCH` | Web search in RAG     | `False`                  |

### Connect to Remote Ollama

```bash
docker run -d -p 8080:8080 \
  -e OLLAMA_BASE_URL=http://remote-server:11434 \
  ghcr.io/open-webui/open-webui:main
```

## Docker Compose

```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    ports:
      - "8080:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:
```

```bash
docker-compose up -d
```

## API Reference

Open WebUI provides several API endpoints:

| Endpoint           | Method | Description                  |
| ------------------ | ------ | ---------------------------- |
| `/health`          | GET    | Health check                 |
| `/api/version`     | GET    | Get Open WebUI version       |
| `/api/config`      | GET    | Get configuration            |
| `/ollama/api/tags` | GET    | List Ollama models (proxied) |
| `/ollama/api/chat` | POST   | Chat with Ollama (proxied)   |

### Check Health

```bash
curl https://your-http-pub.clorecloud.net/health
```

Response: `true`

### Get Version

```bash
curl https://your-http-pub.clorecloud.net/api/version
```

Response:

```json
{"version": "0.7.2"}
```

### List Models (via Ollama proxy)

```bash
curl https://your-http-pub.clorecloud.net/ollama/api/tags
```

{% hint style="info" %}
Most API operations require authentication. Use the web UI to create an account and manage API keys.
{% endhint %}

## Tips

### Faster Responses

1. Use quantized models (Q4\_K\_M)
2. Enable streaming in settings
3. Reduce context length if needed

### Better Quality

1. Use larger models (13B+)
2. Use Q8 quantization
3. Adjust temperature in model settings

### Save Resources

1. Set `OLLAMA_KEEP_ALIVE=5m`
2. Unload unused models
3. Use smaller models for testing

## GPU Requirements

Same as [Ollama](/guides/language-models/ollama.md#gpu-requirements).

Open WebUI itself uses minimal resources (\~500MB RAM).

## Troubleshooting

### Can't connect to Ollama

```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# If using Docker, use host networking or correct URL
docker run --network=host ghcr.io/open-webui/open-webui:main
```

### Models not showing

1. Check Ollama connection in Settings
2. Refresh model list
3. Pull models via CLI: `ollama pull modelname`

### Slow performance

1. Check GPU is being used: `nvidia-smi`
2. Try smaller/quantized models
3. Reduce concurrent users

## Cost Estimate

| Setup            | GPU      | Hourly  |
| ---------------- | -------- | ------- |
| Basic (7B)       | RTX 3060 | \~$0.03 |
| Standard (13B)   | RTX 3090 | \~$0.06 |
| Advanced (34B)   | RTX 4090 | \~$0.10 |
| Enterprise (70B) | A100     | \~$0.17 |

## Next Steps

* [Ollama](/guides/language-models/ollama.md) - CLI usage
* [LocalAI](/guides/language-models/localai-openai-compatible.md) - More backends
* [RAG + LangChain](/guides/training/finetune-llm.md) - Advanced RAG


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/language-models/open-webui.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
