# AnythingLLM RAG Platform

## Overview

[AnythingLLM](https://github.com/Mintplex-Labs/anything-llm) is a full-featured, open-source AI workspace with 40K+ GitHub stars. It combines document-based RAG (Retrieval-Augmented Generation), AI agents, and a no-code agent builder into a single, self-hosted application — all managed through a clean, intuitive UI that requires zero coding to set up.

**Why run AnythingLLM on Clore.ai?**

* **Complete RAG pipeline out of the box** — Upload PDFs, Word docs, websites, and YouTube transcripts. AnythingLLM automatically chunks, embeds, and stores them for semantic search.
* **No GPU required for the application** — AnythingLLM uses CPU-based embedding by default. Pair it with a Clore.ai GPU server running Ollama or vLLM for local inference.
* **AI agents with real tools** — Built-in agents can browse the web, write and execute code, manage files, and call external APIs — all orchestrated through a GUI.
* **MCP compatibility** — Integrates with the Model Context Protocol ecosystem for extended tool connectivity.
* **Workspace isolation** — Create separate workspaces with different knowledge bases and LLM settings for different projects or teams.

### Architecture Overview

```
┌─────────────────────────────────────────────┐
│            AnythingLLM (Port 3001)          │
│                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │ RAG/Docs │  │  Agents  │  │  Users   │  │
│  └────┬─────┘  └────┬─────┘  └──────────┘  │
│       │             │                       │
│  ┌────▼─────────────▼───────┐               │
│  │    LLM Provider Router   │               │
│  └──────────────┬───────────┘               │
└─────────────────┼───────────────────────────┘
                  │
     ┌────────────┼────────────┐
     ▼            ▼            ▼
  OpenAI       Anthropic    Ollama (local)
  Claude        Gemini      vLLM (local)
```

***

## Requirements

### Server Specifications

| Component   | Minimum       | Recommended                    | Notes                                    |
| ----------- | ------------- | ------------------------------ | ---------------------------------------- |
| **GPU**     | None required | RTX 3090 (if using local LLMs) | For Ollama/vLLM backend only             |
| **VRAM**    | —             | 24 GB                          | For local model inference                |
| **CPU**     | 2 vCPU        | 4 vCPU                         | Embedding runs on CPU                    |
| **RAM**     | 4 GB          | 8 GB                           | More = larger document index in memory   |
| **Storage** | 10 GB         | 50+ GB                         | Document storage, vector DB, model cache |

### Clore.ai Pricing Reference

| Server Type                     | Approx. Cost    | Use Case                             |
| ------------------------------- | --------------- | ------------------------------------ |
| CPU instance (4 vCPU, 8 GB RAM) | \~$0.05–0.10/hr | AnythingLLM + external API providers |
| RTX 3090 (24 GB VRAM)           | \~$0.20/hr      | AnythingLLM + Ollama local LLMs      |
| RTX 4090 (24 GB VRAM)           | \~$0.35/hr      | AnythingLLM + faster local inference |
| A100 80 GB                      | \~$1.10/hr      | AnythingLLM + large 70B+ models      |

> 💡 **Pro tip:** AnythingLLM's built-in embedding (LanceDB + local CPU embedder) works without GPU. For the LLM backend, you can use free-tier API providers like OpenRouter or Groq to keep costs minimal.
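
For budgeting, the hourly rates above convert to rough monthly figures. A quick sketch (the duty cycles are example assumptions, not Clore.ai defaults):

```shell
# Rough monthly cost: hourly rate × hours/day × days (all inputs are examples)
monthly_cost() { awk -v r="$1" -v h="$2" -v d="$3" 'BEGIN { printf "%.2f\n", r*h*d }'; }

monthly_cost 0.20 8 30   # RTX 3090, 8 h/day  → 48.00 USD/month
monthly_cost 0.05 24 30  # CPU instance, 24/7 → 36.00 USD/month
```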

### Prerequisites

* Clore.ai server with SSH access
* Docker (pre-installed on Clore.ai servers)
* At least one LLM API key **or** local Ollama/vLLM backend
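
Before deploying, a quick preflight on the server confirms the prerequisites are in place (a minimal sketch; extend the tool list as needed):

```shell
# Verify required tools are on the PATH before deploying
have() { command -v "$1" >/dev/null 2>&1; }

for cmd in docker curl tar; do
  if have "$cmd"; then echo "$cmd: found"; else echo "$cmd: MISSING"; fi
done
```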

***

## Quick Start

### Method 1: Single Docker Container (Recommended)

The official single-container deployment includes everything: the web UI, LanceDB vector store, and document processor.

**Step 1: Connect to your Clore.ai server**

```bash
ssh root@<your-clore-server-ip> -p <ssh-port>
```

**Step 2: Set up storage directory**

```bash
export STORAGE_LOCATION="$HOME/anythingllm"
mkdir -p "$STORAGE_LOCATION"
touch "$STORAGE_LOCATION/.env"
```

**Step 3: Run AnythingLLM**

```bash
docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v ${STORAGE_LOCATION}:/app/server/storage \
  -v ${STORAGE_LOCATION}/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```

> **Why `--cap-add SYS_ADMIN`?** AnythingLLM uses Chromium for webpage scraping and PDF rendering, which requires elevated container capabilities.

**Step 4: Verify startup**

```bash
docker logs anythingllm --tail 30 -f
# Wait for: "Server listening on port 3001"
```

**Step 5: Complete setup wizard**

Open in browser:

```
http://<your-clore-server-ip>:3001
```

The first-time setup wizard guides you through:

1. Create admin account
2. Choose LLM provider
3. Choose embedding model
4. Configure your first workspace

***

### Method 2: Docker Compose

For production deployments where you want declarative, version-controlled configuration and room to add companion services (such as Ollama) later:

**Step 1: Create project directory**

```bash
mkdir -p ~/anythingllm && cd ~/anythingllm
mkdir -p storage
touch storage/.env
```

**Step 2: Create `docker-compose.yml`**

```bash
cat > docker-compose.yml << 'EOF'
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    restart: unless-stopped
    ports:
      - "3001:3001"
    cap_add:
      - SYS_ADMIN
    environment:
      STORAGE_DIR: "/app/server/storage"
      # LLM Provider (configure one)
      LLM_PROVIDER: openai
      OPEN_AI_KEY: ${OPENAI_API_KEY}
      OPEN_MODEL_PREF: gpt-4o-mini
      # Embedding
      EMBEDDING_ENGINE: native
      # Vector DB
      VECTOR_DB: lancedb
      # Auth
      AUTH_TOKEN: ${ANYTHINGLLM_AUTH_TOKEN}
      JWT_SECRET: ${JWT_SECRET}
    volumes:
      - ./storage:/app/server/storage
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3001/api/ping"]
      interval: 30s
      timeout: 10s
      retries: 3

EOF
```

**Step 3: Create `.env` file**

```bash
cat > .env << 'EOF'
OPENAI_API_KEY=sk-your-openai-key-here
ANYTHINGLLM_AUTH_TOKEN=your-instance-password-here
JWT_SECRET=your-random-64-char-secret-here
EOF
```
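
Rather than inventing secrets by hand, generate them (assumes `openssl` is installed, which it is on virtually all Linux images):

```shell
# Generate strong random values for JWT_SECRET and the instance password
JWT_SECRET=$(openssl rand -hex 32)       # 64 hex characters
AUTH_TOKEN=$(openssl rand -base64 24)    # random instance password

echo "JWT_SECRET=$JWT_SECRET"
echo "ANYTHINGLLM_AUTH_TOKEN=$AUTH_TOKEN"
```

Paste the printed lines into `.env` in place of the placeholders.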

**Step 4: Start**

```bash
docker compose up -d
docker compose logs anythingllm -f
```

***

### Method 3: With Pre-configured Environment Variables

For automated deployment without the setup wizard:

```bash
export STORAGE_LOCATION="$HOME/anythingllm"
mkdir -p "$STORAGE_LOCATION" && touch "$STORAGE_LOCATION/.env"

docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v ${STORAGE_LOCATION}:/app/server/storage \
  -v ${STORAGE_LOCATION}/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  -e LLM_PROVIDER=openai \
  -e OPEN_AI_KEY=sk-your-key \
  -e OPEN_MODEL_PREF=gpt-4o-mini \
  -e EMBEDDING_ENGINE=native \
  -e VECTOR_DB=lancedb \
  -e AUTH_TOKEN=your-password \
  -e JWT_SECRET=$(openssl rand -hex 32) \
  mintplexlabs/anythingllm
```

***

## Configuration

### LLM Provider Options

AnythingLLM supports a wide range of LLM backends. Set in the UI under **Settings → LLM Preference**, or via environment variables:

**OpenAI:**

```bash
-e LLM_PROVIDER=openai
-e OPEN_AI_KEY=sk-your-key
-e OPEN_MODEL_PREF=gpt-4o
```

**Anthropic Claude:**

```bash
-e LLM_PROVIDER=anthropic
-e ANTHROPIC_API_KEY=sk-ant-your-key
-e ANTHROPIC_MODEL_PREF=claude-sonnet-4-5
```

**Google Gemini:**

```bash
-e LLM_PROVIDER=gemini
-e GEMINI_API_KEY=your-key
-e GEMINI_LLM_MODEL_PREF=gemini-1.5-pro
```

**Ollama (local):**

```bash
-e LLM_PROVIDER=ollama
-e OLLAMA_BASE_PATH=http://172.17.0.1:11434
-e OLLAMA_MODEL_PREF=llama3.2
```

**OpenRouter (access 100+ models):**

```bash
-e LLM_PROVIDER=openrouter
-e OPENROUTER_API_KEY=sk-or-your-key
-e OPENROUTER_MODEL_PREF=meta-llama/llama-3.1-8b-instruct:free
```

### Embedding Configuration

| Engine    | Backend        | GPU Needed | Quality        |
| --------- | -------------- | ---------- | -------------- |
| `native`  | CPU (built-in) | No         | Good           |
| `openai`  | OpenAI API     | No         | Excellent      |
| `ollama`  | Local Ollama   | Optional   | Good-Excellent |
| `localai` | LocalAI        | Optional   | Variable       |

```bash
# Use OpenAI embeddings for best quality
-e EMBEDDING_ENGINE=openai
-e OPEN_AI_KEY=sk-your-key
-e EMBEDDING_MODEL_PREF=text-embedding-3-small

# Use Ollama embeddings for fully local pipeline
-e EMBEDDING_ENGINE=ollama
-e OLLAMA_BASE_PATH=http://172.17.0.1:11434
-e EMBEDDING_MODEL_PREF=nomic-embed-text
```

### Vector Database Options

| DB         | Description            | Best For                       |
| ---------- | ---------------------- | ------------------------------ |
| `lancedb`  | Built-in, no config    | Default, small-medium datasets |
| `chroma`   | ChromaDB (external)    | Medium datasets, flexibility   |
| `pinecone` | Pinecone cloud         | Large datasets, production     |
| `weaviate` | Weaviate (self-hosted) | Advanced use cases             |

### Workspace Configuration

AnythingLLM workspaces are isolated environments with their own:

* Document knowledge base
* LLM settings (can override global)
* Chat history
* Agent configurations

Create workspaces via the UI or API:

```bash
# Create workspace via API
curl -X POST http://localhost:3001/api/v1/workspace/new \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"name": "My Project", "similarityThreshold": 0.7}'
```

### Document Ingestion

Upload documents via UI or API:

```bash
# Upload a document via API
curl -X POST http://localhost:3001/api/v1/document/upload \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@/path/to/document.pdf"

# Move document to workspace
curl -X POST http://localhost:3001/api/v1/workspace/my-project/update-embeddings \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"adds": ["custom-documents/document.pdf-chunk-1.json"]}'
```

***

## GPU Acceleration

AnythingLLM itself runs on CPU. GPU acceleration applies to the LLM inference backend.

### Running Ollama on the Same Clore.ai Server

```bash
# Start Ollama with GPU support
docker run -d \
  --name ollama \
  --gpus all \
  --restart unless-stopped \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama

# Pull models for AnythingLLM
docker exec ollama ollama pull llama3.2          # 3B, fast
docker exec ollama ollama pull llama3.1:8b       # 8B, balanced
docker exec ollama ollama pull nomic-embed-text  # for embeddings
docker exec ollama ollama pull mxbai-embed-large # better embeddings

# Restart AnythingLLM with Ollama config
docker stop anythingllm && docker rm anythingllm

docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  -e LLM_PROVIDER=ollama \
  -e OLLAMA_BASE_PATH=http://172.17.0.1:11434 \
  -e OLLAMA_MODEL_PREF=llama3.1:8b \
  -e EMBEDDING_ENGINE=ollama \
  -e EMBEDDING_MODEL_PREF=nomic-embed-text \
  mintplexlabs/anythingllm
```

### GPU-Model Performance on Clore.ai

| Model         | GPU       | VRAM  | Embedding Speed | Inference Speed | Cost/hr |
| ------------- | --------- | ----- | --------------- | --------------- | ------- |
| Llama 3.2 3B  | RTX 3090  | 2 GB  | Fast            | 60–80 tok/s     | \~$0.20 |
| Llama 3.1 8B  | RTX 3090  | 6 GB  | Fast            | 40–60 tok/s     | \~$0.20 |
| Mistral 7B    | RTX 3090  | 5 GB  | Fast            | 45–65 tok/s     | \~$0.20 |
| Llama 3.1 70B | A100 80GB | 40 GB | Medium          | 20–35 tok/s     | \~$1.10 |

***

## Tips & Best Practices

### Document Ingestion Best Practices

```bash
# For large document sets, increase Node.js memory
-e NODE_OPTIONS="--max-old-space-size=4096"

# Recommended chunk settings for different document types
# Technical docs: chunk size 1000, overlap 200
# Legal/contracts: chunk size 500, overlap 100
# Books/articles: chunk size 1500, overlap 300
```
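
To sanity-check those settings, you can estimate how many chunks a document will produce: with chunk size S and overlap O, each chunk after the first advances by S − O characters (`estimate_chunks` is a hypothetical helper, not part of AnythingLLM):

```shell
# ceil((chars - overlap) / (size - overlap)) chunks for a document of `chars` characters
estimate_chunks() {
  local chars=$1 size=$2 overlap=$3
  local step=$(( size - overlap ))
  echo $(( (chars - overlap + step - 1) / step ))
}

estimate_chunks 100000 1000 200   # 100k-char technical doc → 125 chunks
```

Each chunk becomes one embedding, so this also approximates vector-DB growth per document.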

* **Pre-process large PDFs** — OCR-heavy scans slow ingestion. Use `pdftotext` or Adobe OCR beforehand.
* **Organize by workspace** — Create separate workspaces per project/domain for better retrieval precision.
* **Use specific queries** — RAG works best with specific questions, not broad requests.

### Cost Management on Clore.ai

```bash
# Back up your storage before stopping a Clore.ai instance
tar -czf anythingllm-backup-$(date +%Y%m%d).tar.gz ~/anythingllm/

# To resume on a new Clore.ai instance, restore the backup
tar -xzf anythingllm-backup-20240101.tar.gz -C ~/
```

Since Clore.ai instances are ephemeral, always back up the storage directory. It contains:

* Vector embeddings (LanceDB)
* Uploaded documents
* Chat history
* Agent configurations
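
Backups are easy to automate with a small rotation helper (a sketch; the function name and 7-archive retention are assumptions, not AnythingLLM features):

```shell
# Usage: backup_storage <source-dir> <backup-dir> [keep]
# Creates a timestamped tarball of <source-dir>, keeps only the newest <keep> archives.
backup_storage() {
  local src=$1 dest=$2 keep=${3:-7}
  mkdir -p "$dest"
  tar -czf "$dest/anythingllm-$(date +%Y%m%d-%H%M%S).tar.gz" \
      -C "$(dirname "$src")" "$(basename "$src")"
  ls -1t "$dest"/anythingllm-*.tar.gz | tail -n +"$(( keep + 1 ))" | xargs -r rm --
}
```

Run it from cron (e.g. daily) and copy the archives off the instance, since a released Clore.ai server takes its disk with it.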

### Multi-User Setup

```bash
# Enable multi-user mode in the UI:
# Settings → Security → Enable Multi-User Mode

# Or via environment:
-e MULTI_USER_MODE=true

# Create user via API after enabling multi-user
curl -X POST http://localhost:3001/api/v1/admin/users/new \
  -H "Authorization: Bearer admin-api-key" \
  -H "Content-Type: application/json" \
  -d '{"username": "alice", "password": "securepass", "role": "default"}'
```

### AI Agent Configuration

AnythingLLM agents can perform real-world tasks. Enable tools in **Settings → Agents**:

* **Web Browse** — Fetches and reads web pages
* **Google Search** — Searches Google (requires API key)
* **Code Interpreter** — Executes Python in sandbox
* **GitHub** — Reads repositories
* **SQL Connector** — Queries databases

```bash
# Enable agent capabilities via environment
-e AGENT_SEARCH_PROVIDER=google
-e AGENT_GSX_GOOGLE_SEARCH_ENGINE_ID=your-cx-id
-e AGENT_GSX_GOOGLE_SEARCH_KEY=your-api-key
```

### Performance Tuning

```bash
# For heavy document processing workloads
docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  --cpus="4" \
  --memory="8g" \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```

### Updating AnythingLLM

```bash
# Pull latest image
docker pull mintplexlabs/anythingllm:latest

# Backup first
cp -r $HOME/anythingllm $HOME/anythingllm-backup-$(date +%Y%m%d)

# Stop and remove old container (data is in volume, safe)
docker stop anythingllm && docker rm anythingllm

# Restart with same command
docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```

***

## Troubleshooting

### Container starts but UI not accessible

```bash
# Check container is running
docker ps | grep anythingllm

# Check logs for startup errors
docker logs anythingllm --tail 50

# Verify port binding
ss -tlnp | grep 3001

# Check if Clore.ai server has port 3001 in port mapping
# (add it in the Clore.ai deployment settings)
```

### Document upload fails

```bash
# Check available disk space
df -h

# Inspect document processor logs
docker logs anythingllm 2>&1 | grep -i "error\|fail\|upload"

# Verify SYS_ADMIN capability is set (required for Chromium)
docker inspect anythingllm | grep -A5 CapAdd
```

### RAG responses are poor quality / hallucinating

Common causes and fixes:

```bash
# 1. Adjust similarity threshold (lower = more docs retrieved)
# Settings → Workspace → Vector Database → Similarity Threshold: 0.5

# 2. Increase top-K results
# Settings → Workspace → Vector Database → Max Context Snippets: 10

# 3. Improve chunk size (re-ingest documents after changing)
# Settings → Workspace → Text Splitter → Chunk Size: 1000, Overlap: 200

# 4. Switch to better embedding model
-e EMBEDDING_ENGINE=openai
-e EMBEDDING_MODEL_PREF=text-embedding-3-large
```
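
The similarity threshold filters retrieval on the cosine similarity between the query embedding and each chunk embedding; lowering it admits less-similar chunks. A toy illustration of the metric (not AnythingLLM's internal code):

```shell
# cosine "a1 a2 ..." "b1 b2 ..." → cosine similarity of two space-separated vectors
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 1; i <= n; i++) { dot += x[i]*y[i]; na += x[i]*x[i]; nb += y[i]*y[i] }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

cosine "1 2 3" "2 4 6"   # same direction → 1.0000 (retrieved at any threshold)
cosine "1 0 0" "0 1 0"   # orthogonal     → 0.0000 (filtered at any threshold > 0)
```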

### Ollama connection fails from AnythingLLM

```bash
# Test from the AnythingLLM container
docker exec anythingllm curl -s http://172.17.0.1:11434/api/tags

# If that fails, find the actual Docker bridge IP
ip -4 addr show docker0 | awk '/inet /{print $2}' | cut -d/ -f1
# Use that IP in OLLAMA_BASE_PATH

# Alternative: use host networking
docker run -d \
  --name anythingllm \
  --network host \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  -e OLLAMA_BASE_PATH=http://localhost:11434 \
  mintplexlabs/anythingllm
```

### Out of memory / container crash

```bash
# Check memory usage
docker stats anythingllm

# Free up memory by reducing LanceDB cache
# Or switch to a Clore.ai instance with more RAM

# Restart with memory limits and swap
docker run -d \
  --name anythingllm \
  --memory=6g \
  --memory-swap=8g \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```

***

## Further Reading

* [AnythingLLM Documentation](https://docs.anythingllm.com) — full setup and API reference
* [AnythingLLM GitHub](https://github.com/Mintplex-Labs/anything-llm) — source code, issues, roadmap
* [AnythingLLM Docker Hub](https://hub.docker.com/r/mintplexlabs/anythingllm) — image tags
* [Running Ollama on Clore.ai](https://docs.clore.ai/guides/language-models/ollama) — local LLM backend for AnythingLLM
* [Running vLLM on Clore.ai](https://docs.clore.ai/guides/language-models/vllm) — high-performance inference
* [GPU Comparison Guide](https://docs.clore.ai/guides/getting-started/gpu-comparison) — selecting the right Clore.ai tier
* [MCP Documentation](https://modelcontextprotocol.io) — Model Context Protocol for extending agents
* [AnythingLLM API Reference](https://docs.anythingllm.com/api-reference) — REST API for automation
