AnythingLLM RAG Platform

Deploy AnythingLLM on Clore.ai — an all-in-one RAG application and AI agent platform with built-in document chat, no-code agent builder, and MCP support running on cost-effective GPU cloud servers.

Overview

AnythingLLM is a full-featured, open-source AI workspace with 40K+ GitHub stars. It combines document-based RAG (Retrieval-Augmented Generation), AI agents, and a no-code agent builder into a single, self-hosted application, all managed through a clean, intuitive UI that requires zero coding to set up.

Why run AnythingLLM on Clore.ai?

  • Complete RAG pipeline out of the box — Upload PDFs, Word docs, websites, and YouTube transcripts. AnythingLLM automatically chunks, embeds, and stores them for semantic search.

  • No GPU required for the application — AnythingLLM uses CPU-based embedding by default. Pair it with a Clore.ai GPU server running Ollama or vLLM for local inference.

  • AI agents with real tools — Built-in agents can browse the web, write and execute code, manage files, and call external APIs — all orchestrated through a GUI.

  • MCP compatibility — Integrates with the Model Context Protocol ecosystem for extended tool connectivity.

  • Workspace isolation — Create separate workspaces with different knowledge bases and LLM settings for different projects or teams.

Architecture Overview

┌─────────────────────────────────────────────┐
│            AnythingLLM (Port 3001)          │
│                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │ RAG/Docs │  │  Agents  │  │  Users   │  │
│  └────┬─────┘  └────┬─────┘  └──────────┘  │
│       │             │                       │
│  ┌────▼─────────────▼───────┐               │
│  │    LLM Provider Router   │               │
│  └──────────────┬───────────┘               │
└─────────────────┼───────────────────────────┘

     ┌────────────┼────────────┐
     ▼            ▼            ▼
  OpenAI       Anthropic    Ollama (local)
  Claude        Gemini      vLLM (local)

Requirements

Server Specifications

| Component | Minimum | Recommended | Notes |
|-----------|---------|-------------|-------|
| GPU | None required | RTX 3090 (if using local LLMs) | For Ollama/vLLM backend only |
| VRAM | None | 24 GB | For local model inference |
| CPU | 2 vCPU | 4 vCPU | Embedding runs on CPU |
| RAM | 4 GB | 8 GB | More = larger document index in memory |
| Storage | 10 GB | 50+ GB | Document storage, vector DB, model cache |

Clore.ai Pricing Reference

| Server Type | Approx. Cost | Use Case |
|-------------|--------------|----------|
| CPU instance (4 vCPU, 8 GB RAM) | ~$0.05–0.10/hr | AnythingLLM + external API providers |
| RTX 3090 (24 GB VRAM) | ~$0.20/hr | AnythingLLM + Ollama local LLMs |
| RTX 4090 (24 GB VRAM) | ~$0.35/hr | AnythingLLM + faster local inference |
| A100 80 GB | ~$1.10/hr | AnythingLLM + large 70B+ models |

💡 Pro tip: AnythingLLM's built-in embedding (LanceDB + local CPU embedder) works without GPU. For the LLM backend, you can use free-tier API providers like OpenRouter or Groq to keep costs minimal.

Prerequisites

  • Clore.ai server with SSH access

  • Docker (pre-installed on Clore.ai servers)

  • At least one LLM API key or local Ollama/vLLM backend


Method 1: Quick Start (Single Container)

The official single-container deployment includes everything: the web UI, LanceDB vector store, and document processor.

Step 1: Connect to your Clore.ai server
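
Clore.ai shows the SSH command and port for your instance in the dashboard; a typical connection looks like this (host and port are placeholders):

```bash
# Use the host and SSH port shown in your Clore.ai dashboard
ssh root@<server-ip> -p <ssh-port>
```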

Step 2: Set up storage directory
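
A persistent storage directory keeps documents, the vector database, and settings outside the container. Following the pattern from AnythingLLM's Docker instructions:

```bash
export STORAGE_LOCATION=$HOME/anythingllm
mkdir -p "$STORAGE_LOCATION"
touch "$STORAGE_LOCATION/.env"   # settings file mounted into the container
```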

Step 3: Run AnythingLLM
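
The command below follows the official single-container instructions (image name and mount paths as published by Mintplex Labs; the --name and --restart flags are optional additions):

```bash
docker run -d \
  --name anythingllm \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v "${STORAGE_LOCATION}":/app/server/storage \
  -v "${STORAGE_LOCATION}/.env":/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  --restart unless-stopped \
  mintplexlabs/anythingllm
```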

Why --cap-add SYS_ADMIN? AnythingLLM uses Chromium for webpage scraping and PDF rendering, which requires elevated container capabilities.

Step 4: Verify startup
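
Check the logs and confirm the UI responds:

```bash
docker logs -f anythingllm      # wait for the server to report it is listening
curl -I http://localhost:3001   # should return an HTTP success response
```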

Step 5: Complete setup wizard

Open http://<your-server-ip>:3001 in your browser.

The first-time setup wizard guides you through:

  1. Create admin account

  2. Choose LLM provider

  3. Choose embedding model

  4. Configure your first workspace


Method 2: Docker Compose (Multi-Service)

For production deployments with separate services and easier management:

Step 1: Create project directory
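
```bash
mkdir -p ~/anythingllm && cd ~/anythingllm
```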

Step 2: Create docker-compose.yml
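
A minimal compose file mirroring the docker run flags above (a sketch; adjust image tag, paths, and resource settings to your deployment):

```yaml
services:
  anythingllm:
    image: mintplexlabs/anythingllm
    container_name: anythingllm
    ports:
      - "3001:3001"
    cap_add:
      - SYS_ADMIN                  # needed for Chromium-based scraping
    volumes:
      - ./storage:/app/server/storage
      - ./.env:/app/server/.env
    environment:
      - STORAGE_DIR=/app/server/storage
    restart: unless-stopped
```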

Step 3: Create .env file
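
Seed the .env with any settings you want pre-configured, for example an LLM provider. Variable names follow AnythingLLM's documented configuration; the key value is a placeholder:

```bash
# .env: pre-configure OpenAI as the LLM provider (optional)
LLM_PROVIDER=openai
OPEN_AI_KEY=<your-openai-key>
OPEN_MODEL_PREF=gpt-4o
```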

Step 4: Start
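
Bring the stack up and watch the logs:

```bash
docker compose up -d
docker compose logs -f   # follow startup output
```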


Method 3: With Pre-configured Environment Variables

For automated deployment without the setup wizard:
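
A sketch combining the Quick Start run command with provider variables so the wizard's LLM step is already answered (variable names per AnythingLLM's environment configuration; verify against the release you deploy):

```bash
docker run -d \
  --name anythingllm \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v "${STORAGE_LOCATION}":/app/server/storage \
  -e STORAGE_DIR="/app/server/storage" \
  -e LLM_PROVIDER=openai \
  -e OPEN_AI_KEY=<your-openai-key> \
  -e OPEN_MODEL_PREF=gpt-4o \
  -e EMBEDDING_ENGINE=native \
  -e VECTOR_DB=lancedb \
  mintplexlabs/anythingllm
```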


Configuration

LLM Provider Options

AnythingLLM supports a wide range of LLM backends. Set the provider in the UI under Settings → LLM Preference, or via environment variables. The snippets below use variable names from AnythingLLM's documented configuration; verify them against the release you deploy, and treat model names as examples:

OpenAI:
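
```bash
LLM_PROVIDER=openai
OPEN_AI_KEY=<your-openai-key>
OPEN_MODEL_PREF=gpt-4o
```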

Anthropic Claude:
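
```bash
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=<your-anthropic-key>
ANTHROPIC_MODEL_PREF=claude-3-5-sonnet-20241022
```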

Google Gemini:
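
```bash
LLM_PROVIDER=gemini
GEMINI_API_KEY=<your-gemini-key>
GEMINI_LLM_MODEL_PREF=gemini-1.5-pro
```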

Ollama (local):
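
```bash
LLM_PROVIDER=ollama
OLLAMA_BASE_PATH=http://<server-ip>:11434   # host address, not localhost, from inside the container
OLLAMA_MODEL_PREF=llama3.1:8b
OLLAMA_MODEL_TOKEN_LIMIT=4096
```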

OpenRouter (access 100+ models):
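
```bash
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=<your-openrouter-key>
OPENROUTER_MODEL_PREF=meta-llama/llama-3.1-70b-instruct
```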

Embedding Configuration

| Engine | Backend | GPU Needed | Quality |
|--------|---------|------------|---------|
| native | CPU (built-in) | No | Good |
| openai | OpenAI API | No | Excellent |
| ollama | Local Ollama | Optional | Good–Excellent |
| localai | LocalAI | Optional | Variable |

Vector Database Options

| DB | Description | Best For |
|----|-------------|----------|
| lancedb | Built-in, no config | Default, small–medium datasets |
| chroma | ChromaDB (external) | Medium datasets, flexibility |
| pinecone | Pinecone cloud | Large datasets, production |
| weaviate | Weaviate (self-hosted) | Advanced use cases |

Workspace Configuration

AnythingLLM workspaces are isolated environments with their own:

  • Document knowledge base

  • LLM settings (can override global)

  • Chat history

  • Agent configurations

Create workspaces via the UI or API:
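
A sketch using AnythingLLM's developer API (generate an API key in the instance's API settings first; check your version's built-in API docs for the exact schema):

```bash
curl -X POST http://localhost:3001/api/v1/workspace/new \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{"name": "research-papers"}'
```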

Document Ingestion

Upload documents via UI or API:
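
For example, via the developer API (endpoint path per AnythingLLM's API docs; verify against your version):

```bash
curl -X POST http://localhost:3001/api/v1/document/upload \
  -H "Authorization: Bearer <api-key>" \
  -F "file=@./report.pdf"
```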


GPU Acceleration

AnythingLLM itself runs on CPU. GPU acceleration applies to the LLM inference backend.

Running Ollama on the Same Clore.ai Server
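
A sketch for colocating Ollama with AnythingLLM on one GPU instance (standard ollama/ollama image; the model name is an example):

```bash
# Start Ollama with GPU access
docker run -d --gpus all \
  --name ollama \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# Pull a model to serve
docker exec ollama ollama pull llama3.1:8b
```

Then point AnythingLLM's Ollama base URL at http://<server-ip>:11434. Use the host's address rather than localhost, because from inside the AnythingLLM container localhost refers to the container itself.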

GPU-Model Performance on Clore.ai

| Model | GPU | VRAM | Embedding Speed | Inference Speed | Cost/hr |
|-------|-----|------|-----------------|-----------------|---------|
| Llama 3.2 3B | RTX 3090 | 2 GB | Fast | 60–80 tok/s | ~$0.20 |
| Llama 3.1 8B | RTX 3090 | 6 GB | Fast | 40–60 tok/s | ~$0.20 |
| Mistral 7B | RTX 3090 | 5 GB | Fast | 45–65 tok/s | ~$0.20 |
| Llama 3.1 70B | A100 80GB | 40 GB | Medium | 20–35 tok/s | ~$1.10 |


Tips & Best Practices

Document Ingestion Best Practices

  • Pre-process large PDFs — OCR-heavy scans slow ingestion. Use pdftotext or Adobe OCR beforehand.

  • Organize by workspace — Create separate workspaces per project/domain for better retrieval precision.

  • Use specific queries — RAG works best with specific questions, not broad requests.

Cost Management on Clore.ai

Since Clore.ai instances are ephemeral, always back up the storage directory before stopping an instance (a sample backup command follows the list below). It contains:

  • Vector embeddings (LanceDB)

  • Uploaded documents

  • Chat history

  • Agent configurations
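
A minimal backup sketch, assuming the STORAGE_LOCATION from the Quick Start (the destination host is a placeholder):

```bash
tar -czf anythingllm-backup-$(date +%F).tar.gz -C "$STORAGE_LOCATION" .
scp anythingllm-backup-*.tar.gz <user>@<backup-host>:/backups/
```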

Multi-User Setup
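
AnythingLLM supports a multi-user mode: once enabled in the instance's security settings, an admin can create user accounts and control which workspaces each user can access. Enable it before sharing the URL with a team, since a fresh single-user instance has no per-user permissions.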

AI Agent Configuration

AnythingLLM agents can perform real-world tasks. Enable tools in Settings → Agents:

  • Web Browse — Fetches and reads web pages

  • Google Search — Searches Google (requires API key)

  • Code Interpreter — Executes Python in sandbox

  • GitHub — Reads repositories

  • SQL Connector — Queries databases
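
Once tools are enabled, invoke an agent from any workspace chat by starting the message with @agent followed by the task.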

Performance Tuning
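
Tuning depends on your workload; as a general starting point (plain Docker flags, not AnythingLLM-specific), cap the container's resources so embedding jobs cannot starve the host:

```bash
# Adjust limits on the running container (values are examples)
docker update anythingllm --cpus 4 --memory 8g
```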

Updating AnythingLLM
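
Because all state lives in the mounted storage directory, updating is a pull-and-recreate:

```bash
docker pull mintplexlabs/anythingllm
docker stop anythingllm && docker rm anythingllm
# Re-run the docker run command from the Quick Start;
# documents and settings persist in $STORAGE_LOCATION
```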


Troubleshooting

Container starts but UI not accessible
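
First confirm the container is healthy and the port is bound; on Clore.ai, also check that your instance actually exposes port 3001 to the outside:

```bash
docker ps                           # is the container running?
docker logs anythingllm --tail 50   # any startup errors?
ss -tlnp | grep 3001                # is the port bound on the host?
```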

Document upload fails
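
Check the logs for document-processing errors and confirm the server has disk headroom:

```bash
docker logs anythingllm --tail 100   # look for collector/processing errors
df -h                                # check free disk space
```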

RAG responses are poor quality / hallucinating

Common causes and fixes:
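
  • Documents uploaded but never embedded — confirm they show as embedded in the workspace, not just uploaded, and re-embed if needed.

  • Too few context snippets retrieved — raise the number of chunks returned per query in the workspace settings.

  • Weak embedder for your domain — switch from the native CPU embedder to a stronger embedding model, then re-embed the workspace.

  • LLM temperature too high for factual Q&A — lower it in the workspace chat settings.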

Ollama connection fails from AnythingLLM
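
The usual culprit is localhost: inside the AnythingLLM container, localhost is the container itself, not the host running Ollama. Point the Ollama base URL at the host's address and make sure Ollama listens on all interfaces:

```bash
# Native Ollama install: bind to all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve

# Sanity check from the AnythingLLM host (should list installed models)
curl http://<server-ip>:11434/api/tags
```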

Out of memory / container crash
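
Check actual usage before resizing; large embedding jobs and big document sets are the usual memory hogs:

```bash
docker stats anythingllm --no-stream   # container CPU/memory usage
free -h                                # host memory headroom
```

If memory is tight, embed documents in smaller batches, move to an external vector DB, or rent an instance with more RAM.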


Further Reading
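
  • AnythingLLM website: https://anythingllm.com

  • Official documentation: https://docs.anythingllm.com

  • Source code: https://github.com/Mintplex-Labs/anything-llm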
