# AnythingLLM RAG Platform

## Overview

[AnythingLLM](https://github.com/Mintplex-Labs/anything-llm) is a full-featured, open-source AI workspace with 40K+ GitHub stars. It combines document-based RAG (Retrieval-Augmented Generation), AI agents, and a no-code agent builder into a single, self-hosted application — all managed through a clean, intuitive UI that requires zero coding to set up.

**Why run AnythingLLM on Clore.ai?**

* **Complete RAG pipeline out of the box** — Upload PDFs, Word docs, websites, and YouTube transcripts. AnythingLLM automatically chunks, embeds, and stores them for semantic search.
* **No GPU required for the application** — AnythingLLM uses CPU-based embedding by default. Pair it with a Clore.ai GPU server running Ollama or vLLM for local inference.
* **AI agents with real tools** — Built-in agents can browse the web, write and execute code, manage files, and call external APIs — all orchestrated through a GUI.
* **MCP compatibility** — Integrates with the Model Context Protocol ecosystem for extended tool connectivity.
* **Workspace isolation** — Create separate workspaces with different knowledge bases and LLM settings for different projects or teams.

### Architecture Overview

```
┌─────────────────────────────────────────────┐
│            AnythingLLM (Port 3001)          │
│                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │ RAG/Docs │  │  Agents  │  │  Users   │  │
│  └────┬─────┘  └────┬─────┘  └──────────┘  │
│       │             │                       │
│  ┌────▼─────────────▼───────┐               │
│  │    LLM Provider Router   │               │
│  └──────────────┬───────────┘               │
└─────────────────┼───────────────────────────┘
                  │
     ┌────────────┼────────────┐
     ▼            ▼            ▼
  OpenAI       Anthropic    Ollama (local)
  Gemini      OpenRouter    vLLM (local)
```

***

## Requirements

### Server Specifications

| Component   | Minimum       | Recommended                    | Notes                                    |
| ----------- | ------------- | ------------------------------ | ---------------------------------------- |
| **GPU**     | None required | RTX 3090 (if using local LLMs) | For Ollama/vLLM backend only             |
| **VRAM**    | —             | 24 GB                          | For local model inference                |
| **CPU**     | 2 vCPU        | 4 vCPU                         | Embedding runs on CPU                    |
| **RAM**     | 4 GB          | 8 GB                           | More = larger document index in memory   |
| **Storage** | 10 GB         | 50+ GB                         | Document storage, vector DB, model cache |
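Before committing to an instance, you can verify it meets these specs with standard tools:

```bash
nproc                 # vCPU count
free -h               # total RAM
df -h /               # free disk space
nvidia-smi --query-gpu=name,memory.total --format=csv   # GPU + VRAM (if present)
```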

### Clore.ai Pricing Reference

| Server Type                     | Approx. Cost    | Use Case                             |
| ------------------------------- | --------------- | ------------------------------------ |
| CPU instance (4 vCPU, 8 GB RAM) | \~$0.05–0.10/hr | AnythingLLM + external API providers |
| RTX 3090 (24 GB VRAM)           | \~$0.20/hr      | AnythingLLM + Ollama local LLMs      |
| RTX 4090 (24 GB VRAM)           | \~$0.35/hr      | AnythingLLM + faster local inference |
| A100 80 GB                      | \~$1.10/hr      | AnythingLLM + large 70B+ models      |

> 💡 **Pro tip:** AnythingLLM's built-in embedding (LanceDB + local CPU embedder) works without GPU. For the LLM backend, you can use free-tier API providers like OpenRouter or Groq to keep costs minimal.

### Prerequisites

* Clore.ai server with SSH access
* Docker (pre-installed on Clore.ai servers)
* At least one LLM API key **or** local Ollama/vLLM backend

***

## Quick Start

### Method 1: Single Docker Container (Recommended)

The official single-container deployment includes everything: the web UI, LanceDB vector store, and document processor.

**Step 1: Connect to your Clore.ai server**

```bash
ssh root@<your-clore-server-ip> -p <ssh-port>
```

**Step 2: Set up storage directory**

```bash
export STORAGE_LOCATION=$HOME/anythingllm
mkdir -p $STORAGE_LOCATION
touch "$STORAGE_LOCATION/.env"
```

**Step 3: Run AnythingLLM**

```bash
docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v ${STORAGE_LOCATION}:/app/server/storage \
  -v ${STORAGE_LOCATION}/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```

> **Why `--cap-add SYS_ADMIN`?** AnythingLLM uses Chromium for webpage scraping and PDF rendering, which requires elevated container capabilities.

**Step 4: Verify startup**

```bash
docker logs anythingllm --tail 30 -f
# Wait for: "Server listening on port 3001"
```
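You can also probe the API health endpoint directly (the same one the Docker Compose healthcheck in Method 2 uses):

```bash
# Returns a small JSON payload once the server is up
curl -s http://localhost:3001/api/ping
```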

**Step 5: Complete setup wizard**

Open in browser:

```
http://<your-clore-server-ip>:3001
```

The first-time setup wizard guides you through:

1. Create admin account
2. Choose LLM provider
3. Choose embedding model
4. Configure your first workspace

***

### Method 2: Docker Compose (Multi-Service)

For production deployments with separate services and easier management:

**Step 1: Create project directory**

```bash
mkdir -p ~/anythingllm && cd ~/anythingllm
mkdir -p storage
touch storage/.env
```

**Step 2: Create `docker-compose.yml`**

```bash
cat > docker-compose.yml << 'EOF'
services:
  anythingllm:
    image: mintplexlabs/anythingllm:latest
    container_name: anythingllm
    restart: unless-stopped
    ports:
      - "3001:3001"
    cap_add:
      - SYS_ADMIN
    environment:
      STORAGE_DIR: "/app/server/storage"
      # LLM Provider (configure one)
      LLM_PROVIDER: openai
      OPEN_AI_KEY: ${OPENAI_API_KEY}
      OPEN_MODEL_PREF: gpt-4o-mini
      # Embedding
      EMBEDDING_ENGINE: native
      # Vector DB
      VECTOR_DB: lancedb
      # Auth
      AUTH_TOKEN: ${ANYTHINGLLM_AUTH_TOKEN}
      JWT_SECRET: ${JWT_SECRET}
    volumes:
      - ./storage:/app/server/storage
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3001/api/ping"]
      interval: 30s
      timeout: 10s
      retries: 3

EOF
```

**Step 3: Create `.env` file**

```bash
cat > .env << 'EOF'
OPENAI_API_KEY=sk-your-openai-key-here
ANYTHINGLLM_AUTH_TOKEN=your-instance-password-here
JWT_SECRET=your-random-64-char-secret-here
EOF
```
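One way to generate the JWT secret, the same command Method 3 uses below:

```bash
# Produces 64 hex characters of randomness
openssl rand -hex 32
```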

**Step 4: Start**

```bash
docker compose up -d
docker compose logs anythingllm -f
```
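Once the healthcheck passes, Docker reports the container as healthy:

```bash
docker inspect --format '{{.State.Health.Status}}' anythingllm
# Expected output: healthy
```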

***

### Method 3: With Pre-configured Environment Variables

For automated deployment without the setup wizard:

```bash
export STORAGE_LOCATION=$HOME/anythingllm
mkdir -p $STORAGE_LOCATION && touch "$STORAGE_LOCATION/.env"

docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v ${STORAGE_LOCATION}:/app/server/storage \
  -v ${STORAGE_LOCATION}/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  -e LLM_PROVIDER=openai \
  -e OPEN_AI_KEY=sk-your-key \
  -e OPEN_MODEL_PREF=gpt-4o-mini \
  -e EMBEDDING_ENGINE=native \
  -e VECTOR_DB=lancedb \
  -e AUTH_TOKEN=your-password \
  -e JWT_SECRET=$(openssl rand -hex 32) \
  mintplexlabs/anythingllm
```

***

## Configuration

### LLM Provider Options

AnythingLLM supports a wide range of LLM backends. Set in the UI under **Settings → LLM Preference**, or via environment variables:

**OpenAI:**

```bash
-e LLM_PROVIDER=openai
-e OPEN_AI_KEY=sk-your-key
-e OPEN_MODEL_PREF=gpt-4o
```

**Anthropic Claude:**

```bash
-e LLM_PROVIDER=anthropic
-e ANTHROPIC_API_KEY=sk-ant-your-key
-e ANTHROPIC_MODEL_PREF=claude-sonnet-4-5
```

**Google Gemini:**

```bash
-e LLM_PROVIDER=gemini
-e GEMINI_API_KEY=your-key
-e GEMINI_LLM_MODEL_PREF=gemini-1.5-pro
```

**Ollama (local):**

```bash
-e LLM_PROVIDER=ollama
-e OLLAMA_BASE_PATH=http://172.17.0.1:11434
-e OLLAMA_MODEL_PREF=llama3.2
```

**OpenRouter (access 100+ models):**

```bash
-e LLM_PROVIDER=openrouter
-e OPENROUTER_API_KEY=sk-or-your-key
-e OPENROUTER_MODEL_PREF=meta-llama/llama-3.1-8b-instruct:free
```

### Embedding Configuration

| Engine    | Backend        | GPU Needed | Quality        |
| --------- | -------------- | ---------- | -------------- |
| `native`  | CPU (built-in) | No         | Good           |
| `openai`  | OpenAI API     | No         | Excellent      |
| `ollama`  | Local Ollama   | Optional   | Good-Excellent |
| `localai` | LocalAI        | Optional   | Variable       |

```bash
# Use OpenAI embeddings for best quality
-e EMBEDDING_ENGINE=openai
-e OPEN_AI_KEY=sk-your-key
-e EMBEDDING_MODEL_PREF=text-embedding-3-small

# Use Ollama embeddings for fully local pipeline
-e EMBEDDING_ENGINE=ollama
-e OLLAMA_BASE_PATH=http://172.17.0.1:11434
-e EMBEDDING_MODEL_PREF=nomic-embed-text
```

### Vector Database Options

| DB         | Description            | Best For                       |
| ---------- | ---------------------- | ------------------------------ |
| `lancedb`  | Built-in, no config    | Default, small-medium datasets |
| `chroma`   | ChromaDB (external)    | Medium datasets, flexibility   |
| `pinecone` | Pinecone cloud         | Large datasets, production     |
| `weaviate` | Weaviate (self-hosted) | Advanced use cases             |

### Workspace Configuration

AnythingLLM workspaces are isolated environments with their own:

* Document knowledge base
* LLM settings (can override global)
* Chat history
* Agent configurations

Create workspaces via the UI or API:

```bash
# Create workspace via API
curl -X POST http://localhost:3001/api/v1/workspace/new \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"name": "My Project", "similarityThreshold": 0.7}'
```
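To confirm the workspace was created, the v1 API also exposes a listing endpoint (see the AnythingLLM API reference; paths can shift between versions):

```bash
# List all workspaces and their slugs
curl -s http://localhost:3001/api/v1/workspaces \
  -H "Authorization: Bearer your-api-key"
```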

### Document Ingestion

Upload documents via UI or API:

```bash
# Upload a document via API
curl -X POST http://localhost:3001/api/v1/document/upload \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@/path/to/document.pdf"

# Move the document into a workspace (use the exact "location" value
# returned by the upload response above)
curl -X POST http://localhost:3001/api/v1/workspace/my-project/update-embeddings \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"adds": ["custom-documents/document.pdf-chunk-1.json"]}'
```
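With documents embedded, the same API can query the workspace. A sketch against the workspace chat endpoint, assuming `mode: query` restricts answers to retrieved context as described in the API reference (verify the payload against your version):

```bash
# Ask a question grounded in the workspace's documents
curl -X POST http://localhost:3001/api/v1/workspace/my-project/chat \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"message": "Summarize the uploaded document", "mode": "query"}'
```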

***

## GPU Acceleration

AnythingLLM itself runs on CPU. GPU acceleration applies to the LLM inference backend.

### Running Ollama on the Same Clore.ai Server

```bash
# Start Ollama with GPU support
docker run -d \
  --name ollama \
  --gpus all \
  --restart unless-stopped \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama

# Pull models for AnythingLLM
docker exec ollama ollama pull llama3.2          # 3B, fast
docker exec ollama ollama pull llama3.1:8b       # 8B, balanced
docker exec ollama ollama pull nomic-embed-text  # for embeddings
docker exec ollama ollama pull mxbai-embed-large # better embeddings

# Restart AnythingLLM with Ollama config
docker stop anythingllm && docker rm anythingllm

docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  -e LLM_PROVIDER=ollama \
  -e OLLAMA_BASE_PATH=http://172.17.0.1:11434 \
  -e OLLAMA_MODEL_PREF=llama3.1:8b \
  -e EMBEDDING_ENGINE=ollama \
  -e EMBEDDING_MODEL_PREF=nomic-embed-text \
  mintplexlabs/anythingllm
```
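Before wiring AnythingLLM to Ollama, smoke-test inference directly against Ollama's generate API:

```bash
# Should return a JSON response containing generated text
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Hello", "stream": false}'
```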

### GPU-Model Performance on Clore.ai

| Model         | GPU       | VRAM  | Embedding Speed | Inference Speed | Cost/hr |
| ------------- | --------- | ----- | --------------- | --------------- | ------- |
| Llama 3.2 3B  | RTX 3090  | 2 GB  | Fast            | 60–80 tok/s     | \~$0.20 |
| Llama 3.1 8B  | RTX 3090  | 6 GB  | Fast            | 40–60 tok/s     | \~$0.20 |
| Mistral 7B    | RTX 3090  | 5 GB  | Fast            | 45–65 tok/s     | \~$0.20 |
| Llama 3.1 70B | A100 80GB | 40 GB | Medium          | 20–35 tok/s     | \~$1.10 |

***

## Tips & Best Practices

### Document Ingestion Best Practices

```bash
# For large document sets, increase Node.js memory
-e NODE_OPTIONS="--max-old-space-size=4096"

# Recommended chunk settings for different document types
# Technical docs: chunk size 1000, overlap 200
# Legal/contracts: chunk size 500, overlap 100
# Books/articles: chunk size 1500, overlap 300
```

* **Pre-process large PDFs** — OCR-heavy scans slow ingestion. Use `pdftotext` or Adobe OCR beforehand (see the sketch after this list).
* **Organize by workspace** — Create separate workspaces per project/domain for better retrieval precision.
* **Use specific queries** — RAG works best with specific questions, not broad requests.
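A minimal pre-processing pass with `pdftotext` (shipped in the `poppler-utils` package on Debian/Ubuntu):

```bash
# Install the extractor and pull plain text out of a heavy PDF
apt-get install -y poppler-utils
pdftotext -layout large-scan.pdf large-scan.txt
```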

### Cost Management on Clore.ai

```bash
# Back up your storage before stopping a Clore.ai instance
tar -czf anythingllm-backup-$(date +%Y%m%d).tar.gz ~/anythingllm/

# To resume on a new Clore.ai instance, restore the backup
tar -xzf anythingllm-backup-20240101.tar.gz -C ~/
```

Since Clore.ai instances are ephemeral, always back up the storage directory. It contains:

* Vector embeddings (LanceDB)
* Uploaded documents
* Chat history
* Agent configurations
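To make this automatic, a cron entry can run the same `tar` command daily. A sketch, assuming a `/root/backups` directory:

```bash
mkdir -p /root/backups
# Note: % must be escaped as \% inside crontab entries
(crontab -l 2>/dev/null; echo '0 3 * * * tar -czf /root/backups/anythingllm-$(date +\%Y\%m\%d).tar.gz -C $HOME anythingllm') | crontab -
```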

### Multi-User Setup

```bash
# Enable multi-user mode in the UI:
# Settings → Security → Enable Multi-User Mode

# Or via environment:
-e MULTI_USER_MODE=true

# Create user via API after enabling multi-user
curl -X POST http://localhost:3001/api/v1/admin/users/new \
  -H "Authorization: Bearer admin-api-key" \
  -H "Content-Type: application/json" \
  -d '{"username": "alice", "password": "securepass", "role": "default"}'
```

### AI Agent Configuration

AnythingLLM agents can perform real-world tasks. Enable tools in **Settings → Agents**:

* **Web Browse** — Fetches and reads web pages
* **Google Search** — Searches Google (requires API key)
* **Code Interpreter** — Executes Python in sandbox
* **GitHub** — Reads repositories
* **SQL Connector** — Queries databases

```bash
# Enable agent web search via environment (variable names can differ by
# version; confirm against AnythingLLM's .env.example before relying on them)
-e AGENT_SEARCH_PROVIDER=google
-e AGENT_GSX_GOOGLE_SEARCH_ENGINE_ID=your-cx-id
-e AGENT_GSX_GOOGLE_SEARCH_KEY=your-api-key
```

### Performance Tuning

```bash
# For heavy document processing workloads
docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  --cpus="4" \
  --memory="8g" \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```

### Updating AnythingLLM

```bash
# Pull latest image
docker pull mintplexlabs/anythingllm:latest

# Backup first
cp -r $HOME/anythingllm $HOME/anythingllm-backup-$(date +%Y%m%d)

# Stop and remove old container (data is in volume, safe)
docker stop anythingllm && docker rm anythingllm

# Restart with same command
docker run -d \
  --name anythingllm \
  --restart unless-stopped \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```
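After restarting, confirm the container runs the freshly pulled image by comparing image IDs:

```bash
# Image ID of the running container
docker inspect --format '{{.Image}}' anythingllm
# Image ID of the freshly pulled tag; the two should match
docker images --no-trunc --quiet mintplexlabs/anythingllm:latest
```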

***

## Troubleshooting

### Container starts but UI not accessible

```bash
# Check container is running
docker ps | grep anythingllm

# Check logs for startup errors
docker logs anythingllm --tail 50

# Verify port binding
ss -tlnp | grep 3001

# Check if Clore.ai server has port 3001 in port mapping
# (add it in the Clore.ai deployment settings)
```
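From your local machine (not the server), check that the port is reachable through Clore.ai's mapping:

```bash
# 200 means the UI is reachable; a timeout points to the port mapping
curl -s -o /dev/null -w '%{http_code}\n' http://<your-clore-server-ip>:3001
```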

### Document upload fails

```bash
# Check available disk space
df -h

# Inspect document processor logs
docker logs anythingllm 2>&1 | grep -i "error\|fail\|upload"

# Verify SYS_ADMIN capability is set (required for Chromium)
docker inspect anythingllm | grep -A5 CapAdd
```

### RAG responses are poor quality / hallucinating

Common causes and fixes:

```bash
# 1. Adjust similarity threshold (lower = more docs retrieved)
# Settings → Workspace → Vector Database → Similarity Threshold: 0.5

# 2. Increase top-K results
# Settings → Workspace → Vector Database → Max Context Snippets: 10

# 3. Improve chunk size (re-ingest documents after changing)
# Settings → Workspace → Text Splitter → Chunk Size: 1000, Overlap: 200

# 4. Switch to better embedding model
-e EMBEDDING_ENGINE=openai
-e EMBEDDING_MODEL_PREF=text-embedding-3-large
```

### Ollama connection fails from AnythingLLM

```bash
# Test from the AnythingLLM container
docker exec anythingllm curl -s http://172.17.0.1:11434/api/tags

# If that fails, find the actual Docker bridge IP
ip route | grep docker
# Use that IP in OLLAMA_BASE_PATH

# Alternative: use host networking
docker run -d \
  --name anythingllm \
  --network host \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  -e OLLAMA_BASE_PATH=http://localhost:11434 \
  mintplexlabs/anythingllm
```

### Out of memory / container crash

```bash
# Check memory usage
docker stats anythingllm

# If memory pressure persists, switch to a Clore.ai instance with more RAM
# or offload the vector DB to an external service (Chroma, Pinecone, etc.)

# Restart with memory limits and swap
docker run -d \
  --name anythingllm \
  --memory=6g \
  --memory-swap=8g \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v $HOME/anythingllm:/app/server/storage \
  -v $HOME/anythingllm/.env:/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm
```

***

## Further Reading

* [AnythingLLM Documentation](https://docs.anythingllm.com) — full setup and API reference
* [AnythingLLM GitHub](https://github.com/Mintplex-Labs/anything-llm) — source code, issues, roadmap
* [AnythingLLM Docker Hub](https://hub.docker.com/r/mintplexlabs/anythingllm) — image tags
* [Running Ollama on Clore.ai](https://docs.clore.ai/guides/language-models/ollama) — local LLM backend for AnythingLLM
* [Running vLLM on Clore.ai](https://docs.clore.ai/guides/language-models/vllm) — high-performance inference
* [GPU Comparison Guide](https://docs.clore.ai/guides/getting-started/gpu-comparison) — selecting the right Clore.ai tier
* [MCP Documentation](https://modelcontextprotocol.io) — Model Context Protocol for extending agents
* [AnythingLLM API Reference](https://docs.anythingllm.com/api-reference) — REST API for automation

