AnythingLLM RAG Platform

Deploy AnythingLLM on Clore.ai — an all-in-one RAG application and AI agent platform with built-in document chat, no-code agent builder, and MCP support running on cost-effective GPU cloud servers.

Overview

AnythingLLM is a full-featured, open-source AI workspace with 40K+ GitHub stars. It combines document-based RAG (Retrieval-Augmented Generation), AI agents, and a no-code agent builder into a single, self-hosted application, all managed through a clean, intuitive UI that requires zero coding to set up.

Why run AnythingLLM on Clore.ai?

  • Complete RAG pipeline out of the box — Upload PDFs, Word docs, websites, and YouTube transcripts. AnythingLLM automatically chunks, embeds, and stores them for semantic search.

  • No GPU required for the application — AnythingLLM uses CPU-based embedding by default. Pair it with a Clore.ai GPU server running Ollama or vLLM for local inference.

  • AI agents with real tools — Built-in agents can browse the web, write and execute code, manage files, and call external APIs — all orchestrated through a GUI.

  • MCP compatibility — Integrates with the Model Context Protocol ecosystem for extended tool connectivity.

  • Workspace isolation — Create separate workspaces with different knowledge bases and LLM settings for different projects or teams.

Architecture Overview

┌─────────────────────────────────────────────┐
│            AnythingLLM (Port 3001)          │
│                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
│  │ RAG/Docs │  │  Agents  │  │  Users   │  │
│  └────┬─────┘  └────┬─────┘  └──────────┘  │
│       │             │                       │
│  ┌────▼─────────────▼───────┐               │
│  │    LLM Provider Router   │               │
│  └──────────────┬───────────┘               │
└─────────────────┼───────────────────────────┘

     ┌────────────┼────────────┐
     ▼            ▼            ▼
  OpenAI       Anthropic    Ollama (local)
  Claude        Gemini      vLLM (local)

Requirements

Server Specifications

| Component | Minimum | Recommended | Notes |
|-----------|---------|-------------|-------|
| GPU | None required | RTX 3090 (if using local LLMs) | For Ollama/vLLM backend only |
| VRAM | None | 24 GB | For local model inference |
| CPU | 2 vCPU | 4 vCPU | Embedding runs on CPU |
| RAM | 4 GB | 8 GB | More = larger document index in memory |
| Storage | 10 GB | 50+ GB | Document storage, vector DB, model cache |

Clore.ai Pricing Reference

| Server Type | Approx. Cost | Use Case |
|-------------|--------------|----------|
| CPU instance (4 vCPU, 8 GB RAM) | ~$0.05–0.10/hr | AnythingLLM + external API providers |
| RTX 3090 (24 GB VRAM) | ~$0.20/hr | AnythingLLM + Ollama local LLMs |
| RTX 4090 (24 GB VRAM) | ~$0.35/hr | AnythingLLM + faster local inference |
| A100 80 GB | ~$1.10/hr | AnythingLLM + large 70B+ models |

💡 Pro tip: AnythingLLM's built-in embedding (LanceDB + local CPU embedder) works without GPU. For the LLM backend, you can use free-tier API providers like OpenRouter or Groq to keep costs minimal.

Prerequisites

  • Clore.ai server with SSH access

  • Docker (pre-installed on Clore.ai servers)

  • At least one LLM API key or local Ollama/vLLM backend


Method 1: Quick Start (Single Container)

The official single-container deployment includes everything: the web UI, LanceDB vector store, and document processor.

Step 1: Connect to your Clore.ai server
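
Clore.ai shows the SSH command and port for your instance in the dashboard; a typical connection looks like this (host and port are placeholders):

```bash
# Use the host and SSH port shown in your Clore.ai dashboard
ssh root@<server-ip> -p <ssh-port>
```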

Step 2: Set up storage directory
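
A persistent storage directory keeps documents, the vector database, and settings outside the container. Following the pattern from AnythingLLM's Docker instructions:

```bash
export STORAGE_LOCATION=$HOME/anythingllm
mkdir -p "$STORAGE_LOCATION"
touch "$STORAGE_LOCATION/.env"   # settings file mounted into the container
```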

Step 3: Run AnythingLLM
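
The command below follows the official single-container instructions (image name and mount paths as published by Mintplex Labs; the --name and --restart flags are optional additions):

```bash
docker run -d \
  --name anythingllm \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v "${STORAGE_LOCATION}":/app/server/storage \
  -v "${STORAGE_LOCATION}/.env":/app/server/.env \
  -e STORAGE_DIR="/app/server/storage" \
  --restart unless-stopped \
  mintplexlabs/anythingllm
```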

Why --cap-add SYS_ADMIN? AnythingLLM uses Chromium for webpage scraping and PDF rendering, which requires elevated container capabilities.

Step 4: Verify startup
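
Check the logs and confirm the UI responds:

```bash
docker logs -f anythingllm      # wait for the server to report it is listening
curl -I http://localhost:3001   # should return an HTTP success response
```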

Step 5: Complete setup wizard

Open http://<your-server-ip>:3001 in your browser.

The first-time setup wizard guides you through:

  1. Create admin account

  2. Choose LLM provider

  3. Choose embedding model

  4. Configure your first workspace


Method 2: Docker Compose (Multi-Service)

For production deployments with separate services and easier management:

Step 1: Create project directory
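
```bash
mkdir -p ~/anythingllm && cd ~/anythingllm
```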

Step 2: Create docker-compose.yml
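
A minimal compose file mirroring the docker run flags above (a sketch; adjust image tag, paths, and resource settings to your deployment):

```yaml
services:
  anythingllm:
    image: mintplexlabs/anythingllm
    container_name: anythingllm
    ports:
      - "3001:3001"
    cap_add:
      - SYS_ADMIN                  # needed for Chromium-based scraping
    volumes:
      - ./storage:/app/server/storage
      - ./.env:/app/server/.env
    environment:
      - STORAGE_DIR=/app/server/storage
    restart: unless-stopped
```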

Step 3: Create .env file
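
Seed the .env with any settings you want pre-configured, for example an LLM provider. Variable names follow AnythingLLM's documented configuration; the key value is a placeholder:

```bash
# .env: pre-configure OpenAI as the LLM provider (optional)
LLM_PROVIDER=openai
OPEN_AI_KEY=<your-openai-key>
OPEN_MODEL_PREF=gpt-4o
```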

Step 4: Start
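
Bring the stack up and watch the logs:

```bash
docker compose up -d
docker compose logs -f   # follow startup output
```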


Method 3: With Pre-configured Environment Variables

For automated deployment without the setup wizard:
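
A sketch combining the Quick Start run command with provider variables so the wizard's LLM step is already answered (variable names per AnythingLLM's environment configuration; verify against the release you deploy):

```bash
docker run -d \
  --name anythingllm \
  -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v "${STORAGE_LOCATION}":/app/server/storage \
  -e STORAGE_DIR="/app/server/storage" \
  -e LLM_PROVIDER=openai \
  -e OPEN_AI_KEY=<your-openai-key> \
  -e OPEN_MODEL_PREF=gpt-4o \
  -e EMBEDDING_ENGINE=native \
  -e VECTOR_DB=lancedb \
  mintplexlabs/anythingllm
```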


Configuration

LLM Provider Options

AnythingLLM supports a wide range of LLM backends. Set the provider in the UI under Settings → LLM Preference, or via environment variables. The snippets below use variable names from AnythingLLM's documented configuration; verify them against the release you deploy, and treat model names as examples:

OpenAI:
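
```bash
LLM_PROVIDER=openai
OPEN_AI_KEY=<your-openai-key>
OPEN_MODEL_PREF=gpt-4o
```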

Anthropic Claude:
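
```bash
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=<your-anthropic-key>
ANTHROPIC_MODEL_PREF=claude-3-5-sonnet-20241022
```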

Google Gemini:
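
```bash
LLM_PROVIDER=gemini
GEMINI_API_KEY=<your-gemini-key>
GEMINI_LLM_MODEL_PREF=gemini-1.5-pro
```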

Ollama (local):
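
```bash
LLM_PROVIDER=ollama
OLLAMA_BASE_PATH=http://<server-ip>:11434   # host address, not localhost, from inside the container
OLLAMA_MODEL_PREF=llama3.1:8b
OLLAMA_MODEL_TOKEN_LIMIT=4096
```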

OpenRouter (access 100+ models):
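
```bash
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=<your-openrouter-key>
OPENROUTER_MODEL_PREF=meta-llama/llama-3.1-70b-instruct
```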

Embedding Configuration

| Engine | Backend | GPU Needed | Quality |
|--------|---------|------------|---------|
| native | CPU (built-in) | No | Good |
| openai | OpenAI API | No | Excellent |
| ollama | Local Ollama | Optional | Good–Excellent |
| localai | LocalAI | Optional | Variable |

Vector Database Options

| DB | Description | Best For |
|----|-------------|----------|
| lancedb | Built-in, no config | Default, small–medium datasets |
| chroma | ChromaDB (external) | Medium datasets, flexibility |
| pinecone | Pinecone cloud | Large datasets, production |
| weaviate | Weaviate (self-hosted) | Advanced use cases |

Workspace Configuration

AnythingLLM workspaces are isolated environments with their own:

  • Document knowledge base

  • LLM settings (can override global)

  • Chat history

  • Agent configurations

Create workspaces via the UI or API:
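
A sketch using AnythingLLM's developer API (generate an API key in the instance's API settings first; check your version's built-in API docs for the exact schema):

```bash
curl -X POST http://localhost:3001/api/v1/workspace/new \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{"name": "research-papers"}'
```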

Document Ingestion

Upload documents via UI or API:
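
For example, via the developer API (endpoint path per AnythingLLM's API docs; verify against your version):

```bash
curl -X POST http://localhost:3001/api/v1/document/upload \
  -H "Authorization: Bearer <api-key>" \
  -F "file=@./report.pdf"
```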


GPU Acceleration

AnythingLLM itself runs on CPU. GPU acceleration applies to the LLM inference backend.

Running Ollama on the Same Clore.ai Server
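
A sketch for colocating Ollama with AnythingLLM on one GPU instance (standard ollama/ollama image; the model name is an example):

```bash
# Start Ollama with GPU access
docker run -d --gpus all \
  --name ollama \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# Pull a model to serve
docker exec ollama ollama pull llama3.1:8b
```

Then point AnythingLLM's Ollama base URL at http://<server-ip>:11434. Use the host's address rather than localhost, because from inside the AnythingLLM container localhost refers to the container itself.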

GPU-Model Performance on Clore.ai

| Model | GPU | VRAM | Embedding Speed | Inference Speed | Cost/hr |
|-------|-----|------|-----------------|-----------------|---------|
| Llama 3.2 3B | RTX 3090 | 2 GB | Fast | 60–80 tok/s | ~$0.20 |
| Llama 3.1 8B | RTX 3090 | 6 GB | Fast | 40–60 tok/s | ~$0.20 |
| Mistral 7B | RTX 3090 | 5 GB | Fast | 45–65 tok/s | ~$0.20 |
| Llama 3.1 70B | A100 80GB | 40 GB | Medium | 20–35 tok/s | ~$1.10 |


Tips & Best Practices

Document Ingestion Best Practices

  • Pre-process large PDFs — OCR-heavy scans slow ingestion. Use pdftotext or Adobe OCR beforehand.

  • Organize by workspace — Create separate workspaces per project/domain for better retrieval precision.

  • Use specific queries — RAG works best with specific questions, not broad requests.

Cost Management on Clore.ai

Since Clore.ai instances are ephemeral, always back up the storage directory before stopping an instance (a sample backup command follows the list below). It contains:

  • Vector embeddings (LanceDB)

  • Uploaded documents

  • Chat history

  • Agent configurations
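
A minimal backup sketch, assuming the STORAGE_LOCATION from the Quick Start (the destination host is a placeholder):

```bash
tar -czf anythingllm-backup-$(date +%F).tar.gz -C "$STORAGE_LOCATION" .
scp anythingllm-backup-*.tar.gz <user>@<backup-host>:/backups/
```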

Multi-User Setup
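
AnythingLLM supports a multi-user mode: once enabled in the instance's security settings, an admin can create user accounts and control which workspaces each user can access. Enable it before sharing the URL with a team, since a fresh single-user instance has no per-user permissions.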

AI Agent Configuration

AnythingLLM agents can perform real-world tasks. Enable tools in Settings → Agents:

  • Web Browse — Fetches and reads web pages

  • Google Search — Searches Google (requires API key)

  • Code Interpreter — Executes Python in sandbox

  • GitHub — Reads repositories

  • SQL Connector — Queries databases
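
Once tools are enabled, invoke an agent from any workspace chat by starting the message with @agent followed by the task.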

Performance Tuning
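
Tuning depends on your workload; as a general starting point (plain Docker flags, not AnythingLLM-specific), cap the container's resources so embedding jobs cannot starve the host:

```bash
# Adjust limits on the running container (values are examples)
docker update anythingllm --cpus 4 --memory 8g
```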

Updating AnythingLLM
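
Because all state lives in the mounted storage directory, updating is a pull-and-recreate:

```bash
docker pull mintplexlabs/anythingllm
docker stop anythingllm && docker rm anythingllm
# Re-run the docker run command from the Quick Start;
# documents and settings persist in $STORAGE_LOCATION
```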


Troubleshooting

Container starts but UI not accessible
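
First confirm the container is healthy and the port is bound; on Clore.ai, also check that your instance actually exposes port 3001 to the outside:

```bash
docker ps                           # is the container running?
docker logs anythingllm --tail 50   # any startup errors?
ss -tlnp | grep 3001                # is the port bound on the host?
```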

Document upload fails
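
Check the logs for document-processing errors and confirm the server has disk headroom:

```bash
docker logs anythingllm --tail 100   # look for collector/processing errors
df -h                                # check free disk space
```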

RAG responses are poor quality / hallucinating

Common causes and fixes:
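
  • Documents uploaded but never embedded — confirm they show as embedded in the workspace, not just uploaded, and re-embed if needed.

  • Too few context snippets retrieved — raise the number of chunks returned per query in the workspace settings.

  • Weak embedder for your domain — switch from the native CPU embedder to a stronger embedding model, then re-embed the workspace.

  • LLM temperature too high for factual Q&A — lower it in the workspace chat settings.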

Ollama connection fails from AnythingLLM
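
The usual culprit is localhost: inside the AnythingLLM container, localhost is the container itself, not the host running Ollama. Point the Ollama base URL at the host's address and make sure Ollama listens on all interfaces:

```bash
# Native Ollama install: bind to all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve

# Sanity check from the AnythingLLM host (should list installed models)
curl http://<server-ip>:11434/api/tags
```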

Out of memory / container crash
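
Check actual usage before resizing; large embedding jobs and big document sets are the usual memory hogs:

```bash
docker stats anythingllm --no-stream   # container CPU/memory usage
free -h                                # host memory headroom
```

If memory is tight, embed documents in smaller batches, move to an external vector DB, or rent an instance with more RAM.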


Further Reading
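
  • AnythingLLM website: https://anythingllm.com

  • Official documentation: https://docs.anythingllm.com

  • Source code: https://github.com/Mintplex-Labs/anything-llm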
