LlamaIndex

Build LlamaIndex data-to-LLM pipelines and RAG applications on Clore.ai GPUs

LlamaIndex (formerly GPT Index) is a data framework for LLM applications with over 37,000 GitHub stars. While LangChain focuses on chaining LLM calls, LlamaIndex excels at data ingestion, indexing, and structured querying — making it the go-to choice when your application needs to reason over large, heterogeneous document collections.

LlamaIndex provides first-class support for complex data structures (databases, APIs, PDFs, Notion pages, GitHub repos) and sophisticated retrieval strategies. Running it on Clore.ai GPU servers with local LLMs eliminates API costs and keeps your data private.

Key strengths:

  • 📊 Data connectors — 160+ integrations (PDF, SQL, Notion, Slack, GitHub, etc.)

  • 🗂️ Multiple index types — vector, tree, list, keyword, knowledge graph

  • 🔍 Advanced retrieval — sub-question decomposition, recursive retrieval, hybrid search

  • 🤖 Query engines — SQL, structured, and natural language over any data source

  • 🧩 Multi-modal — images, audio, and video alongside text

  • 💾 Persistence — built-in support for ChromaDB, Pinecone, Weaviate, etc.

  • ⚡ Async-first — built for production throughput

  • 🔗 LangChain compatible — use both frameworks together


Server Requirements

| Parameter | Minimum | Recommended |
| --- | --- | --- |
| GPU | NVIDIA RTX 3080 (10 GB) | NVIDIA RTX 4090 (24 GB) |
| VRAM | 8 GB (7B model) | 24 GB (13B–34B models) |
| RAM | 16 GB | 32–64 GB |
| CPU | 4 cores | 16 cores |
| Disk | 30 GB | 100+ GB (local models + data) |
| OS | Ubuntu 20.04+ | Ubuntu 22.04 |
| CUDA | 11.8+ | 12.1+ |
| Python | 3.9+ | 3.11 |
| Ports | 22, 8000 | 22, 8000, 11434 (Ollama) |


LlamaIndex is a Python library — GPU resources are consumed by the underlying LLM and embedding model. For production deployments, pair LlamaIndex with Ollama (for local inference) and ChromaDB (for vector storage), both running on your Clore.ai GPU server.


Quick Deploy on CLORE.AI

1. Find a suitable server

Go to the CLORE.AI Marketplace and choose a server based on your LLM size:

| Use Case | GPU | Notes |
| --- | --- | --- |
| Development / Testing | RTX 3080 (10 GB) | 7B models, small document sets |
| Production (small) | RTX 4090 (24 GB) | 13B models, medium datasets |
| Production (large) | A100 40 GB / 80 GB | 34B–70B models, large datasets |
| Enterprise | H100 (80 GB) | Maximum throughput |

2. Configure your deployment

Docker Image (base):
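A reasonable choice is a plain CUDA runtime image (an assumption — any CUDA-capable Ubuntu image works, since LlamaIndex itself has no image requirement):

```
nvidia/cuda:12.1.1-runtime-ubuntu22.04
```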

Port Mappings:
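A typical mapping for this stack (port 8000 is only needed if you expose your own API on top of LlamaIndex):

```
22    → SSH
8000  → application API (optional)
11434 → Ollama API
```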

Startup Script:
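A first-boot sketch that installs Ollama, pulls a chat and an embedding model, and installs LlamaIndex (model names are examples — substitute whatever you plan to run):

```shell
#!/bin/bash
# Install Ollama and start it in the background
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
sleep 5

# Pull a chat model and an embedding model (example names)
ollama pull llama3
ollama pull nomic-embed-text

# Install LlamaIndex plus the Ollama integrations
pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama
```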

3. Access the API
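Once the server is up, you can confirm Ollama is reachable from your local machine (`<server-ip>` and the external port come from your Clore.ai dashboard):

```shell
# Lists the models Ollama has pulled — a quick health check
curl http://<server-ip>:11434/api/tags
```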


Step-by-Step Setup

Step 1: SSH into your server
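Use the SSH command shown on your Clore.ai dashboard; host and port below are placeholders:

```shell
ssh -p 22 root@<server-ip>
```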

Step 2: Install Ollama
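Ollama ships an official install script; the model names below are examples — any chat model and any embedding model Ollama supports will do:

```shell
# Install and verify
curl -fsSL https://ollama.com/install.sh | sh
ollama --version

# Pull a chat model and an embedding model
ollama pull llama3
ollama pull nomic-embed-text
```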

Step 3: Set up Python environment
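A standard virtual environment keeps LlamaIndex's dependencies isolated from the system Python:

```shell
sudo apt-get update && sudo apt-get install -y python3-venv python3-pip
python3 -m venv ~/llamaindex-env
source ~/llamaindex-env/bin/activate
```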

Step 4: Install LlamaIndex packages
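Since v0.10, LlamaIndex splits integrations into separate pip packages, so the Ollama and Chroma connectors must be installed explicitly:

```shell
pip install llama-index \
    llama-index-llms-ollama \
    llama-index-embeddings-ollama \
    llama-index-vector-stores-chroma chromadb
```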

Step 5: Configure global settings
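The `Settings` singleton sets the default LLM and embedding model for everything that follows. A minimal sketch, assuming Ollama runs locally with the models pulled in Step 2:

```python
from llama_index.core import Settings
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Route generation and embeddings to the local Ollama server
Settings.llm = Ollama(
    model="llama3",
    base_url="http://localhost:11434",
    request_timeout=120.0,  # local 7B–13B models can be slow on first load
)
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="http://localhost:11434",
)
Settings.chunk_size = 1024
Settings.chunk_overlap = 200
```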

Step 6: Build your first index
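With settings configured, building an index is two calls — load, then embed. `./data` is a placeholder for any directory of PDFs, text files, etc.:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Read every supported file in ./data and embed it into a vector index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)

# Persist to disk so the index survives restarts
index.storage_context.persist(persist_dir="./storage")
```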

Step 7: Query the index
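Reload the persisted index and ask it a question; `source_nodes` shows which chunks grounded the answer:

```python
from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What are the key points in these documents?")
print(response)
print(response.source_nodes)  # inspect the retrieved chunks
```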


Usage Examples

Example 1: Basic Document Q&A
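A minimal end-to-end script (paths and model names are assumptions — adjust to your setup):

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Local models served by Ollama
Settings.llm = Ollama(model="llama3", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Load, index, and query a directory of documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

response = index.as_query_engine().query("What is this document about?")
print(response)
```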


Example 2: Multi-Document RAG with ChromaDB
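The same pipeline, but with embeddings persisted in a local ChromaDB collection instead of in-memory storage (collection name and path are placeholders):

```python
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persistent Chroma store on local disk
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("docs")

vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

print(index.as_query_engine().query("Summarize the collection."))
```

Because the embeddings live in Chroma, later runs can reattach with `VectorStoreIndex.from_vector_store(vector_store)` instead of re-indexing.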


Example 3: Sub-Question Decomposition
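Sub-question decomposition splits a compound question into per-source sub-questions, queries each index, and synthesizes one answer. A sketch assuming `index_2023` and `index_2024` are indexes built earlier:

```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# One tool per data source; descriptions guide the decomposition
tools = [
    QueryEngineTool(
        query_engine=index_2023.as_query_engine(),
        metadata=ToolMetadata(name="reports_2023", description="2023 annual reports"),
    ),
    QueryEngineTool(
        query_engine=index_2024.as_query_engine(),
        metadata=ToolMetadata(name="reports_2024", description="2024 annual reports"),
    ),
]

engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
response = engine.query("How did revenue change between 2023 and 2024?")
print(response)
```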


Example 4: Knowledge Graph Index
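A knowledge graph index has the LLM extract (subject, predicate, object) triplets from each chunk, which helps with relationship-style questions. A sketch using the classic `KnowledgeGraphIndex` (newer LlamaIndex releases also offer `PropertyGraphIndex` as the successor):

```python
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader, StorageContext
from llama_index.core.graph_stores import SimpleGraphStore

documents = SimpleDirectoryReader("./data").load_data()
storage_context = StorageContext.from_defaults(graph_store=SimpleGraphStore())

# Triplet extraction is LLM-driven, so building this index is slower
# than a vector index over the same documents
index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=5,
)

response = index.as_query_engine(include_text=True).query(
    "How are the main entities related?"
)
print(response)
```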


Example 5: SQL Query Engine over Database
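`NLSQLTableQueryEngine` translates a natural-language question into SQL, runs it, and phrases the result. Any SQLAlchemy URL works; the SQLite file and table name below are placeholders:

```python
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

engine = create_engine("sqlite:///sales.db")
sql_database = SQLDatabase(engine, include_tables=["orders"])

query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["orders"])
response = query_engine.query("What was the total order value last month?")

print(response)                         # natural-language answer
print(response.metadata["sql_query"])   # the SQL the LLM generated
```

Restricting `include_tables` keeps the schema prompt small and stops the LLM from querying tables it should not touch.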


Configuration

Docker Compose (Full LlamaIndex Stack)
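A sketch of a two-service stack — Ollama with GPU access plus ChromaDB — with your LlamaIndex application running on the host or as a third service (image tags and volume names are assumptions):

```yaml
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8001:8000"
    volumes:
      - chroma_data:/chroma/chroma
volumes:
  ollama_data:
  chroma_data:
```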

Key Configuration Variables

| Setting | Default | Description |
| --- | --- | --- |
| `Settings.llm` | OpenAI GPT-3.5 | LLM for generation |
| `Settings.embed_model` | OpenAI Ada | Embedding model |
| `Settings.chunk_size` | 1024 | Text chunk size in tokens |
| `Settings.chunk_overlap` | 200 | Overlap between chunks (tokens) |
| `Settings.num_output` | 256 | Max tokens in LLM response |
| `Settings.context_window` | 4096 | LLM context window size |


Performance Tips

1. Async Queries for Throughput
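Every query engine exposes `aquery`, so independent questions can run concurrently instead of serially. A sketch assuming `query_engine` was built as in the earlier examples:

```python
import asyncio

async def answer_all(query_engine, questions):
    # Fire all queries at once; total latency ≈ the slowest single query
    responses = await asyncio.gather(*(query_engine.aquery(q) for q in questions))
    for question, response in zip(questions, responses):
        print(question, "->", response)

# asyncio.run(answer_all(query_engine, ["q1", "q2", "q3"]))
```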

2. Hybrid Search (Keyword + Semantic)
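One way to combine keyword and semantic retrieval is to fuse a BM25 retriever with the vector retriever (requires the separate `llama-index-retrievers-bm25` package); `index` is assumed to be an existing `VectorStoreIndex`:

```python
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

bm25 = BM25Retriever.from_defaults(docstore=index.docstore, similarity_top_k=5)
vector = index.as_retriever(similarity_top_k=5)

# Reciprocal-rank fusion merges the two ranked lists
retriever = QueryFusionRetriever(
    [vector, bm25],
    similarity_top_k=5,
    mode="reciprocal_rerank",
)
nodes = retriever.retrieve("your question here")
```

BM25 catches exact terms (IDs, error codes, names) that embeddings often blur, while the vector side handles paraphrases.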

3. Re-Ranking for Quality
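Retrieve broadly, then let a cross-encoder re-ranker keep only the best chunks. The model name below is a common choice, not the only option (requires the `sentence-transformers` package):

```python
from llama_index.core.postprocessor import SentenceTransformerRerank

rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    top_n=3,  # keep the 3 best chunks after re-scoring
)

# Cast a wide net (top 10), then re-rank down to 3 before generation
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[rerank],
)
```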

4. Streaming for Responsive UIs
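Streaming returns tokens as they are generated, so the UI shows output immediately instead of after the full response:

```python
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("Summarize the document.")

# Print tokens as they arrive; iterate response_gen for custom UIs
streaming_response.print_response_stream()
```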


Troubleshooting

Issue: Embedding model not connecting to Ollama
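First check that Ollama is actually listening and that the embedding model has been pulled; also make sure the `base_url` passed to `OllamaEmbedding` matches the port Ollama is mapped to:

```shell
# Should return a JSON list of pulled models
curl http://localhost:11434/api/tags

# Pull the embedding model if it is missing (example name)
ollama pull nomic-embed-text
```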

Issue: Index building is slow
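Indexing time is usually dominated by embedding calls. Raising the embedding batch size cuts round-trips to the embedding server (the model name is an example; the stock default batch size is 10):

```python
from llama_index.core import Settings
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    embed_batch_size=32,  # batch more texts per request
)
```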

Issue: ModuleNotFoundError for integrations
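Since v0.10, each integration is its own pip package, so an import like `llama_index.llms.ollama` fails until the matching package is installed:

```shell
pip install llama-index-llms-ollama
pip install llama-index-embeddings-ollama
pip install llama-index-vector-stores-chroma
```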

Issue: Context window exceeded
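Tell LlamaIndex the model's true context size, leave headroom for the answer, and use a compact response mode so retrieved chunks are packed into as few LLM calls as fit:

```python
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# Match these to the model you actually run (llama3 / 8192 are examples)
Settings.llm = Ollama(model="llama3", context_window=8192)
Settings.context_window = 8192
Settings.num_output = 256

query_engine = index.as_query_engine(
    response_mode="compact",  # pack chunks instead of one call per chunk
    similarity_top_k=3,
)
```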

Issue: Queries return irrelevant results
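Two common causes are chunks that are too large (diluting the embedding) and a mismatch between the embedding model used at build time and at query time. One sketch of a fix — smaller chunks with a rebuilt index, retrieving more candidates:

```python
from llama_index.core import Settings

# Smaller chunks embed more precisely; re-index after changing these
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Retrieve more candidates so the answer is less sensitive to one bad match
query_engine = index.as_query_engine(similarity_top_k=5)
```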



Clore.ai GPU Recommendations

| Use Case | Recommended GPU | Est. Cost on Clore.ai |
| --- | --- | --- |
| Development / Testing | RTX 3090 (24 GB) | ~$0.12/gpu/hr |
| Production RAG | RTX 3090 (24 GB) | ~$0.12/gpu/hr |
| High-throughput Embedding | RTX 4090 (24 GB) | ~$0.70/gpu/hr |

💡 All examples in this guide can be deployed on Clore.ai GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
