# LlamaIndex

LlamaIndex (formerly GPT Index) is a **data framework for LLM applications** with more than **37,000 GitHub stars**. Where LangChain focuses on chaining LLM calls, LlamaIndex excels at **data ingestion, indexing, and structured querying**, which makes it a strong first choice when your application has to reason over large, heterogeneous document collections.

LlamaIndex offers first-class support for complex data sources (databases, APIs, PDFs, Notion pages, GitHub repos) and sophisticated retrieval strategies. Running it on Clore.ai GPU servers with local LLMs eliminates per-token API costs and keeps your data private.

Key strengths:

* 📊 **Data connectors**: 160+ integrations (PDF, SQL, Notion, Slack, GitHub, etc.)
* 🗂️ **Multiple index types**: vector, tree, list, keyword, knowledge graph
* 🔍 **Advanced retrieval**: sub-question decomposition, recursive retrieval, hybrid search
* 🤖 **Query engines**: SQL, structured, and natural-language queries over any data source
* 🧩 **Multimodal**: images, audio, and video alongside text
* 💾 **Persistence**: built-in support for ChromaDB, Pinecone, Weaviate, etc.
* ⚡ **Async-first**: built for production throughput
* 🔗 **Compatible with LangChain**: use both frameworks together

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

***

## Server requirements

| Parameter        | Minimum                 | Recommended                     |
| ---------------- | ----------------------- | ------------------------------- |
| GPU              | NVIDIA RTX 3080 (10 GB) | NVIDIA RTX 4090 (24 GB)         |
| VRAM             | 8 GB (7B model)         | 24 GB (13B–34B models)          |
| RAM              | 16 GB                   | 32–64 GB                        |
| CPU              | 4 cores                 | 16 cores                        |
| Disk             | 30 GB                   | 100+ GB (local models + data)   |
| Operating system | Ubuntu 20.04+           | Ubuntu 22.04                    |
| CUDA             | 11.8+                   | 12.1+                           |
| Python           | 3.9+                    | 3.11                            |
| Ports            | 22, 8000                | 22, 8000, 11434 (Ollama)        |

{% hint style="info" %}
LlamaIndex is a Python library; GPU resources are consumed by the underlying LLM and the embedding model. For production deployments, combine LlamaIndex with Ollama (for local inference) and ChromaDB (for vector storage), both running on your Clore.ai GPU server.
{% endhint %}
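
As a rough sanity check on the VRAM figures in the table, the memory needed for model weights can be approximated from the parameter count and the quantization width. The sketch below is a rule of thumb, not a measurement: the function name `estimate_vram_gb` and the 20 % `overhead` allowance (for KV cache and runtime buffers) are assumptions made for illustration.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 0.2) -> float:
    """Approximate VRAM in GB: weight bytes plus an assumed fractional overhead."""
    weights_gb = params_billion * bits_per_weight / 8  # 1B parameters at 8 bits is about 1 GB
    return round(weights_gb * (1 + overhead), 1)


for name, size_b in [("7B", 7), ("13B", 13), ("34B", 34)]:
    four_bit = estimate_vram_gb(size_b, 4)
    sixteen_bit = estimate_vram_gb(size_b, 16)
    print(f"{name}: ~{four_bit} GB at 4-bit, ~{sixteen_bit} GB at 16-bit")
```

Under these assumptions a 4-bit 7B model fits within 8 GB, while 16-bit weights for the same model do not. Actual usage depends on context length and the inference runtime.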

***

## Quick deployment on CLORE.AI

### 1. Find a suitable server

Go to the [CLORE.AI Marketplace](https://clore.ai/marketplace) and choose a server based on the size of your LLM:

| Use case              | GPU              | Notes                              |
| --------------------- | ---------------- | ---------------------------------- |
| Development / testing | RTX 3080 (10 GB) | 7B models, small document sets     |
| Production (small)    | RTX 4090 (24 GB) | 13B models, medium-sized datasets  |
| Production (large)    | A100 40G / 80G   | 34B–70B models, large datasets     |
| Enterprise            | H100 (80 GB)     | Maximum throughput                 |

### 2. Configure your deployment

**Docker image (base):**

```
nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
```

**Port mappings:**

```
22    → SSH access
8000  → LlamaIndex API / Gradio UI
11434 → Ollama inference engine
```

**Startup script:**

```bash
#!/bin/bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve &
sleep 5
ollama pull llama3:8b
ollama pull nomic-embed-text

# Install LlamaIndex
pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama
pip install chromadb fastapi uvicorn

python /workspace/app.py
```

### 3. Access the API

```
http://<your-clore-server-ip>:8000
```

***

## Step-by-step setup

### Step 1: SSH into your server

```bash
ssh root@<your-clore-server-ip> -p <ssh-port>
```

### Step 2: Install Ollama

```bash
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve &
sleep 5

# Download models
ollama pull llama3:8b              # LLM for generation
ollama pull nomic-embed-text       # Embedding model

# Verify
ollama list
```
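
The fixed `sleep 5` assumes that Ollama is listening within five seconds, which may not hold on a slow disk or a busy host. A minimal sketch of a more robust alternative is to poll the port until it accepts connections. The helper name `wait_for_port` and its defaults are hypothetical; only the port number 11434 comes from this guide.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # 11434 is the Ollama port used throughout this guide
    ready = wait_for_port("localhost", 11434, timeout=2.0)
    print("Ollama reachable" if ready else "Ollama not reachable yet")
```

A successful TCP connection only shows that the server process is listening; `ollama list` remains the check that the models were actually pulled.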

### Step 3: Set up the Python environment

```bash
mkdir -p /workspace/llamaindex-app
cd /workspace/llamaindex-app

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
```

### Step 4: Install the LlamaIndex packages

```bash
# Core LlamaIndex
pip install llama-index

# LLM integrations
pip install llama-index-llms-ollama
pip install llama-index-llms-openai     # Optional: OpenAI

# Embedding integrations
pip install llama-index-embeddings-ollama
pip install llama-index-embeddings-huggingface

# Vector store integrations
pip install llama-index-vector-stores-chroma

# Data loaders
pip install llama-index-readers-file
pip install llama-index-readers-web

# Optional: additional readers
pip install pypdf docx2txt
```

### Step 5: Configure global settings

```python
# settings.py
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Configure the LLM
Settings.llm = Ollama(
    model="llama3:8b",
    base_url="http://localhost:11434",
    request_timeout=300.0,
    temperature=0.1,
)

# Configure embeddings
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="http://localhost:11434",
)

# Configure chunking
Settings.chunk_size = 1024
Settings.chunk_overlap = 200
```
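
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window chunker. It is an illustration only: LlamaIndex's default splitter is sentence-aware and counts tokens, whereas this sketch splits on whitespace, and the function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


words = "LlamaIndex splits documents into overlapping chunks before embedding them".split()
for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=2):
    print(chunk)
```

A larger overlap preserves more context across chunk boundaries at the cost of more chunks to embed and store.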

### Step 6: Build your first index

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents from a directory
documents = SimpleDirectoryReader("/workspace/data/docs").load_data()
print(f"Loaded {len(documents)} documents")

# Build the vector index (embeds and stores the chunks automatically)
index = VectorStoreIndex.from_documents(documents)

# Persist the index to disk
index.storage_context.persist("/workspace/index_storage")
print("Index built and saved!")
```

### Step 7: Query the index

```python
from llama_index.core import load_index_from_storage, StorageContext

# Load the existing index
storage_context = StorageContext.from_defaults(persist_dir="/workspace/index_storage")
index = load_index_from_storage(storage_context)

# Create a query engine
query_engine = index.as_query_engine(similarity_top_k=5)

# Ask questions
response = query_engine.query("What GPU servers are available on Clore.ai?")
print(f"Answer: {response}")
print(f"\nSources: {len(response.source_nodes)} nodes used")
```
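
Conceptually, `similarity_top_k` controls how many of the most similar chunks are handed to the LLM. The following sketch shows the idea with plain cosine similarity over toy vectors. It is not LlamaIndex's implementation; the helper names and the three-dimensional "embeddings" are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every document against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings" (hypothetical values; real models use hundreds of dimensions)
docs = {
    "gpu_pricing": [0.9, 0.1, 0.0],
    "docker_setup": [0.1, 0.9, 0.1],
    "gpu_specs": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))
```

Raising `k` gives the LLM more context to work with, but also more text to fit into its context window.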

***

## Example applications

### Example 1: Simple document Q\&A

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from pathlib import Path

# Configure LlamaIndex with local Ollama models
Settings.llm = Ollama(model="llama3:8b", base_url="http://localhost:11434")
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="http://localhost:11434"
)

# Create a sample directory for documents (including missing parent directories)
data_dir = Path("/workspace/data")
data_dir.mkdir(parents=True, exist_ok=True)

# Create a sample document
(data_dir / "clore_faq.txt").write_text("""
Clore.ai FAQ

Q: What is Clore.ai?
A: Clore.ai is a decentralized GPU cloud marketplace that connects GPU owners with AI researchers and developers who need compute.

Q: Which GPUs are available?
A: Clore.ai offers GPUs ranging from the NVIDIA GTX 1080 to the latest H100 80GB. Popular options include the RTX 4090, A100 40G/80G, and RTX 3090.

Q: How does pricing work?
A: Prices are set by the GPU providers and vary by GPU model, VRAM, and availability. Typically 30–70% cheaper than AWS/GCP.

Q: What software can I run?
A: Any Docker container. Prebuilt images for PyTorch, TensorFlow, ComfyUI, Stable Diffusion, and more are available.
""")

# Build the index
documents = SimpleDirectoryReader(str(data_dir)).load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)

# Query
query_engine = index.as_query_engine(similarity_top_k=3)

questions = [
    "Which GPUs does Clore.ai offer?",
    "How does Clore.ai pricing compare with AWS?",
    "Can I run custom Docker containers?",
]

for q in questions:
    print(f"\n❓ {q}")
    response = query_engine.query(q)
    print(f"💬 {response}")
```

***

### Example 2: Multi-document RAG with ChromaDB

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext, Settings
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
import chromadb

# Configure the LLM and embeddings
Settings.llm = Ollama(model="llama3:8b", base_url="http://localhost:11434")
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",
    base_url="http://localhost:11434"
)

# Connect to ChromaDB (running on the same Clore.ai server)
chroma_client = chromadb.HttpClient(host="localhost", port=8001)
chroma_collection = chroma_client.get_or_create_collection("llamaindex_docs")

# Create the ChromaDB vector store for LlamaIndex
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load documents from multiple sources
docs_dir = "/workspace/data/docs"
documents = SimpleDirectoryReader(
    docs_dir,
    recursive=True,              # Include subdirectories
    required_exts=[".pdf", ".txt", ".md"],  # Only these formats
    filename_as_id=True          # Use the file name as the document ID
).load_data()

print(f"Loaded {len(documents)} documents from {docs_dir}")

# Build the index (stored in ChromaDB)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    show_progress=True
)
print("Index built and persisted in ChromaDB!")

# Load the existing index (for future sessions)
# index = VectorStoreIndex.from_vector_store(vector_store)

# Advanced query engine with metadata filtering
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

# Query with a metadata filter.
# SimpleDirectoryReader records the MIME type in "file_type", not the file extension.
filtered_engine = index.as_query_engine(
    similarity_top_k=5,
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="file_type", value="application/pdf"),
        ]
    )
)

response = filtered_engine.query("Summarize the key technical concepts in the documents.")
print(f"\nFiltered response: {response}")
```

***

### Example 3: Sub-question decomposition

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(model="llama3:8b", base_url="http://localhost:11434")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text", base_url="http://localhost:11434")

# Build separate indexes for different knowledge domains
def build_index(docs_path, index_name):
    docs = SimpleDirectoryReader(docs_path).load_data()
    index = VectorStoreIndex.from_documents(docs)
    return index

# Separate knowledge bases
pricing_index = build_index("/workspace/data/pricing", "pricing")
technical_index = build_index("/workspace/data/technical", "technical")
faq_index = build_index("/workspace/data/faq", "faq")

# Wrap them as tools
tools = [
    QueryEngineTool(
        query_engine=pricing_index.as_query_engine(),
        metadata=ToolMetadata(
            name="pricing_docs",
            description="Contains pricing information, cost comparisons, and billing details for Clore.ai."
        )
    ),
    QueryEngineTool(
        query_engine=technical_index.as_query_engine(),
        metadata=ToolMetadata(
            name="technical_docs",
            description="Contains technical documentation on GPU specifications, Docker deployment, and APIs."
        )
    ),
    QueryEngineTool(
        query_engine=faq_index.as_query_engine(),
        metadata=ToolMetadata(
            name="faq_docs",
            description="Contains frequently asked questions and their answers."
        )
    ),
]

# The sub-question engine decomposes complex queries
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
    verbose=True
)

# Complex multi-part question
complex_question = """
Compare the cost of running a 7B-parameter LLM for 100 hours on Clore.ai vs. AWS,
and explain the technical setup required for each option.
"""

print(f"Question: {complex_question}")
response = sub_question_engine.query(complex_question)
print(f"\nComprehensive Answer:\n{response}")
```

***

### Example 4: Knowledge graph index

```python
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# A larger model gives better triplet extraction. Llama 3 ships in 8B and 70B sizes;
# fall back to "llama3:8b" if the 70B model does not fit into your VRAM.
Settings.llm = Ollama(model="llama3:70b", base_url="http://localhost:11434")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text", base_url="http://localhost:11434")

# Load documents
documents = SimpleDirectoryReader("/workspace/data/docs").load_data()

# Build the knowledge graph (extracts entities and relationships)
kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=10,   # Extract up to 10 triplets per chunk
    include_embeddings=True,
    show_progress=True
)

# Persist the graph
kg_index.storage_context.persist("/workspace/kg_storage")
print("Knowledge graph built!")
print(f"Graph keywords: {len(kg_index.index_struct.table)}")

# Query the knowledge graph
kg_query_engine = kg_index.as_query_engine(
    include_text=True,            # Include the source text
    retriever_mode="keyword",     # Use keyword-based retrieval
    response_mode="tree_summarize"
)

questions = [
    "What relationships exist between GPU models and use cases?",
    "How are pricing and GPU specifications related?",
    "Which deployment methods connect to which services?",
]

for q in questions:
    print(f"\n🔍 {q}")
    response = kg_query_engine.query(q)
    print(f"📊 {response}")
```

***

### Example 5: SQL query engine over a database

```python
from llama_index.core import SQLDatabase, Settings
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.llms.ollama import Ollama
from sqlalchemy import create_engine, text

Settings.llm = Ollama(model="llama3:8b", base_url="http://localhost:11434")

# Create a sample database with GPU marketplace data
engine = create_engine("sqlite:////workspace/clore_data.db")

# Create and populate the table
with engine.connect() as conn:
    conn.execute(text("""
        CREATE TABLE IF NOT EXISTS gpu_servers (
            id INTEGER PRIMARY KEY,
            gpu_model TEXT,
            vram_gb INTEGER,
            price_per_hour REAL,
            location TEXT,
            available INTEGER
        )
    """))

    conn.execute(text("""
        INSERT OR REPLACE INTO gpu_servers VALUES
        (1, 'RTX 4090', 24, 0.65, 'US-East', 1),
        (2, 'RTX 4090', 24, 0.70, 'EU-West', 1),
        (3, 'A100 80G', 80, 2.50, 'US-West', 1),
        (4, 'H100 80G', 80, 4.20, 'US-East', 0),
        (5, 'RTX 3090', 24, 0.35, 'Asia-Pacific', 1),
        (6, 'RTX 3080', 10, 0.20, 'EU-East', 1),
        (7, 'A100 40G', 40, 1.50, 'US-East', 1)
    """))
    conn.commit()

# Create the LlamaIndex SQL database wrapper
sql_database = SQLDatabase(engine, include_tables=["gpu_servers"])

# Natural-language-to-SQL query engine
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["gpu_servers"],
)

# Query the database in natural language
nl_queries = [
    "Which is the cheapest available GPU server?",
    "Show me all GPU servers with more than 40 GB of VRAM",
    "What is the average price per hour for RTX 4090 servers?",
    "Which locations have available GPU servers?",
    "List all available A100 servers sorted by price",
]

for query in nl_queries:
    print(f"\n💬 Natural language: {query}")
    response = query_engine.query(query)
    print(f"📊 Answer: {response}")
    if hasattr(response, 'metadata') and 'sql_query' in response.metadata:
        print(f"🔧 SQL: {response.metadata['sql_query']}")
```
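
Because an LLM writes the SQL, it is worth checking a few generated answers against hand-written queries. The sketch below rebuilds the sample table in an in-memory SQLite database using only the standard library and computes the expected result for two of the questions. The SQL statements here are my own reference queries, not output from `NLSQLTableQueryEngine`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpu_servers (id INTEGER PRIMARY KEY, gpu_model TEXT, vram_gb INTEGER, "
    "price_per_hour REAL, location TEXT, available INTEGER)"
)
# Copy of the sample rows used in the example
conn.executemany(
    "INSERT INTO gpu_servers VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "RTX 4090", 24, 0.65, "US-East", 1),
        (2, "RTX 4090", 24, 0.70, "EU-West", 1),
        (3, "A100 80G", 80, 2.50, "US-West", 1),
        (4, "H100 80G", 80, 4.20, "US-East", 0),
        (5, "RTX 3090", 24, 0.35, "Asia-Pacific", 1),
        (6, "RTX 3080", 10, 0.20, "EU-East", 1),
        (7, "A100 40G", 40, 1.50, "US-East", 1),
    ],
)

# Reference answer: cheapest server that is currently available
cheapest = conn.execute(
    "SELECT gpu_model, price_per_hour FROM gpu_servers "
    "WHERE available = 1 ORDER BY price_per_hour LIMIT 1"
).fetchone()

# Reference answer: average hourly price of RTX 4090 servers
avg_4090 = conn.execute(
    "SELECT AVG(price_per_hour) FROM gpu_servers WHERE gpu_model = 'RTX 4090'"
).fetchone()[0]

print(f"Cheapest available: {cheapest}")
print(f"Average RTX 4090 price: {avg_4090:.3f}")
```

If the engine's answer disagrees with a reference query, printing `response.metadata['sql_query']` usually shows where the generated SQL went wrong.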

***

## Configuration

### Docker Compose (full LlamaIndex stack)

```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    runtime: nvidia
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    restart: unless-stopped

  chromadb:
    image: chromadb/chroma:latest
    container_name: chromadb
    ports:
      - "8001:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - ANONYMIZED_TELEMETRY=FALSE
    restart: unless-stopped

  llamaindex-api:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: llamaindex-api
    ports:
      - "8000:8000"
    volumes:
      - ./data:/workspace/data
      - ./indices:/workspace/indices
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - CHROMA_HOST=chromadb
      - CHROMA_PORT=8000
      - LLM_MODEL=llama3:8b
      - EMBED_MODEL=nomic-embed-text
    depends_on:
      - ollama
      - chromadb
    restart: unless-stopped

volumes:
  ollama_models:
  chroma_data:
```
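
The `llamaindex-api` service receives its connection details through environment variables. One possible way for the application to read them is sketched below; the `AppConfig` class and `load_config` function are hypothetical, and the fallback values assume the app runs directly on the host, where ChromaDB is published on port 8001.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    ollama_base_url: str
    chroma_host: str
    chroma_port: int
    llm_model: str
    embed_model: str


def load_config(env=os.environ) -> AppConfig:
    """Read the variables set in docker-compose.yml, with host-side fallbacks."""
    return AppConfig(
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        chroma_host=env.get("CHROMA_HOST", "localhost"),
        chroma_port=int(env.get("CHROMA_PORT", "8001")),  # host port from the 8001:8000 mapping
        llm_model=env.get("LLM_MODEL", "llama3:8b"),
        embed_model=env.get("EMBED_MODEL", "nomic-embed-text"),
    )


print(load_config())
```

Inside the Compose network the services reach each other by service name (`ollama`, `chromadb`) and container port, which is why `CHROMA_PORT` is 8000 there rather than 8001.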

### Key configuration variables

| Setting                   | Default        | Description                       |
| ------------------------- | -------------- | --------------------------------- |
| `Settings.llm`            | OpenAI GPT-3.5 | LLM used for generation           |
| `Settings.embed_model`    | OpenAI Ada     | Embedding model                   |
| `Settings.chunk_size`     | 1024           | Text chunk size in tokens         |
| `Settings.chunk_overlap`  | 200            | Overlap between chunks in tokens  |
| `Settings.num_output`     | 256            | Maximum tokens in the LLM answer  |
| `Settings.context_window` | 3900           | LLM context window size in tokens |

***

## Performance tips

### 1. Asynchronous queries for throughput

```python
import asyncio
from llama_index.core import VectorStoreIndex

query_engine = index.as_query_engine(use_async=True)

async def batch_query(questions):
    tasks = [query_engine.aquery(q) for q in questions]
    return await asyncio.gather(*tasks)

questions = ["Q1?", "Q2?", "Q3?", "Q4?", "Q5?"]
answers = asyncio.run(batch_query(questions))
```
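
Firing every query at once can overload a single GPU, so it may help to cap the number of requests in flight. A minimal sketch using `asyncio.Semaphore` follows. To keep it runnable without a model, `fake_query` stands in for `query_engine.aquery`; both it and `bounded_batch` are hypothetical names.

```python
import asyncio


async def fake_query(question: str) -> str:
    """Stand-in for query_engine.aquery so the sketch runs without a model."""
    await asyncio.sleep(0.01)
    return f"answer to {question}"


async def bounded_batch(questions: list[str], query_fn, max_concurrency: int = 2) -> list[str]:
    """Run query_fn over all questions with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(question: str) -> str:
        async with semaphore:
            return await query_fn(question)

    # gather returns results in the same order as the input questions
    return await asyncio.gather(*(run_one(q) for q in questions))


answers = asyncio.run(bounded_batch(["Q1?", "Q2?", "Q3?"], fake_query))
print(answers)
```

In a real application, pass `query_engine.aquery` as `query_fn` and tune `max_concurrency` to what the GPU can serve without queuing.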

### 2. Hybrid search (keyword + semantic)

```python
# Requires: pip install llama-index-retrievers-bm25
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Keyword (BM25) retriever over the nodes in the index's docstore.
# With an external vector store such as ChromaDB the docstore may be empty;
# in that case pass nodes=... explicitly instead.
bm25_retriever = BM25Retriever.from_defaults(docstore=index.docstore, similarity_top_k=5)

# Combine vector and keyword retrieval
retriever = QueryFusionRetriever(
    [
        index.as_retriever(similarity_top_k=5),  # Vector retrieval
        bm25_retriever,                          # Keyword retrieval
    ],
    similarity_top_k=5,
    num_queries=3,  # Generate multiple query variations
    use_async=True,
    verbose=True,
)

query_engine = RetrieverQueryEngine(retriever=retriever)
```
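
Hybrid retrieval needs a way to merge ranked lists whose scores are not directly comparable. One common technique is reciprocal rank fusion (RRF), sketched here in plain Python. This illustrates the idea only and is not necessarily how `QueryFusionRetriever` combines results by default; the document IDs are made up, and `k=60` is a commonly used constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical semantic ranking
keyword_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical keyword ranking
for doc_id, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
    print(f"{doc_id}: {score:.4f}")
```

Documents that appear near the top of both lists end up first, because RRF relies only on rank positions and ignores the raw scores.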

### 3. Re-ranking for quality

```python
from llama_index.core.postprocessor import SentenceTransformerRerank

# Add a re-ranking step after retrieval
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2",
    top_n=3
)

query_engine = index.as_query_engine(
    similarity_top_k=10,  # Retrieve more candidates
    node_postprocessors=[reranker]  # Re-rank and keep the top 3
)
```

### 4. Streaming for responsive UIs

```python
# Stream tokens as they are generated
streaming_engine = index.as_query_engine(streaming=True)
response = streaming_engine.query("Explain how Clore.ai works")

for token in response.response_gen:
    print(token, end="", flush=True)
```

***

## Troubleshooting

### Problem: Embedding model cannot connect to Ollama

```bash
# Test Ollama embeddings directly
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "test text"
}'
```
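
The same check can be scripted from Python with only the standard library, which is convenient inside a container that has no `curl`. The sketch below builds the identical request; the helper names are hypothetical, and `embedding_dimension` assumes the response carries an `embedding` list, so it only works against a running Ollama instance.

```python
import json
import urllib.request


def build_embedding_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build the same POST request as the curl command."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_dimension(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the embedding length (needs a running Ollama)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return len(json.load(response)["embedding"])


request = build_embedding_request("http://localhost:11434", "nomic-embed-text", "test text")
print(request.get_method(), request.full_url)
# On the server, call embedding_dimension(request) to confirm that a vector comes back.
```

A connection error at this point indicates that Ollama is not running or not reachable at the configured `base_url`; an HTTP error usually means the model has not been pulled.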

### Problem: Index creation is slow

```python
# In a separate shell, monitor GPU utilization while embedding:
#   watch -n1 nvidia-smi

# Insert nodes in smaller batches (the default insert_batch_size is 2048)
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    docs,
    show_progress=True,
    insert_batch_size=512,
)
```

### Problem: ModuleNotFoundError for integrations

```bash
# LlamaIndex uses a plugin architecture since v0.10: each integration is a separate package
pip install llama-index-llms-ollama
pip install llama-index-embeddings-ollama
pip install llama-index-vector-stores-chroma

# Check the installed packages
pip list | grep llama
```

### Problem: Context window exceeded

```python
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# Reduce the chunk size
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Or use a model with a larger context
Settings.llm = Ollama(
    model="llama3:8b",
    context_window=8192  # Extend the context window
)
```
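
A quick back-of-the-envelope calculation shows why both remedies help: the retrieved chunks, the prompt, and the generated answer all have to share the context window. The sketch below is simple arithmetic under stated assumptions; the function name is hypothetical, and the 200-token `prompt_overhead` is a guess at the size of the question plus prompt template.

```python
def max_chunks_in_context(context_window: int, num_output: int, prompt_overhead: int, chunk_size: int) -> int:
    """How many full-size chunks fit next to the prompt and the reserved answer tokens."""
    available = context_window - num_output - prompt_overhead
    return max(available // chunk_size, 0)


# prompt_overhead=200 is an assumed allowance for the question and prompt template
print(max_chunks_in_context(4096, 256, 200, 1024))
print(max_chunks_in_context(4096, 256, 200, 512))
print(max_chunks_in_context(8192, 256, 200, 512))
```

If `similarity_top_k` is larger than the number of chunks that fit, either lower it, shrink the chunks, or enlarge the context window.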

### Problem: Queries return irrelevant results

```python
# Increase similarity_top_k
query_engine = index.as_query_engine(similarity_top_k=10)

# Or use a stronger embedding model
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-large-en-v1.5"
)
```

***

## Links

* **GitHub**: <https://github.com/run-llama/llama_index>
* **Official documentation**: <https://docs.llamaindex.ai>
* **PyPI**: <https://pypi.org/project/llama-index>
* **Integrations**: <https://llamahub.ai>
* **Discord**: <https://discord.gg/dGcwcsnxhU>
* **Blog**: <https://www.llamaindex.ai/blog>
* **CLORE.AI Marketplace**: <https://clore.ai/marketplace>

***

## Clore.ai GPU recommendations

| Use case                  | Recommended GPU | Estimated cost on Clore.ai |
| ------------------------- | --------------- | -------------------------- |
| Development/testing       | RTX 3090 (24GB) | \~$0.12/gpu/hr             |
| Production RAG            | RTX 3090 (24GB) | \~$0.12/gpu/hr             |
| High-throughput embedding | RTX 4090 (24GB) | \~$0.70/gpu/hr             |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour: no commitments, full root access.

