# Qdrant

> **High-performance vector database for semantic search and RAG applications, with GPU-accelerated indexing**

Qdrant is an open-source, production-ready vector database written in Rust. It provides fast Approximate Nearest Neighbor (ANN) search over billions of vectors, with advanced filtering, payload indexing, and multi-vector support. It is the backbone of many production RAG (Retrieval-Augmented Generation) pipelines and semantic search applications.

**GitHub:** [qdrant/qdrant](https://github.com/qdrant/qdrant) (22K+ ⭐)

***

## Why Qdrant?

| Feature              | Qdrant     | Pinecone      | Weaviate | Chroma    |
| -------------------- | ---------- | ------------- | -------- | --------- |
| Open source          | ✅          | ❌             | ✅        | ✅         |
| Rust performance     | ✅          | n/a           | ❌ Go     | ❌ Python  |
| Query-time filtering | ✅ Advanced | ✅ Basic       | ✅        | ✅ Basic   |
| Multi-vector         | ✅          | ❌             | ✅        | ❌         |
| Disk-based HNSW      | ✅          | ✅             | ✅        | ❌         |
| Payload indexing     | ✅          | Limited       | ✅        | Limited   |
| gRPC + REST          | ✅ Both     | ✅ REST        | ✅        | REST      |
| Self-hosted          | ✅          | ❌ Cloud only  | ✅        | ✅         |

{% hint style="success" %}
**Qdrant is written in Rust**, which gives it C-like performance with memory safety. Benchmarks show Qdrant to be consistently **1.5–3x faster** than Python-based alternatives such as Chroma under high load.
{% endhint %}

***

## Key Use Cases

* **RAG (Retrieval-Augmented Generation):** find relevant context for LLM prompts
* **Semantic search:** search by meaning, not just by keywords
* **Recommendation systems:** find similar items by embedding similarity
* **Duplicate detection:** identify near-identical content
* **Anomaly detection:** find vectors that lie far from cluster centers
* **Image/audio similarity search:** multimodal retrieval

***

## Prerequisites

* Clore.ai account with a GPU rental
* Basic familiarity with REST APIs or Python
* Your preferred embedding model (OpenAI, SentenceTransformers, etc.)

***

## Step 1: Rent a Server on Clore.ai

Qdrant is primarily CPU- and RAM-bound when serving, but it benefits from a GPU when:

* Embeddings are generated alongside serving (embedding model on the same server)
* Large-scale batch indexing operations are run

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. For **combined embeddings + serving:** RTX 3090/4090 with 32GB+ RAM
3. For **serving only:** CPU-optimized server with fast NVMe storage

{% hint style="info" %}
**Memory planning:**

* Each float32 vector with 1536 dimensions = 6KB
* 1 million vectors = \~6GB RAM
* 10 million vectors = \~60GB RAM
* Enable on-disk storage for very large collections
{% endhint %}

***

## Step 2: Deploy the Qdrant Container

**Docker image:**

```
qdrant/qdrant:latest
```

**Ports:**

```
22
6333
6334
```

* **Port 6333:** REST API (HTTP)
* **Port 6334:** gRPC API (higher performance for bulk operations)

**Environment variables:**

```
QDRANT__SERVICE__HTTP_PORT=6333
QDRANT__SERVICE__GRPC_PORT=6334
QDRANT__LOG_LEVEL=INFO
QDRANT__STORAGE__STORAGE_PATH=/qdrant/storage
```

**Volume/persistent storage:** Mount `/qdrant/storage` for data persistence. Without it, your data is lost when the container restarts.
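
When sizing the server, the figures in the memory planning hint above can be reproduced with simple arithmetic: a float32 component takes 4 bytes, so a vector costs `dim * 4` bytes. The following is a minimal sketch that counts raw vector storage only; it assumes no overhead for the HNSW graph, payloads, or payload indexes, so real usage will be higher. `estimate_vector_memory` and its parameters are hypothetical names used for illustration and are not part of Qdrant.

```python
def estimate_vector_memory(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Raw size in bytes of num_vectors dense vectors with dim components each."""
    return num_vectors * dim * bytes_per_component


# float32 (4 bytes per component) at 1536 dimensions
for count in (1, 1_000_000, 10_000_000):
    size_bytes = estimate_vector_memory(count, 1536)
    print(f"{count:>12,} vectors: {size_bytes / 1e9:.3f} GB")

# INT8 scalar quantization stores 1 byte per component (assumption: raw data only)
int8_bytes = estimate_vector_memory(1_000_000, 1536, bytes_per_component=1)
print(f"   1,000,000 INT8 vectors: {int8_bytes / 1e9:.3f} GB")
```

Passing `bytes_per_component=1` models INT8 scalar quantization, which is where a 4x reduction relative to float32 comes from.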
***

## Step 3: Verify That Qdrant Is Running

```bash
# Connect to the rented server (substitute your server's SSH host and port)
ssh root@<host> -p <port>

# Check that Qdrant is responding
curl http://localhost:6333/
# Expected response:
# {"title":"qdrant - vector search engine","version":"..."}

# Check health status
curl http://localhost:6333/healthz

# Check cluster info
curl http://localhost:6333/cluster
```

***

## Step 4: Install the Python Client

```bash
# Install the Qdrant Python client and embedding tools
pip install qdrant-client sentence-transformers openai numpy

# Verify the connection
python3 << 'EOF'
from qdrant_client import QdrantClient
client = QdrantClient("localhost", port=6333)
print(f"Qdrant connected: {client.get_collections()}")
EOF
```

***

## Step 5: Create a Collection

A collection is a named set of vectors with a fixed dimensionality.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    VectorParams,
    HnswConfigDiff,
    OptimizersConfigDiff,
    ScalarQuantization,
    ScalarQuantizationConfig,
    ScalarType,
)

client = QdrantClient("localhost", port=6333)

# Create a collection for OpenAI text-embedding-3-small (1536 dims)
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,                  # Vector dimension (must match your embedding model)
        distance=Distance.COSINE,   # Options: COSINE, EUCLID, DOT
        on_disk=False,              # Set to True for very large collections
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                       # HNSW graph connectivity (higher = better recall, more RAM)
        ef_construct=100,           # Build-time search depth (higher = better quality, slower indexing)
        full_scan_threshold=10000,  # Below this size (in KB), a full scan is preferred over the index
    ),
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=20000,   # Build the HNSW index once a segment exceeds this size (in KB)
    ),
    # ScalarQuantization is the concrete model; QuantizationConfig is only a type alias
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,   # Compress vectors to INT8 (4x memory reduction)
            quantile=0.99,
            always_ram=True,        # Keep the quantized vectors in RAM
        )
    ),
)

print("Collection created!")
print(client.get_collection("documents"))
```

### Collection for SentenceTransformers (384 dims)

```python
client.create_collection(
    collection_name="embeddings_384",
    vectors_config=VectorParams(
        size=384,  # all-MiniLM-L6-v2 output size
        distance=Distance.COSINE,
    ),
)
```

***

## Step 6: Index Documents

### With OpenAI Embeddings

```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from openai import OpenAI
import uuid

client = QdrantClient("localhost", port=6333)
openai_client = OpenAI(api_key="your-openai-api-key")

def get_embeddings(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Generate embeddings in batches."""
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=batch
        )
        all_embeddings.extend([e.embedding for e in response.data])
    return all_embeddings

# Example documents
documents = [
    {
        "id": str(uuid.uuid4()),
        "text": "Qdrant is a vector database built in Rust for high performance.",
        "source": "documentation",
        "category": "database",
        "year": 2024
    },
    {
        "id": str(uuid.uuid4()),
        "text": "Machine learning models convert text into dense vector representations.",
        "source": "article",
        "category": "ml",
        "year": 2023
    },
    # Add more documents...
]

# Generate embeddings
texts = [doc["text"] for doc in documents]
embeddings = get_embeddings(texts)

# Upsert into Qdrant
points = [
    PointStruct(
        id=doc["id"],  # Reuse the ID assigned to each document above
        vector=embedding,
        payload={
            "text": doc["text"],
            "source": doc["source"],
            "category": doc["category"],
            "year": doc["year"]
        }
    )
    for doc, embedding in zip(documents, embeddings)
]

client.upsert(
    collection_name="documents",
    points=points,
    wait=True  # Block until the write has been applied
)

print(f"Indexed {len(points)} documents!")
```

### With SentenceTransformers (local, GPU-accelerated)

```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
import uuid

# Load the embedding model onto the GPU
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
client = QdrantClient("localhost", port=6333)

documents = [
    {"text": "How do I set up Qdrant on a GPU server?", "tag": "setup"},
    {"text": "Vector databases store high-dimensional embeddings for similarity search.", "tag": "concept"},
    {"text": "The HNSW algorithm provides approximate nearest neighbor search.", "tag": "algorithm"},
    # ... more documents
]

# GPU-accelerated batch encoding
texts = [doc["text"] for doc in documents]
embeddings = model.encode(
    texts,
    batch_size=256,             # Large batch size for GPU efficiency
    show_progress_bar=True,
    normalize_embeddings=True   # Normalize for cosine similarity
)

# Index into Qdrant
points = [
    PointStruct(
        id=str(uuid.uuid4()),
        vector=embedding.tolist(),
        payload=doc
    )
    for doc, embedding in zip(documents, embeddings)
]

# Batch upsert (more efficient)
BATCH_SIZE = 1000
for i in range(0, len(points), BATCH_SIZE):
    batch = points[i:i + BATCH_SIZE]
    client.upsert(collection_name="embeddings_384", points=batch)
    print(f"Indexed {min(i + BATCH_SIZE, len(points))}/{len(points)}")
```

***

## Step 7: Search and Query

### Basic Semantic Search

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

client = QdrantClient("localhost", port=6333)
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def search(query: str, limit: int = 5, collection: str = "embeddings_384"):
    # Generate the query embedding
    query_vector = model.encode(query, normalize_embeddings=True).tolist()

    # Search (note: newer qdrant-client releases deprecate search() in favor of query_points())
    results = client.search(
        collection_name=collection,
        query_vector=query_vector,
        limit=limit,
        with_payload=True,
        with_vectors=False  # Do not return vectors (saves bandwidth)
    )
    return results

# Test search
results = search("vector database performance")
for r in results:
    print(f"Score: {r.score:.4f} | {r.payload['text'][:100]}")
```

### Filtered Search (Metadata + Vector)

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# "documents" stores 1536-dim OpenAI embeddings, so the query must be embedded
# with the same model (get_embeddings is the helper defined in Step 6)
query_vector = get_embeddings(["vector database performance"])[0]

# Search with metadata filters
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="database")
            ),
            FieldCondition(
                key="year",
                range=Range(gte=2023)  # year >= 2023
            )
        ]
    ),
    limit=10,
    with_payload=True
)
```

### Batch/Multi-Query Search

```python
from qdrant_client.models import SearchRequest

queries = [
    "how to install vector database",
    "machine learning inference optimization",
    "RAG pipeline architecture"
]

query_vectors = model.encode(queries, normalize_embeddings=True)

# Batch search (one API call for all queries)
results = client.search_batch(
    collection_name="embeddings_384",
    requests=[
        SearchRequest(
            vector=vec.tolist(),
            limit=5,
            with_payload=True
        )
        for vec in query_vectors
    ]
)

for query, res in zip(queries, results):
    print(f"\nQuery: {query}")
    for r in res:
        print(f"  {r.score:.3f}: {r.payload['text'][:80]}")
```

***

## Step 8: Build a RAG Pipeline

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# Initialize clients
qdrant = QdrantClient("localhost", port=6333)
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
llm = OpenAI(api_key="your-openai-key")

def rag_query(question: str, n_context: int = 5) -> str:
    # Step 1: Embed the question
    query_vector = embedder.encode(question, normalize_embeddings=True).tolist()

    # Step 2: Retrieve relevant context from Qdrant.
    # The collection must have been indexed with the same embedding model
    # (all-MiniLM-L6-v2, 384 dims), hence "embeddings_384".
    search_results = qdrant.search(
        collection_name="embeddings_384",
        query_vector=query_vector,
        limit=n_context,
        with_payload=True
    )

    # Step 3: Build the context string
    context = "\n\n".join([
        f"[Source: {r.payload.get('source', 'unknown')}]\n{r.payload['text']}"
        for r in search_results
        if r.score > 0.5  # Drop low-confidence results
    ])

    # Step 4: Generate the answer with the LLM
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer questions based on the provided context. Be concise and accurate."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ],
        temperature=0.1
    )
    return response.choices[0].message.content

# Test the RAG pipeline
answer = rag_query("What is Qdrant and how does it work?")
print(answer)
```

***

## Step 9: Monitor and Manage Collections

```python
from qdrant_client.models import (
    Filter, FieldCondition, MatchValue, OptimizersConfigDiff
)

# Collection statistics
info = client.get_collection("documents")
print(f"Points: {info.points_count:,}")
print(f"Indexed vectors: {info.indexed_vectors_count:,}")
print(f"Segments: {info.segments_count}")
print(f"Status: {info.status}")

# List all collections
collections = client.get_collections()
for c in collections.collections:
    print(f"  - {c.name}")

# Delete points by filter
client.delete(
    collection_name="documents",
    points_selector=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="old_source"))]
    )
)

# Tune the optimizer: segments larger than indexing_threshold (in KB) get an HNSW index.
# Lower values index smaller segments sooner; 0 disables indexing (useful during bulk
# uploads), so set it back to a positive value such as the default afterwards.
client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(indexing_threshold=20000)
)
```

***

## Troubleshooting

### Connection Refused

```bash
# Check whether Qdrant is running
docker ps | grep qdrant
# Or check the process
ps aux | grep qdrant

# Check whether the ports are open
curl http://localhost:6333/
netstat -tlnp | grep 6333
```

### Slow Search Performance

```python
from qdrant_client.models import SearchParams

# The search-time ef is set per query via SearchParams(hnsw_ef=...), not on the collection.
# Higher values improve recall; lower values search faster.
# (query_vector is the query embedding, as in Step 7)
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    search_params=SearchParams(hnsw_ef=128),
    limit=10
)

# Use INT8 quantization to fit more vectors in RAM
```

### High Memory Usage

```python
# Enable on-disk storage for large collections
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
        on_disk=True  # Store vectors on disk instead of in RAM
    )
)
```

***

## REST API Quick Reference

```bash
# List collections
curl http://localhost:6333/collections

# Create a collection
curl -X PUT http://localhost:6333/collections/my_collection \
  -H "Content-Type: application/json" \
  -d '{"vectors": {"size": 384, "distance": "Cosine"}}'

# Count points (the count endpoint expects a POST request)
curl -X POST http://localhost:6333/collections/my_collection/points/count \
  -H "Content-Type: application/json" \
  -d '{"exact": true}'

# Search
curl -X POST http://localhost:6333/collections/my_collection/points/search \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.1, 0.2, ...],
    "limit": 5,
    "with_payload": true
  }'

# Delete a collection
curl -X DELETE http://localhost:6333/collections/my_collection
```

***

## Cost Estimate on Clore.ai

| Setup         | Server             | Monthly cost | Capacity       |
| ------------- | ------------------ | ------------ | -------------- |
| Small RAG     | RTX 3090, 32GB RAM | \~$60–80     | \~5M vectors   |
| Medium search | RTX 4090, 64GB RAM | \~$120–150   | \~15M vectors  |
| Large scale   | A100, 128GB RAM    | \~$250–350   | \~30M vectors  |

***

## Additional Resources

* [Qdrant documentation](https://qdrant.tech/documentation/)
* [Qdrant GitHub](https://github.com/qdrant/qdrant)
* [Qdrant Python client](https://github.com/qdrant/qdrant-client)
* [Qdrant examples](https://github.com/qdrant/examples)
* [Vector database benchmarks](https://qdrant.tech/benchmarks/)
* [Sentence Transformers](https://www.sbert.net/)

***

*Qdrant on Clore.ai gives you a self-hosted, high-performance vector database without the per-query costs of Pinecone or Weaviate Cloud.
Perfect for RAG pipelines that process millions of documents.*

***

## Clore.ai GPU Recommendations

| Use case                       | Recommended GPU | Estimated cost on Clore.ai |
| ------------------------------ | --------------- | -------------------------- |
| Development/testing            | RTX 3090 (24GB) | \~$0.12/gpu/hr             |
| Production-grade vector search | RTX 3090 (24GB) | \~$0.12/gpu/hr             |
| High-throughput embedding      | RTX 4090 (24GB) | \~$0.70/gpu/hr             |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse the available GPUs and rent by the hour: no commitments, full root access.