# Qdrant

> **High-performance vector database for semantic search and RAG applications — GPU-accelerated indexing**

Qdrant is an open-source, production-ready vector database written in Rust. It delivers fast approximate nearest neighbor (ANN) search across billions of vectors with advanced filtering, payload indexing, and multi-vector support. It's the backbone of many production RAG (Retrieval-Augmented Generation) pipelines and semantic search applications.

**GitHub:** [qdrant/qdrant](https://github.com/qdrant/qdrant) — 22K+ ⭐

***

## Why Qdrant?

| Feature                 | Qdrant     | Pinecone     | Weaviate | Chroma   |
| ----------------------- | ---------- | ------------ | -------- | -------- |
| Open source             | ✅          | ❌            | ✅        | ✅        |
| Rust performance        | ✅          | —            | ❌ Go     | ❌ Python |
| Filtering at query time | ✅ Advanced | ✅ Basic      | ✅        | ✅ Basic  |
| Multi-vector            | ✅          | ❌            | ✅        | ❌        |
| Disk-based HNSW         | ✅          | ✅            | ✅        | ❌        |
| Payload indexing        | ✅          | Limited      | ✅        | Limited  |
| gRPC + REST             | ✅ Both     | ✅ REST       | ✅        | REST     |
| Self-hosted             | ✅          | ❌ Cloud only | ✅        | ✅        |

{% hint style="success" %}
**Qdrant is written in Rust** — delivering C-level performance with memory safety. Benchmark tests show Qdrant is consistently **1.5–3x faster** than Python-based alternatives like Chroma for high-load scenarios.
{% endhint %}

***

## Key Use Cases

* **RAG (Retrieval-Augmented Generation)** — find relevant context for LLM prompts
* **Semantic search** — search by meaning, not just keywords
* **Recommendation systems** — find similar items by embedding similarity
* **Duplicate detection** — identify near-duplicate content
* **Anomaly detection** — find vectors far from cluster centers
* **Image/audio similarity search** — multimodal retrieval

***

## Prerequisites

* Clore.ai account with GPU rental
* Basic familiarity with REST APIs or Python
* Your embedding model of choice (OpenAI, SentenceTransformers, etc.)

***

## Step 1 — Rent a Server on Clore.ai

Qdrant is primarily CPU/RAM-bound for serving, but benefits from GPU when:

* Generating embeddings alongside serving (embedding model on same server)
* Large-scale batch indexing operations

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. For **embeddings + serving combo:** RTX 3090/4090 with 32GB+ RAM
3. For **serving only:** CPU-optimized server with fast NVMe storage

{% hint style="info" %}
**Memory Planning:**

* Each float32 vector with 1536 dimensions = 6KB
* 1 million vectors = \~6GB RAM
* 10 million vectors = \~60GB RAM
* Enable on-disk storage for very large collections
{% endhint %}
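
The arithmetic behind those numbers is simple: bytes ≈ vectors × dimensions × 4 (float32). A quick sanity-check sketch (counts and dimensions are illustrative):

```python
# Back-of-the-envelope RAM estimate for raw float32 vectors.
# Excludes HNSW graph overhead, payloads, and segment metadata.
def raw_vector_ram_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    return num_vectors * dims * bytes_per_dim / 1024**3

print(f"{raw_vector_ram_gb(1_000_000, 1536):.1f} GB")   # ~5.7 GB
print(f"{raw_vector_ram_gb(10_000_000, 1536):.1f} GB")  # ~57 GB
print(f"{raw_vector_ram_gb(10_000_000, 384):.1f} GB")   # ~14.3 GB (384-dim models)
```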

***

## Step 2 — Deploy Qdrant Container

**Docker Image:**

```
qdrant/qdrant:latest
```

**Ports:**

```
22
6333
6334
```

* **Port 6333:** REST API (HTTP)
* **Port 6334:** gRPC API (higher performance for bulk operations)
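
With the Python client you don't have to choose between the two ports up front: `prefer_grpc=True` routes data-heavy calls over gRPC automatically. A minimal sketch (`<server-ip>` is a placeholder):

```python
from qdrant_client import QdrantClient

# Uses gRPC (6334) for data-heavy operations like upsert/search,
# falling back to REST (6333) where needed
client = QdrantClient(
    host="<server-ip>",
    port=6333,
    grpc_port=6334,
    prefer_grpc=True
)
```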

**Environment Variables:**

```
QDRANT__SERVICE__HTTP_PORT=6333
QDRANT__SERVICE__GRPC_PORT=6334
QDRANT__LOG_LEVEL=INFO
QDRANT__STORAGE__STORAGE_PATH=/qdrant/storage
```

**Volume/Persistent Storage:** Mount `/qdrant/storage` for data persistence. Without this, data is lost on container restart.

***

## Step 3 — Verify Qdrant is Running

```bash
ssh root@<server-ip> -p <ssh-port>

# Check Qdrant is running
curl http://localhost:6333/

# Expected response:
# {"title":"qdrant - vector search engine","version":"..."}

# Check health
curl http://localhost:6333/healthz

# Check cluster info
curl http://localhost:6333/cluster
```

***

## Step 4 — Install Python Client

```bash
# Install Qdrant Python client and embedding tools
pip install qdrant-client sentence-transformers openai numpy

# Verify connection
python3 << 'EOF'
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)
print(f"Qdrant connected: {client.get_collections()}")
EOF
```
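
On a public Clore.ai server you'll usually connect from your own machine rather than over SSH. A sketch of a remote connection; the API key is optional and assumes you set `QDRANT__SERVICE__API_KEY` on the container (recommended for anything exposed to the internet):

```python
from qdrant_client import QdrantClient

# Remote connection from your workstation to the rented server
client = QdrantClient(
    url="http://<server-ip>:6333",
    api_key="<your-api-key>"  # omit if you did not configure QDRANT__SERVICE__API_KEY
)
print(client.get_collections())
```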

***

## Step 5 — Create a Collection

A collection is a named group of vectors with a fixed dimensionality.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    VectorParams,
    HnswConfigDiff,
    OptimizersConfigDiff,
    ScalarQuantization,
    ScalarQuantizationConfig,
    ScalarType
)

client = QdrantClient("localhost", port=6333)

# Create collection for OpenAI text-embedding-3-small (1536 dims)
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,           # Vector dimension (match your embedding model)
        distance=Distance.COSINE,  # Options: COSINE, EUCLID, DOT
        on_disk=False        # Set True for very large collections
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                # HNSW graph connectivity (higher = better recall, more RAM)
        ef_construct=100,    # Build-time search depth (higher = better quality, slower indexing)
        full_scan_threshold=10000  # Segments smaller than this (in KB) use exact full-scan
    ),
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=20000  # Build the HNSW index once unindexed data exceeds this size (KB)
    ),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,  # Compress vectors to INT8 (4x memory reduction)
            quantile=0.99,
            always_ram=True        # Keep quantized index in RAM
        )
    )
)

print("Collection created!")
print(client.get_collection("documents"))
```

### Collection for SentenceTransformers (384 dims)

```python
client.create_collection(
    collection_name="embeddings_384",
    vectors_config=VectorParams(
        size=384,              # all-MiniLM-L6-v2 output size
        distance=Distance.COSINE
    )
)
```
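
Qdrant also supports multiple named vectors per point (the multi-vector feature from the comparison table). A sketch of a text + image collection; the names and dimensions are illustrative:

```python
# One collection, two vector spaces per point (e.g. text + image embeddings)
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=384, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE)  # e.g. CLIP ViT-B/32
    }
)
```

Searches then target one space by name, e.g. `query_vector=("text", vec)`.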

***

## Step 6 — Index Documents

### With OpenAI Embeddings

```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from openai import OpenAI
import uuid

client = QdrantClient("localhost", port=6333)
openai_client = OpenAI(api_key="your-openai-api-key")

def get_embeddings(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Generate embeddings in batches."""
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=batch
        )
        all_embeddings.extend([e.embedding for e in response.data])
    return all_embeddings

# Sample documents
documents = [
    {
        "id": str(uuid.uuid4()),
        "text": "Qdrant is a vector database built in Rust for high performance.",
        "source": "documentation",
        "category": "database",
        "year": 2024
    },
    {
        "id": str(uuid.uuid4()),
        "text": "Machine learning models convert text to dense vector representations.",
        "source": "article",
        "category": "ml",
        "year": 2023
    },
    # Add more documents...
]

# Generate embeddings
texts = [doc["text"] for doc in documents]
embeddings = get_embeddings(texts)

# Upsert into Qdrant
points = [
    PointStruct(
        id=doc["id"],  # reuse the UUIDs generated above
        vector=embedding,
        payload={
            "text": doc["text"],
            "source": doc["source"],
            "category": doc["category"],
            "year": doc["year"]
        }
    )
    for doc, embedding in zip(documents, embeddings)
]

client.upsert(
    collection_name="documents",
    points=points,
    wait=True  # Wait for indexing to complete
)

print(f"Indexed {len(points)} documents!")
```
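
As a quick sanity check that the upsert landed, you can fetch points back by ID:

```python
# Retrieve a point by ID to verify its payload round-tripped
stored = client.retrieve(
    collection_name="documents",
    ids=[points[0].id],
    with_payload=True,
    with_vectors=False
)
print(stored[0].payload["text"])
```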

### With SentenceTransformers (Local, GPU-accelerated)

```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
import uuid

# Load embedding model on GPU
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

client = QdrantClient("localhost", port=6333)

documents = [
    {"text": "How do I set up Qdrant on a GPU server?", "tag": "setup"},
    {"text": "Vector databases store high-dimensional embeddings for similarity search.", "tag": "concept"},
    {"text": "HNSW algorithm provides approximate nearest neighbor search.", "tag": "algorithm"},
    # ... more documents
]

# GPU-accelerated batch encoding
texts = [doc["text"] for doc in documents]
embeddings = model.encode(
    texts,
    batch_size=256,       # Large batch size for GPU efficiency
    show_progress_bar=True,
    normalize_embeddings=True  # Normalize for cosine similarity
)

# Index in Qdrant
points = [
    PointStruct(
        id=str(uuid.uuid4()),
        vector=embedding.tolist(),
        payload=doc
    )
    for doc, embedding in zip(documents, embeddings)
]

# Batch upsert (more efficient)
BATCH_SIZE = 1000
for i in range(0, len(points), BATCH_SIZE):
    batch = points[i:i + BATCH_SIZE]
    client.upsert(collection_name="embeddings_384", points=batch)
    print(f"Indexed {min(i + BATCH_SIZE, len(points))}/{len(points)}")
```

***

## Step 7 — Search and Query

### Basic Semantic Search

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

client = QdrantClient("localhost", port=6333)
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def search(query: str, limit: int = 5, collection: str = "embeddings_384"):
    # Generate query embedding
    query_vector = model.encode(query, normalize_embeddings=True).tolist()
    
    # Search
    results = client.search(
        collection_name=collection,
        query_vector=query_vector,
        limit=limit,
        with_payload=True,
        with_vectors=False    # Don't return vectors (saves bandwidth)
    )
    
    return results

# Test search
results = search("vector database performance")
for r in results:
    print(f"Score: {r.score:.4f} | {r.payload['text'][:100]}")
```

### Filtered Search (Metadata + Vector)

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Search with metadata filters
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="database")
            ),
            FieldCondition(
                key="year",
                range=Range(gte=2023)  # Year >= 2023
            )
        ]
    ),
    limit=10,
    with_payload=True
)
```
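
Filters are much faster when the filtered fields carry payload indexes (the payload-indexing feature from the comparison table). A minimal sketch for the two fields used above:

```python
from qdrant_client.models import PayloadSchemaType

# Index the fields used in filters: keyword for exact matches, integer for ranges
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)
client.create_payload_index(
    collection_name="documents",
    field_name="year",
    field_schema=PayloadSchemaType.INTEGER
)
```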

### Batch/Multi-Query Search

```python
from qdrant_client.models import SearchRequest

queries = [
    "how to install vector database",
    "machine learning inference optimization",
    "RAG pipeline architecture"
]

query_vectors = model.encode(queries, normalize_embeddings=True)

# Batch search (one API call for all queries)
results = client.search_batch(
    collection_name="embeddings_384",
    requests=[
        SearchRequest(
            vector=vec.tolist(),
            limit=5,
            with_payload=True
        )
        for vec in query_vectors
    ]
)

for query, res in zip(queries, results):
    print(f"\nQuery: {query}")
    for r in res:
        print(f"  {r.score:.3f}: {r.payload['text'][:80]}")
```
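
For the recommendation use case listed earlier, you can also query by example points instead of a raw vector. A sketch, assuming the IDs come from points you indexed in Step 6:

```python
# Find points similar to "positive" examples (and away from "negative" ones)
recommendations = client.recommend(
    collection_name="embeddings_384",
    positive=[points[0].id],  # IDs of items the user liked
    limit=5,
    with_payload=True
)
for r in recommendations:
    print(f"{r.score:.3f}: {r.payload['text'][:80]}")
```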

***

## Step 8 — Build a RAG Pipeline

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# Initialize clients
qdrant = QdrantClient("localhost", port=6333)
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
llm = OpenAI(api_key="your-openai-key")

def rag_query(question: str, n_context: int = 5) -> str:
    # Step 1: Embed the question
    query_vector = embedder.encode(question, normalize_embeddings=True).tolist()
    
    # Step 2: Retrieve relevant context from Qdrant
    search_results = qdrant.search(
        collection_name="embeddings_384",  # must match the embedder's dimension (384 here, not the 1536-dim "documents")
        query_vector=query_vector,
        limit=n_context,
        with_payload=True
    )
    
    # Step 3: Build context string
    context = "\n\n".join([
        f"[Source: {r.payload.get('source', 'unknown')}]\n{r.payload['text']}"
        for r in search_results
        if r.score > 0.5  # Filter low-confidence results
    ])
    
    # Step 4: Generate answer with LLM
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer questions based on the provided context. Be concise and accurate."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ],
        temperature=0.1
    )
    
    return response.choices[0].message.content

# Test RAG pipeline
answer = rag_query("What is Qdrant and how does it work?")
print(answer)
```

***

## Step 9 — Monitor and Manage Collections

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, OptimizersConfigDiff

# Collection statistics
info = client.get_collection("documents")
print(f"Vectors count: {info.vectors_count:,}")
print(f"Points count: {info.points_count:,}")
print(f"Indexed vectors: {info.indexed_vectors_count:,}")
print(f"Status: {info.status}")
print(f"Disk usage: {info.disk_data_size / 1024 / 1024:.1f} MB")

# List all collections
collections = client.get_collections()
for c in collections.collections:
    print(f" - {c.name}")

# Delete points by filter
client.delete(
    collection_name="documents",
    points_selector=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="old_source"))]
    )
)

# Optimize collection (force index build)
client.update_collection(
    collection_name="documents",
    optimizer_config=OptimizersConfigDiff(indexing_threshold=0)  # Force immediate indexing
)
```
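
Before destructive operations like the filtered delete above, it's worth taking a snapshot. A minimal sketch:

```python
# Create and list per-collection snapshots (stored under the storage path)
snapshot = client.create_snapshot(collection_name="documents")
print(f"Created snapshot: {snapshot.name}")

for s in client.list_snapshots(collection_name="documents"):
    print(f"{s.name} ({s.creation_time})")
```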

***

## Troubleshooting

### Connection Refused

```bash
# Check Qdrant is running
docker ps | grep qdrant
# Or check the process
ps aux | grep qdrant

# Check ports are open
curl http://localhost:6333/
netstat -tlnp | grep 6333
```

### Slow Search Performance

```python
# ef is a search-time parameter: pass it per query via SearchParams
# (higher hnsw_ef = better recall, slower search)
from qdrant_client.models import SearchParams

results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    limit=10,
    search_params=SearchParams(hnsw_ef=128)
)

# Also consider INT8 quantization (Step 5) to fit more vectors in RAM
```

### High Memory Usage

```python
# Enable on-disk storage for large collections
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
        on_disk=True  # Store vectors on disk instead of RAM
    )
)
```

***

## REST API Quick Reference

```bash
# List collections
curl http://localhost:6333/collections

# Create collection
curl -X PUT http://localhost:6333/collections/my_collection \
    -H "Content-Type: application/json" \
    -d '{"vectors": {"size": 384, "distance": "Cosine"}}'

# Count points (POST, optionally with a filter in the body)
curl -X POST http://localhost:6333/collections/my_collection/points/count \
    -H "Content-Type: application/json" \
    -d '{"exact": true}'

# Search
curl -X POST http://localhost:6333/collections/my_collection/points/search \
    -H "Content-Type: application/json" \
    -d '{
        "vector": [0.1, 0.2, ...],
        "limit": 5,
        "with_payload": true
    }'

# Delete collection
curl -X DELETE http://localhost:6333/collections/my_collection
```

***

## Cost Estimation on Clore.ai

| Setup         | Server             | Monthly Cost | Capacity      |
| ------------- | ------------------ | ------------ | ------------- |
| Small RAG     | RTX 3090, 32GB RAM | \~$60–80     | \~5M vectors  |
| Medium search | RTX 4090, 64GB RAM | \~$120–150   | \~15M vectors |
| Large scale   | A100, 128GB RAM    | \~$250–350   | \~30M vectors |

***

## Additional Resources

* [Qdrant Documentation](https://qdrant.tech/documentation/)
* [Qdrant GitHub](https://github.com/qdrant/qdrant)
* [Qdrant Python Client](https://github.com/qdrant/qdrant-client)
* [Qdrant Examples](https://github.com/qdrant/examples)
* [Vector Database Benchmarks](https://qdrant.tech/benchmarks/)
* [Sentence Transformers](https://www.sbert.net/)

***

*Qdrant on Clore.ai gives you a self-hosted, high-performance vector database without the per-query costs of Pinecone or Weaviate Cloud. Perfect for RAG pipelines processing millions of documents.*

***

## Clore.ai GPU Recommendations

| Use Case                  | Recommended GPU | Est. Cost on Clore.ai |
| ------------------------- | --------------- | --------------------- |
| Development/Testing       | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production Vector Search  | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| High-throughput Embedding | RTX 4090 (24GB) | \~$0.70/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
