# Qdrant

> **High-performance vector database for semantic search and RAG applications — GPU-accelerated indexing**

Qdrant is an open-source, production-ready vector database written in Rust. It delivers fast approximate nearest neighbor (ANN) search across billions of vectors with advanced filtering, payload indexing, and multi-vector support. It's the backbone of many production RAG (Retrieval-Augmented Generation) pipelines and semantic search applications.

**GitHub:** [qdrant/qdrant](https://github.com/qdrant/qdrant) — 22K+ ⭐

***

## Why Qdrant?

| Feature                 | Qdrant     | Pinecone     | Weaviate | Chroma   |
| ----------------------- | ---------- | ------------ | -------- | -------- |
| Open source             | ✅          | ❌            | ✅        | ✅        |
| Rust performance        | ✅          | —            | ❌ Go     | ❌ Python |
| Filtering at query time | ✅ Advanced | ✅ Basic      | ✅        | ✅ Basic  |
| Multi-vector            | ✅          | ❌            | ✅        | ❌        |
| Disk-based HNSW         | ✅          | ✅            | ✅        | ❌        |
| Payload indexing        | ✅          | Limited      | ✅        | Limited  |
| gRPC + REST             | ✅ Both     | ✅ REST       | ✅        | REST     |
| Self-hosted             | ✅          | ❌ Cloud only | ✅        | ✅        |

{% hint style="success" %}
**Qdrant is written in Rust** — delivering C-level performance with memory safety. Qdrant's published benchmarks show it running **1.5–3x faster** than Python-based alternatives like Chroma under high load.
{% endhint %}

***

## Key Use Cases

* **RAG (Retrieval-Augmented Generation)** — find relevant context for LLM prompts
* **Semantic search** — search by meaning, not just keywords
* **Recommendation systems** — find similar items by embedding similarity
* **Duplicate detection** — identify near-duplicate content
* **Anomaly detection** — find vectors far from cluster centers
* **Image/audio similarity search** — multimodal retrieval

***

## Prerequisites

* Clore.ai account with GPU rental
* Basic familiarity with REST APIs or Python
* Your embedding model of choice (OpenAI, SentenceTransformers, etc.)

***

## Step 1 — Rent a Server on Clore.ai

Qdrant is primarily CPU/RAM-bound for serving, but benefits from GPU when:

* Generating embeddings alongside serving (embedding model on same server)
* Large-scale batch indexing operations

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. For **embeddings + serving combo:** RTX 3090/4090 with 32GB+ RAM
3. For **serving only:** CPU-optimized server with fast NVMe storage

{% hint style="info" %}
**Memory Planning:**

* Each float32 vector with 1536 dimensions = 6KB
* 1 million vectors = \~6GB RAM
* 10 million vectors = \~60GB RAM
* Enable on-disk storage for very large collections
{% endhint %}
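
For capacity planning the arithmetic is simply vectors × dimensions × 4 bytes for float32. A quick sketch (raw vector storage only; the HNSW graph adds overhead on top):

```python
def estimate_ram_gb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw float32 vector storage only -- HNSW graph overhead comes on top."""
    return num_vectors * dims * bytes_per_value / 1024**3

print(f"{estimate_ram_gb(1_000_000, 1536):.1f} GB")   # ~5.7 GB for 1M vectors
print(f"{estimate_ram_gb(10_000_000, 1536):.1f} GB")  # ~57.2 GB for 10M vectors
```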

***

## Step 2 — Deploy Qdrant Container

**Docker Image:**

```
qdrant/qdrant:latest
```

**Ports:**

```
22
6333
6334
```

* **Port 6333:** REST API (HTTP)
* **Port 6334:** gRPC API (higher performance for bulk operations)
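
From Python, the official client can route traffic over gRPC. A minimal sketch (the `host`, `grpc_port`, and `prefer_grpc` parameters are part of `QdrantClient`; the server IP is a placeholder):

```python
from qdrant_client import QdrantClient

# prefer_grpc sends heavy operations (upserts, searches) over port 6334
client = QdrantClient(host="<server-ip>", grpc_port=6334, prefer_grpc=True)
```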

**Environment Variables:**

```
QDRANT__SERVICE__HTTP_PORT=6333
QDRANT__SERVICE__GRPC_PORT=6334
QDRANT__LOG_LEVEL=INFO
QDRANT__STORAGE__STORAGE_PATH=/qdrant/storage
```

**Volume/Persistent Storage:** Mount `/qdrant/storage` for data persistence. Without this, data is lost on container restart.
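
Qdrant ships with no authentication enabled, so on a rented server with publicly exposed ports you may want to set an API key through the service config:

```
QDRANT__SERVICE__API_KEY=<your-secret-key>
```

Clients then pass the matching key, e.g. `QdrantClient(host="<server-ip>", port=6333, api_key="<your-secret-key>")`.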

***

## Step 3 — Verify Qdrant is Running

```bash
ssh root@<server-ip> -p <ssh-port>

# Check Qdrant is running
curl http://localhost:6333/

# Expected response:
# {"title":"qdrant - vector search engine","version":"..."}

# Check health
curl http://localhost:6333/healthz

# Check cluster info
curl http://localhost:6333/cluster
```

***

## Step 4 — Install Python Client

```bash
# Install Qdrant Python client and embedding tools
pip install qdrant-client sentence-transformers openai numpy

# Verify connection
python3 << 'EOF'
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)
print(f"Qdrant connected: {client.get_collections()}")
EOF
```

***

## Step 5 — Create a Collection

A collection is a named group of vectors with a fixed dimensionality.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    VectorParams,
    HnswConfigDiff,
    OptimizersConfigDiff,
    ScalarQuantization,
    ScalarQuantizationConfig,
    ScalarType
)

client = QdrantClient("localhost", port=6333)

# Create collection for OpenAI text-embedding-3-small (1536 dims)
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,           # Vector dimension (match your embedding model)
        distance=Distance.COSINE,  # Options: COSINE, EUCLID, DOT
        on_disk=False        # Set True for very large collections
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                # HNSW graph connectivity (higher = better recall, more RAM)
        ef_construct=100,    # Build-time search depth (higher = better quality, slower indexing)
        full_scan_threshold=10000  # Below this size (in KB), use full scan instead of HNSW
    ),
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=20000  # Build the HNSW index once unindexed vectors exceed this size (KB)
    ),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,  # Compress vectors to INT8 (4x memory reduction)
            quantile=0.99,
            always_ram=True        # Keep quantized index in RAM
        )
    )
)

print("Collection created!")
print(client.get_collection("documents"))
```

### Collection for SentenceTransformers (384 dims)

```python
client.create_collection(
    collection_name="embeddings_384",
    vectors_config=VectorParams(
        size=384,              # all-MiniLM-L6-v2 output size
        distance=Distance.COSINE
    )
)
```
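
Qdrant also supports multiple named vectors per point (the multi-vector row in the comparison table). A minimal sketch, using hypothetical `text` and `image` embedding sizes:

```python
# Each point can carry several named embeddings, searched independently
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=384, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE),
    },
)
```

Searches against such a collection name the vector to use, e.g. `query_vector=("text", embedding)`.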

***

## Step 6 — Index Documents

### With OpenAI Embeddings

```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from openai import OpenAI
import uuid

client = QdrantClient("localhost", port=6333)
openai_client = OpenAI(api_key="your-openai-api-key")

def get_embeddings(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Generate embeddings in batches."""
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=batch
        )
        all_embeddings.extend([e.embedding for e in response.data])
    return all_embeddings

# Sample documents
documents = [
    {
        "id": str(uuid.uuid4()),
        "text": "Qdrant is a vector database built in Rust for high performance.",
        "source": "documentation",
        "category": "database",
        "year": 2024
    },
    {
        "id": str(uuid.uuid4()),
        "text": "Machine learning models convert text to dense vector representations.",
        "source": "article",
        "category": "ml",
        "year": 2023
    },
    # Add more documents...
]

# Generate embeddings
texts = [doc["text"] for doc in documents]
embeddings = get_embeddings(texts)

# Upsert into Qdrant
points = [
    PointStruct(
        id=doc["id"],  # Reuse the UUID already generated for the document
        vector=embedding,
        payload={
            "text": doc["text"],
            "source": doc["source"],
            "category": doc["category"],
            "year": doc["year"]
        }
    )
    for doc, embedding in zip(documents, embeddings)
]

client.upsert(
    collection_name="documents",
    points=points,
    wait=True  # Wait for indexing to complete
)

print(f"Indexed {len(points)} documents!")
```

### With SentenceTransformers (Local, GPU-accelerated)

```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
import uuid

# Load embedding model on GPU
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

client = QdrantClient("localhost", port=6333)

documents = [
    {"text": "How do I set up Qdrant on a GPU server?", "tag": "setup"},
    {"text": "Vector databases store high-dimensional embeddings for similarity search.", "tag": "concept"},
    {"text": "HNSW algorithm provides approximate nearest neighbor search.", "tag": "algorithm"},
    # ... more documents
]

# GPU-accelerated batch encoding
texts = [doc["text"] for doc in documents]
embeddings = model.encode(
    texts,
    batch_size=256,       # Large batch size for GPU efficiency
    show_progress_bar=True,
    normalize_embeddings=True  # Normalize for cosine similarity
)

# Index in Qdrant
points = [
    PointStruct(
        id=str(uuid.uuid4()),
        vector=embedding.tolist(),
        payload=doc
    )
    for doc, embedding in zip(documents, embeddings)
]

# Batch upsert (more efficient)
BATCH_SIZE = 1000
for i in range(0, len(points), BATCH_SIZE):
    batch = points[i:i + BATCH_SIZE]
    client.upsert(collection_name="embeddings_384", points=batch)
    print(f"Indexed {min(i + BATCH_SIZE, len(points))}/{len(points)}")
```
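
To sanity-check what landed in the collection, you can count points and page through a few payloads. A quick sketch using the client's `count` and `scroll` APIs:

```python
# Exact point count
print(client.count(collection_name="embeddings_384", exact=True).count)

# Peek at the first few payloads; scroll returns (points, next_page_offset)
records, _offset = client.scroll(
    collection_name="embeddings_384",
    limit=3,
    with_payload=True,
    with_vectors=False,
)
for rec in records:
    print(rec.id, rec.payload)
```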

***

## Step 7 — Search and Query

### Basic Semantic Search

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

client = QdrantClient("localhost", port=6333)
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def search(query: str, limit: int = 5, collection: str = "embeddings_384"):
    # Generate query embedding
    query_vector = model.encode(query, normalize_embeddings=True).tolist()
    
    # Search
    results = client.search(
        collection_name=collection,
        query_vector=query_vector,
        limit=limit,
        with_payload=True,
        with_vectors=False    # Don't return vectors (saves bandwidth)
    )
    
    return results

# Test search
results = search("vector database performance")
for r in results:
    print(f"Score: {r.score:.4f} | {r.payload['text'][:100]}")
```
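
If you'd rather drop weak matches server-side than post-filter on `r.score`, `search` also accepts a `score_threshold`. A sketch continuing the snippet above (the 0.5 cutoff is an assumption to tune per embedding model):

```python
# Let Qdrant discard weak matches instead of filtering scores in Python
results = client.search(
    collection_name="embeddings_384",
    query_vector=model.encode("vector database performance", normalize_embeddings=True).tolist(),
    limit=10,
    score_threshold=0.5,  # assumed cutoff; tune per model and distance metric
)
```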

### Filtered Search (Metadata + Vector)

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Search with metadata filters
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="database")
            ),
            FieldCondition(
                key="year",
                range=Range(gte=2023)  # Year >= 2023
            )
        ]
    ),
    limit=10,
    with_payload=True
)
```
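
Filtered search is fastest when the filtered fields carry payload indexes. A minimal sketch for the `category` and `year` fields used above:

```python
from qdrant_client.models import PayloadSchemaType

# Index payload fields used in filters to speed up filtered queries
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD,
)
client.create_payload_index(
    collection_name="documents",
    field_name="year",
    field_schema=PayloadSchemaType.INTEGER,
)
```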

### Batch/Multi-Query Search

```python
from qdrant_client.models import SearchRequest

queries = [
    "how to install vector database",
    "machine learning inference optimization",
    "RAG pipeline architecture"
]

query_vectors = model.encode(queries, normalize_embeddings=True)

# Batch search (one API call for all queries)
results = client.search_batch(
    collection_name="embeddings_384",
    requests=[
        SearchRequest(
            vector=vec.tolist(),
            limit=5,
            with_payload=True
        )
        for vec in query_vectors
    ]
)

for query, res in zip(queries, results):
    print(f"\nQuery: {query}")
    for r in res:
        print(f"  {r.score:.3f}: {r.payload['text'][:80]}")
```

***

## Step 8 — Build a RAG Pipeline

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# Initialize clients
qdrant = QdrantClient("localhost", port=6333)
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
llm = OpenAI(api_key="your-openai-key")

def rag_query(question: str, n_context: int = 5) -> str:
    # Step 1: Embed the question
    query_vector = embedder.encode(question, normalize_embeddings=True).tolist()
    
    # Step 2: Retrieve relevant context from Qdrant
    # (query and collection must use the same embedding model --
    #  all-MiniLM-L6-v2 here, so search the 384-dim collection)
    search_results = qdrant.search(
        collection_name="embeddings_384",
        query_vector=query_vector,
        limit=n_context,
        with_payload=True
    )
    
    # Step 3: Build context string
    context = "\n\n".join([
        f"[Source: {r.payload.get('source', 'unknown')}]\n{r.payload['text']}"
        for r in search_results
        if r.score > 0.5  # Filter low-confidence results
    ])
    
    # Step 4: Generate answer with LLM
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer questions based on the provided context. Be concise and accurate."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ],
        temperature=0.1
    )
    
    return response.choices[0].message.content

# Test RAG pipeline
answer = rag_query("What is Qdrant and how does it work?")
print(answer)
```

***

## Step 9 — Monitor and Manage Collections

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Filter, FieldCondition, MatchValue, FilterSelector, OptimizersConfigDiff
)

client = QdrantClient("localhost", port=6333)

# Collection statistics
info = client.get_collection("documents")
print(f"Vectors count: {info.vectors_count:,}")
print(f"Points count: {info.points_count:,}")
print(f"Indexed vectors: {info.indexed_vectors_count:,}")
print(f"Status: {info.status}")
print(f"Disk usage: {info.disk_data_size / 1024 / 1024:.1f} MB")

# List all collections
collections = client.get_collections()
for c in collections.collections:
    print(f" - {c.name}")

# Delete points by filter
client.delete(
    collection_name="documents",
    points_selector=FilterSelector(
        filter=Filter(
            must=[FieldCondition(key="source", match=MatchValue(value="old_source"))]
        )
    )
)

# Trigger an index build on a small collection by lowering the threshold
# (note: indexing_threshold=0 disables indexing entirely)
client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(indexing_threshold=100)
)
```
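
For backups, the client also exposes collection snapshots. A sketch (snapshots are written under the storage path mounted in Step 2):

```python
# Create and list collection snapshots
snapshot = client.create_snapshot(collection_name="documents")
print(f"Created snapshot: {snapshot.name}")

for snap in client.list_snapshots(collection_name="documents"):
    print(snap.name, snap.creation_time)
```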

***

## Troubleshooting

### Connection Refused

```bash
# Check Qdrant is running
docker ps | grep qdrant
# Or check the process
ps aux | grep qdrant

# Check ports are open
curl http://localhost:6333/
netstat -tlnp | grep 6333
```

### Slow Search Performance

```python
from qdrant_client.models import SearchParams

# HNSW search-time ef is a per-query setting, not a collection config --
# raise it via SearchParams for better recall at some latency cost
results = client.search(
    collection_name="documents",
    query_vector=query_vector,  # as in Step 7
    limit=10,
    search_params=SearchParams(hnsw_ef=128)
)

# INT8 quantization (see Step 5) also helps by fitting more vectors in RAM
```

### High Memory Usage

```python
# Enable on-disk storage for large collections
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
        on_disk=True  # Store vectors on disk instead of RAM
    )
)
```

***

## REST API Quick Reference

```bash
# List collections
curl http://localhost:6333/collections

# Create collection
curl -X PUT http://localhost:6333/collections/my_collection \
    -H "Content-Type: application/json" \
    -d '{"vectors": {"size": 384, "distance": "Cosine"}}'

# Count points (POST, not GET)
curl -X POST http://localhost:6333/collections/my_collection/points/count \
    -H "Content-Type: application/json" \
    -d '{"exact": true}'

# Search
curl -X POST http://localhost:6333/collections/my_collection/points/search \
    -H "Content-Type: application/json" \
    -d '{
        "vector": [0.1, 0.2, ...],
        "limit": 5,
        "with_payload": true
    }'

# Delete collection
curl -X DELETE http://localhost:6333/collections/my_collection
```

***

## Cost Estimation on Clore.ai

| Setup         | Server             | Monthly Cost | Capacity      |
| ------------- | ------------------ | ------------ | ------------- |
| Small RAG     | RTX 3090, 32GB RAM | \~$60–80     | \~5M vectors  |
| Medium search | RTX 4090, 64GB RAM | \~$120–150   | \~15M vectors |
| Large scale   | A100, 128GB RAM    | \~$250–350   | \~30M vectors |

***

## Additional Resources

* [Qdrant Documentation](https://qdrant.tech/documentation/)
* [Qdrant GitHub](https://github.com/qdrant/qdrant)
* [Qdrant Python Client](https://github.com/qdrant/qdrant-client)
* [Qdrant Examples](https://github.com/qdrant/examples)
* [Vector Database Benchmarks](https://qdrant.tech/benchmarks/)
* [Sentence Transformers](https://www.sbert.net/)

***

*Qdrant on Clore.ai gives you a self-hosted, high-performance vector database without the per-query costs of Pinecone or Weaviate Cloud. Perfect for RAG pipelines processing millions of documents.*

***

## Clore.ai GPU Recommendations

| Use Case                  | Recommended GPU | Est. Cost on Clore.ai |
| ------------------------- | --------------- | --------------------- |
| Development/Testing       | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production Vector Search  | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| High-throughput Embedding | RTX 4090 (24GB) | \~$0.70/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.

