# Qdrant

> **High-performance vector database for semantic search and RAG applications — GPU-accelerated indexing**

Qdrant is an open-source, production-ready vector database written in Rust. It delivers fast approximate nearest neighbor (ANN) search across billions of vectors with advanced filtering, payload indexing, and multi-vector support. It's the backbone of many production RAG (Retrieval-Augmented Generation) pipelines and semantic search applications.

**GitHub:** [qdrant/qdrant](https://github.com/qdrant/qdrant) — 22K+ ⭐

***

## Why Qdrant?

| Feature                 | Qdrant     | Pinecone     | Weaviate | Chroma   |
| ----------------------- | ---------- | ------------ | -------- | -------- |
| Open source             | ✅          | ❌            | ✅        | ✅        |
| Rust performance        | ✅          | —            | ❌ Go     | ❌ Python |
| Filtering at query time | ✅ Advanced | ✅ Basic      | ✅        | ✅ Basic  |
| Multi-vector            | ✅          | ❌            | ✅        | ❌        |
| Disk-based HNSW         | ✅          | ✅            | ✅        | ❌        |
| Payload indexing        | ✅          | Limited      | ✅        | Limited  |
| gRPC + REST             | ✅ Both     | ✅ REST       | ✅        | REST     |
| Self-hosted             | ✅          | ❌ Cloud only | ✅        | ✅        |

{% hint style="success" %}
**Qdrant is written in Rust** — delivering C-level performance with memory safety. Benchmark tests show Qdrant is consistently **1.5–3x faster** than Python-based alternatives like Chroma for high-load scenarios.
{% endhint %}

***

## Key Use Cases

* **RAG (Retrieval-Augmented Generation)** — find relevant context for LLM prompts
* **Semantic search** — search by meaning, not just keywords
* **Recommendation systems** — find similar items by embedding similarity
* **Duplicate detection** — identify near-duplicate content
* **Anomaly detection** — find vectors far from cluster centers
* **Image/audio similarity search** — multimodal retrieval

***

## Prerequisites

* Clore.ai account with GPU rental
* Basic familiarity with REST APIs or Python
* Your embedding model of choice (OpenAI, SentenceTransformers, etc.)

***

## Step 1 — Rent a Server on Clore.ai

Qdrant is primarily CPU/RAM-bound for serving, but benefits from GPU when:

* Generating embeddings alongside serving (embedding model on same server)
* Large-scale batch indexing operations

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. For **embeddings + serving combo:** RTX 3090/4090 with 32GB+ RAM
3. For **serving only:** CPU-optimized server with fast NVMe storage

{% hint style="info" %}
**Memory Planning:**

* Each float32 vector with 1536 dimensions = 6KB
* 1 million vectors = \~6GB RAM
* 10 million vectors = \~60GB RAM
* Enable on-disk storage for very large collections
{% endhint %}
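
The arithmetic behind those numbers is simple: bytes ≈ vectors × dimensions × 4 (float32). A quick sanity-check sketch (counts and dimensions are illustrative):

```python
# Back-of-the-envelope RAM estimate for raw float32 vectors.
# Excludes HNSW graph overhead, payloads, and segment metadata.
def raw_vector_ram_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    return num_vectors * dims * bytes_per_dim / 1024**3

print(f"{raw_vector_ram_gb(1_000_000, 1536):.1f} GB")   # ~5.7 GB
print(f"{raw_vector_ram_gb(10_000_000, 1536):.1f} GB")  # ~57 GB
print(f"{raw_vector_ram_gb(10_000_000, 384):.1f} GB")   # ~14.3 GB (384-dim models)
```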

***

## Step 2 — Deploy Qdrant Container

**Docker Image:**

```
qdrant/qdrant:latest
```

**Ports:**

```
22
6333
6334
```

* **Port 6333:** REST API (HTTP)
* **Port 6334:** gRPC API (higher performance for bulk operations)
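
With the Python client you don't have to choose between the two ports up front: `prefer_grpc=True` routes data-heavy calls over gRPC automatically. A minimal sketch (`<server-ip>` is a placeholder):

```python
from qdrant_client import QdrantClient

# Uses gRPC (6334) for data-heavy operations like upsert/search,
# falling back to REST (6333) where needed
client = QdrantClient(
    host="<server-ip>",
    port=6333,
    grpc_port=6334,
    prefer_grpc=True
)
```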

**Environment Variables:**

```
QDRANT__SERVICE__HTTP_PORT=6333
QDRANT__SERVICE__GRPC_PORT=6334
QDRANT__LOG_LEVEL=INFO
QDRANT__STORAGE__STORAGE_PATH=/qdrant/storage
```

**Volume/Persistent Storage:** Mount `/qdrant/storage` for data persistence. Without this, data is lost on container restart.

***

## Step 3 — Verify Qdrant is Running

```bash
ssh root@<server-ip> -p <ssh-port>

# Check Qdrant is running
curl http://localhost:6333/

# Expected response:
# {"title":"qdrant - vector search engine","version":"..."}

# Check health
curl http://localhost:6333/healthz

# Check cluster info
curl http://localhost:6333/cluster
```

***

## Step 4 — Install Python Client

```bash
# Install Qdrant Python client and embedding tools
pip install qdrant-client sentence-transformers openai numpy

# Verify connection
python3 << 'EOF'
from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)
print(f"Qdrant connected: {client.get_collections()}")
EOF
```
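
On a public Clore.ai server you'll usually connect from your own machine rather than over SSH. A sketch of a remote connection; the API key is optional and assumes you set `QDRANT__SERVICE__API_KEY` on the container (recommended for anything exposed to the internet):

```python
from qdrant_client import QdrantClient

# Remote connection from your workstation to the rented server
client = QdrantClient(
    url="http://<server-ip>:6333",
    api_key="<your-api-key>"  # omit if you did not configure QDRANT__SERVICE__API_KEY
)
print(client.get_collections())
```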

***

## Step 5 — Create a Collection

A collection is a named group of vectors with a fixed dimensionality.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    VectorParams,
    HnswConfigDiff,
    OptimizersConfigDiff,
    ScalarQuantization,
    ScalarQuantizationConfig,
    ScalarType
)

client = QdrantClient("localhost", port=6333)

# Create collection for OpenAI text-embedding-3-small (1536 dims)
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,           # Vector dimension (match your embedding model)
        distance=Distance.COSINE,  # Options: COSINE, EUCLID, DOT
        on_disk=False        # Set True for very large collections
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                # HNSW graph connectivity (higher = better recall, more RAM)
        ef_construct=100,    # Build-time search depth (higher = better quality, slower indexing)
        full_scan_threshold=10000  # Segments smaller than this (in KB) use exact full-scan
    ),
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=20000  # Build the HNSW index once unindexed data exceeds this size (KB)
    ),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,  # Compress vectors to INT8 (4x memory reduction)
            quantile=0.99,
            always_ram=True        # Keep quantized index in RAM
        )
    )
)

print("Collection created!")
print(client.get_collection("documents"))
```

### Collection for SentenceTransformers (384 dims)

```python
client.create_collection(
    collection_name="embeddings_384",
    vectors_config=VectorParams(
        size=384,              # all-MiniLM-L6-v2 output size
        distance=Distance.COSINE
    )
)
```
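
Qdrant also supports multiple named vectors per point (the multi-vector feature from the comparison table). A sketch of a text + image collection; the names and dimensions are illustrative:

```python
# One collection, two vector spaces per point (e.g. text + image embeddings)
client.create_collection(
    collection_name="multimodal",
    vectors_config={
        "text": VectorParams(size=384, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.COSINE)  # e.g. CLIP ViT-B/32
    }
)
```

Searches then target one space by name, e.g. `query_vector=("text", vec)`.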

***

## Step 6 — Index Documents

### With OpenAI Embeddings

```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from openai import OpenAI
import uuid

client = QdrantClient("localhost", port=6333)
openai_client = OpenAI(api_key="your-openai-api-key")

def get_embeddings(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Generate embeddings in batches."""
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=batch
        )
        all_embeddings.extend([e.embedding for e in response.data])
    return all_embeddings

# Sample documents
documents = [
    {
        "id": str(uuid.uuid4()),
        "text": "Qdrant is a vector database built in Rust for high performance.",
        "source": "documentation",
        "category": "database",
        "year": 2024
    },
    {
        "id": str(uuid.uuid4()),
        "text": "Machine learning models convert text to dense vector representations.",
        "source": "article",
        "category": "ml",
        "year": 2023
    },
    # Add more documents...
]

# Generate embeddings
texts = [doc["text"] for doc in documents]
embeddings = get_embeddings(texts)

# Upsert into Qdrant
points = [
    PointStruct(
        id=doc["id"],  # reuse the UUIDs generated above
        vector=embedding,
        payload={
            "text": doc["text"],
            "source": doc["source"],
            "category": doc["category"],
            "year": doc["year"]
        }
    )
    for doc, embedding in zip(documents, embeddings)
]

client.upsert(
    collection_name="documents",
    points=points,
    wait=True  # Wait for indexing to complete
)

print(f"Indexed {len(points)} documents!")
```
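
As a quick sanity check that the upsert landed, you can fetch points back by ID:

```python
# Retrieve a point by ID to verify its payload round-tripped
stored = client.retrieve(
    collection_name="documents",
    ids=[points[0].id],
    with_payload=True,
    with_vectors=False
)
print(stored[0].payload["text"])
```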

### With SentenceTransformers (Local, GPU-accelerated)

```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
import uuid

# Load embedding model on GPU
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

client = QdrantClient("localhost", port=6333)

documents = [
    {"text": "How do I set up Qdrant on a GPU server?", "tag": "setup"},
    {"text": "Vector databases store high-dimensional embeddings for similarity search.", "tag": "concept"},
    {"text": "HNSW algorithm provides approximate nearest neighbor search.", "tag": "algorithm"},
    # ... more documents
]

# GPU-accelerated batch encoding
texts = [doc["text"] for doc in documents]
embeddings = model.encode(
    texts,
    batch_size=256,       # Large batch size for GPU efficiency
    show_progress_bar=True,
    normalize_embeddings=True  # Normalize for cosine similarity
)

# Index in Qdrant
points = [
    PointStruct(
        id=str(uuid.uuid4()),
        vector=embedding.tolist(),
        payload=doc
    )
    for doc, embedding in zip(documents, embeddings)
]

# Batch upsert (more efficient)
BATCH_SIZE = 1000
for i in range(0, len(points), BATCH_SIZE):
    batch = points[i:i + BATCH_SIZE]
    client.upsert(collection_name="embeddings_384", points=batch)
    print(f"Indexed {min(i + BATCH_SIZE, len(points))}/{len(points)}")
```

***

## Step 7 — Search and Query

### Basic Semantic Search

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

client = QdrantClient("localhost", port=6333)
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def search(query: str, limit: int = 5, collection: str = "embeddings_384"):
    # Generate query embedding
    query_vector = model.encode(query, normalize_embeddings=True).tolist()
    
    # Search
    results = client.search(
        collection_name=collection,
        query_vector=query_vector,
        limit=limit,
        with_payload=True,
        with_vectors=False    # Don't return vectors (saves bandwidth)
    )
    
    return results

# Test search
results = search("vector database performance")
for r in results:
    print(f"Score: {r.score:.4f} | {r.payload['text'][:100]}")
```

### Filtered Search (Metadata + Vector)

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Search with metadata filters
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="category",
                match=MatchValue(value="database")
            ),
            FieldCondition(
                key="year",
                range=Range(gte=2023)  # Year >= 2023
            )
        ]
    ),
    limit=10,
    with_payload=True
)
```
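
Filters are much faster when the filtered fields carry payload indexes (the payload-indexing feature from the comparison table). A minimal sketch for the two fields used above:

```python
from qdrant_client.models import PayloadSchemaType

# Index the fields used in filters: keyword for exact matches, integer for ranges
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)
client.create_payload_index(
    collection_name="documents",
    field_name="year",
    field_schema=PayloadSchemaType.INTEGER
)
```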

### Batch/Multi-Query Search

```python
from qdrant_client.models import SearchRequest

queries = [
    "how to install vector database",
    "machine learning inference optimization",
    "RAG pipeline architecture"
]

query_vectors = model.encode(queries, normalize_embeddings=True)

# Batch search (one API call for all queries)
results = client.search_batch(
    collection_name="embeddings_384",
    requests=[
        SearchRequest(
            vector=vec.tolist(),
            limit=5,
            with_payload=True
        )
        for vec in query_vectors
    ]
)

for query, res in zip(queries, results):
    print(f"\nQuery: {query}")
    for r in res:
        print(f"  {r.score:.3f}: {r.payload['text'][:80]}")
```
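
For the recommendation use case listed earlier, you can also query by example points instead of a raw vector. A sketch, assuming the IDs come from points you indexed in Step 6:

```python
# Find points similar to "positive" examples (and away from "negative" ones)
recommendations = client.recommend(
    collection_name="embeddings_384",
    positive=[points[0].id],  # IDs of items the user liked
    limit=5,
    with_payload=True
)
for r in recommendations:
    print(f"{r.score:.3f}: {r.payload['text'][:80]}")
```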

***

## Step 8 — Build a RAG Pipeline

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
from openai import OpenAI

# Initialize clients
qdrant = QdrantClient("localhost", port=6333)
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
llm = OpenAI(api_key="your-openai-key")

def rag_query(question: str, n_context: int = 5) -> str:
    # Step 1: Embed the question
    query_vector = embedder.encode(question, normalize_embeddings=True).tolist()
    
    # Step 2: Retrieve relevant context from Qdrant
    search_results = qdrant.search(
        collection_name="embeddings_384",  # must match the embedder's dimension (384 here, not the 1536-dim "documents")
        query_vector=query_vector,
        limit=n_context,
        with_payload=True
    )
    
    # Step 3: Build context string
    context = "\n\n".join([
        f"[Source: {r.payload.get('source', 'unknown')}]\n{r.payload['text']}"
        for r in search_results
        if r.score > 0.5  # Filter low-confidence results
    ])
    
    # Step 4: Generate answer with LLM
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Answer questions based on the provided context. Be concise and accurate."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ],
        temperature=0.1
    )
    
    return response.choices[0].message.content

# Test RAG pipeline
answer = rag_query("What is Qdrant and how does it work?")
print(answer)
```

***

## Step 9 — Monitor and Manage Collections

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, OptimizersConfigDiff

# Collection statistics
info = client.get_collection("documents")
print(f"Vectors count: {info.vectors_count:,}")
print(f"Points count: {info.points_count:,}")
print(f"Indexed vectors: {info.indexed_vectors_count:,}")
print(f"Status: {info.status}")
print(f"Disk usage: {info.disk_data_size / 1024 / 1024:.1f} MB")

# List all collections
collections = client.get_collections()
for c in collections.collections:
    print(f" - {c.name}")

# Delete points by filter
client.delete(
    collection_name="documents",
    points_selector=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="old_source"))]
    )
)

# Optimize collection (force index build)
client.update_collection(
    collection_name="documents",
    optimizer_config=OptimizersConfigDiff(indexing_threshold=0)  # Force immediate indexing
)
```
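
Before destructive operations like the filtered delete above, it's worth taking a snapshot. A minimal sketch:

```python
# Create and list per-collection snapshots (stored under the storage path)
snapshot = client.create_snapshot(collection_name="documents")
print(f"Created snapshot: {snapshot.name}")

for s in client.list_snapshots(collection_name="documents"):
    print(f"{s.name} ({s.creation_time})")
```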

***

## Troubleshooting

### Connection Refused

```bash
# Check Qdrant is running
docker ps | grep qdrant
# Or check the process
ps aux | grep qdrant

# Check ports are open
curl http://localhost:6333/
netstat -tlnp | grep 6333
```

### Slow Search Performance

```python
# ef is a search-time parameter: pass it per query via SearchParams
# (higher hnsw_ef = better recall, slower search)
from qdrant_client.models import SearchParams

results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    limit=10,
    search_params=SearchParams(hnsw_ef=128)
)

# Also consider INT8 quantization (Step 5) to fit more vectors in RAM
```

### High Memory Usage

```python
# Enable on-disk storage for large collections
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE,
        on_disk=True  # Store vectors on disk instead of RAM
    )
)
```

***

## REST API Quick Reference

```bash
# List collections
curl http://localhost:6333/collections

# Create collection
curl -X PUT http://localhost:6333/collections/my_collection \
    -H "Content-Type: application/json" \
    -d '{"vectors": {"size": 384, "distance": "Cosine"}}'

# Count points (POST, optionally with a filter in the body)
curl -X POST http://localhost:6333/collections/my_collection/points/count \
    -H "Content-Type: application/json" \
    -d '{"exact": true}'

# Search
curl -X POST http://localhost:6333/collections/my_collection/points/search \
    -H "Content-Type: application/json" \
    -d '{
        "vector": [0.1, 0.2, ...],
        "limit": 5,
        "with_payload": true
    }'

# Delete collection
curl -X DELETE http://localhost:6333/collections/my_collection
```

***

## Cost Estimation on Clore.ai

| Setup         | Server             | Monthly Cost | Capacity      |
| ------------- | ------------------ | ------------ | ------------- |
| Small RAG     | RTX 3090, 32GB RAM | \~$60–80     | \~5M vectors  |
| Medium search | RTX 4090, 64GB RAM | \~$120–150   | \~15M vectors |
| Large scale   | A100, 128GB RAM    | \~$250–350   | \~30M vectors |

***

## Additional Resources

* [Qdrant Documentation](https://qdrant.tech/documentation/)
* [Qdrant GitHub](https://github.com/qdrant/qdrant)
* [Qdrant Python Client](https://github.com/qdrant/qdrant-client)
* [Qdrant Examples](https://github.com/qdrant/examples)
* [Vector Database Benchmarks](https://qdrant.tech/benchmarks/)
* [Sentence Transformers](https://www.sbert.net/)

***

*Qdrant on Clore.ai gives you a self-hosted, high-performance vector database without the per-query costs of Pinecone or Weaviate Cloud. Perfect for RAG pipelines processing millions of documents.*

***

## Clore.ai GPU Recommendations

| Use Case                  | Recommended GPU | Est. Cost on Clore.ai |
| ------------------------- | --------------- | --------------------- |
| Development/Testing       | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production Vector Search  | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| High-throughput Embedding | RTX 4090 (24GB) | \~$0.70/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
