Qdrant

High-performance vector database for semantic search and RAG applications — GPU-accelerated indexing

Qdrant is an open-source, production-ready vector database written in Rust. It delivers fast approximate nearest neighbor (ANN) search across billions of vectors with advanced filtering, payload indexing, and multi-vector support. It's the backbone of many production RAG (Retrieval-Augmented Generation) pipelines and semantic search applications.

GitHub: qdrant/qdrant — 22K+ ⭐


Why Qdrant?

| Feature | Qdrant | Pinecone | Weaviate | Chroma |
|---|---|---|---|---|
| Open source | ✅ | ❌ | ✅ | ✅ |
| Rust performance | ✅ | ❌ | ❌ Go | ❌ Python |
| Filtering at query time | ✅ Advanced | ✅ Basic | ✅ Basic | ✅ Basic |
| Multi-vector | ✅ | ❌ | ✅ | ❌ |
| Disk-based HNSW | ✅ | ❌ | ❌ | ❌ |
| Payload indexing | ✅ | Limited | ✅ | Limited |
| gRPC + REST | ✅ Both | ✅ REST | ✅ Both | REST |
| Self-hosted | ✅ | ❌ Cloud only | ✅ | ✅ |

Key Use Cases

  • RAG (Retrieval-Augmented Generation) — find relevant context for LLM prompts

  • Semantic search — search by meaning, not just keywords

  • Recommendation systems — find similar items by embedding similarity

  • Duplicate detection — identify near-duplicate content

  • Anomaly detection — find vectors far from cluster centers

  • Image/audio similarity search — multimodal retrieval
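All of these use cases reduce to nearest-neighbor search under a similarity metric. Cosine similarity, one of the distance metrics Qdrant supports, can be sketched in plain Python to build intuition (real queries, of course, run against Qdrant's HNSW index, not a loop):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 — identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 — unrelated
```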


Prerequisites

  • Clore.ai account with GPU rental

  • Basic familiarity with REST APIs or Python

  • Your embedding model of choice (OpenAI, SentenceTransformers, etc.)


Step 1 — Rent a Server on Clore.ai

Qdrant is primarily CPU/RAM-bound for serving, but benefits from GPU when:

  • Generating embeddings alongside serving (embedding model on same server)

  • Large-scale batch indexing operations

  1. Go to clore.ai → Marketplace

  2. For embeddings + serving combo: RTX 3090/4090 with 32GB+ RAM

  3. For serving only: CPU-optimized server with fast NVMe storage


Memory Planning:

  • Each float32 vector with 1536 dimensions = 1536 × 4 bytes ≈ 6KB

  • 1 million vectors = ~6GB RAM

  • 10 million vectors = ~60GB RAM

  • Enable on-disk storage for very large collections
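The figures above follow from a simple calculation (raw vector storage only; the HNSW index adds overhead on top). A quick sketch:

```python
# Back-of-envelope RAM estimate for raw float32 vectors in a collection.
def vector_ram_bytes(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> int:
    """float32 = 4 bytes per dimension."""
    return num_vectors * dims * bytes_per_dim

one_vector_kb = vector_ram_bytes(1, 1536) / 1024            # 6.0 KB per vector
one_million_gb = vector_ram_bytes(1_000_000, 1536) / 1024**3  # ~5.7 GiB, i.e. roughly 6 GB
print(one_vector_kb, round(one_million_gb, 1))
```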


Step 2 — Deploy Qdrant Container

Docker Image: qdrant/qdrant — the official image on Docker Hub.

Ports:

  • Port 6333: REST API (HTTP)

  • Port 6334: gRPC API (higher performance for bulk operations)

Environment Variables: Qdrant reads configuration overrides from QDRANT__-prefixed variables, e.g. QDRANT__SERVICE__API_KEY to require an API key on every request.

Volume/Persistent Storage: Mount /qdrant/storage for data persistence. Without this, data is lost on container restart.
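Putting the image, ports, environment, and volume together, a minimal single-node deployment might look like this (the host path `/opt/qdrant/storage` and the API key value are placeholders — substitute your own):

```shell
# Run Qdrant with both APIs exposed, an API key, and persistent storage.
docker run -d --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -e QDRANT__SERVICE__API_KEY="your-secret-key" \
  -v /opt/qdrant/storage:/qdrant/storage \
  qdrant/qdrant
```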


Step 3 — Verify Qdrant is Running
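A quick way to check the instance is a request to the REST port — the root endpoint returns the running version as JSON (add the header shown if you enabled an API key):

```shell
# Root endpoint reports service info/version; /collections lists collections.
curl http://localhost:6333
curl http://localhost:6333/collections
# If QDRANT__SERVICE__API_KEY is set, authenticate with:
#   curl -H "api-key: your-secret-key" http://localhost:6333/collections
```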


Step 4 — Install Python Client
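The official Python client is published on PyPI as `qdrant-client`; the extra packages below are only needed for the embedding examples later in this guide:

```shell
pip install qdrant-client
# Optional, for the embedding examples that follow:
pip install sentence-transformers openai
```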


Step 5 — Create a Collection

A collection is a named group of vectors with a fixed dimensionality.

Collection for SentenceTransformers (384 dims)
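A minimal sketch using the Python client — the collection name `docs` is an example, and 384 matches the output dimension of SentenceTransformers' `all-MiniLM-L6-v2` model:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# Add api_key="..." if you enabled QDRANT__SERVICE__API_KEY on the server.
client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
```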


Step 6 — Index Documents

With OpenAI Embeddings
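A sketch assuming `OPENAI_API_KEY` is set in the environment. Note that `text-embedding-3-small` produces 1536-dimensional vectors, so the target collection must have been created with `size=1536` (not the 384 used in the SentenceTransformers example):

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

openai_client = OpenAI()
qdrant = QdrantClient(url="http://localhost:6333")

texts = ["Qdrant is a vector database.", "Rust is a systems language."]
resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)

# One point per text; the payload travels alongside the vector for filtering.
points = [
    PointStruct(id=i, vector=d.embedding, payload={"text": t})
    for i, (d, t) in enumerate(zip(resp.data, texts))
]
qdrant.upsert(collection_name="docs", points=points)
```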

With SentenceTransformers (Local, GPU-accelerated)
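The same flow with a local model running on the rented GPU — no per-token API costs. Collection name and texts are examples:

```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")  # 384-dim output
qdrant = QdrantClient(url="http://localhost:6333")

texts = ["Qdrant is a vector database.", "Rust is a systems language."]
vectors = model.encode(texts)  # ndarray of shape (len(texts), 384)

qdrant.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=i, vector=vec.tolist(), payload={"text": t})
        for i, (vec, t) in enumerate(zip(vectors, texts))
    ],
)
```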


Step 7 — Search and Query
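The query must be embedded with the same model used at indexing time. A basic similarity search sketch (newer client versions also offer `query_points`; `search` is shown here for brevity):

```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="docs",
    query_vector=model.encode("What is Qdrant?").tolist(),
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload["text"])  # similarity score + stored payload
```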

Filtered Search (Metadata + Vector)
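Qdrant applies payload filters during the ANN search rather than post-filtering results. A sketch — the payload field `category` and its value are examples, and the placeholder query vector stands in for a real embedding:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="docs",
    query_vector=[0.1] * 384,  # placeholder — use your real query embedding
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="engineering"))]
    ),
    limit=5,
)
```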


Step 8 — Build a RAG Pipeline
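A minimal RAG loop: retrieve the top-k chunks from Qdrant, stuff them into the prompt, and ask an LLM. Model, collection name, and system prompt below are illustrative choices, not fixed requirements:

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
qdrant = QdrantClient(url="http://localhost:6333")
llm = OpenAI()

def answer(question: str, k: int = 3) -> str:
    # 1. Retrieve: embed the question and find the k most similar chunks.
    hits = qdrant.search(
        collection_name="docs",
        query_vector=embedder.encode(question).tolist(),
        limit=k,
    )
    # 2. Augment: join retrieved payloads into a context block.
    context = "\n\n".join(h.payload["text"] for h in hits)
    # 3. Generate: ground the LLM's answer in the retrieved context.
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```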


Step 9 — Monitor and Manage Collections
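A sketch of common management calls from the Python client (`docs` is the example collection from earlier steps):

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

print(client.get_collections())              # list all collections
info = client.get_collection("docs")         # status, point count, config
print(info.points_count, info.status)
client.create_snapshot(collection_name="docs")  # on-disk snapshot for backup
# client.delete_collection("docs")           # irreversible — use with care
```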


Troubleshooting

Connection Refused
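If the API is unreachable, a few quick checks usually localize the problem (assuming a container named `qdrant`):

```shell
docker ps --filter name=qdrant   # is the container running?
docker logs qdrant --tail 50     # any startup errors?
ss -tlnp | grep 6333             # is the port bound on the host?
# From a remote machine, also confirm the provider exposes ports 6333/6334.
```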

Slow Search Performance
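Search speed and recall can be traded off at query time via HNSW search parameters — a higher `hnsw_ef` improves recall but costs latency, while `exact=True` bypasses the index entirely (useful only for verifying recall on small collections). A sketch:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import SearchParams

client = QdrantClient(url="http://localhost:6333")
hits = client.search(
    collection_name="docs",
    query_vector=[0.1] * 384,  # placeholder — use your real query embedding
    search_params=SearchParams(hnsw_ef=128, exact=False),
    limit=5,
)
```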

High Memory Usage
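For collections that outgrow RAM, original vectors can be kept on disk so only the smaller HNSW index stays in memory. This is set per collection at creation time; the collection name below is an example:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="docs_on_disk",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE, on_disk=True),
)
```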


REST API Quick Reference
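The core endpoints, as curl snippets rather than a runnable script — the 4-element vectors below are truncated placeholders and must match your collection's dimensionality in practice. Add `-H "api-key: ..."` if auth is enabled:

```shell
# Create a collection
curl -X PUT http://localhost:6333/collections/docs \
  -H 'Content-Type: application/json' \
  -d '{"vectors": {"size": 384, "distance": "Cosine"}}'

# Upsert points
curl -X PUT http://localhost:6333/collections/docs/points \
  -H 'Content-Type: application/json' \
  -d '{"points": [{"id": 1, "vector": [0.05, 0.61, 0.76, 0.74], "payload": {"text": "hello"}}]}'

# Search
curl -X POST http://localhost:6333/collections/docs/points/search \
  -H 'Content-Type: application/json' \
  -d '{"vector": [0.05, 0.61, 0.76, 0.74], "limit": 5}'

# List collections / inspect one
curl http://localhost:6333/collections
curl http://localhost:6333/collections/docs
```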


Cost Estimation on Clore.ai

| Setup | Server | Monthly Cost | Capacity |
|---|---|---|---|
| Small RAG | RTX 3090, 32GB RAM | ~$60–80 | ~5M vectors |
| Medium search | RTX 4090, 64GB RAM | ~$120–150 | ~15M vectors |
| Large scale | A100, 128GB RAM | ~$250–350 | ~30M vectors |


Additional Resources

  • Qdrant documentation — qdrant.tech/documentation
  • Qdrant GitHub repository — github.com/qdrant/qdrant
  • Python client — github.com/qdrant/qdrant-client

Qdrant on Clore.ai gives you a self-hosted, high-performance vector database without the per-query costs of Pinecone or Weaviate Cloud. Perfect for RAG pipelines processing millions of documents.


Clore.ai GPU Recommendations

| Use Case | Recommended GPU | Est. Cost on Clore.ai |
|---|---|---|
| Development/Testing | RTX 3090 (24GB) | ~$0.12/gpu/hr |
| Production Vector Search | RTX 3090 (24GB) | ~$0.12/gpu/hr |
| High-throughput Embedding | RTX 4090 (24GB) | ~$0.70/gpu/hr |

💡 All examples in this guide can be deployed on Clore.ai GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
