# Milvus

> **The most scalable open-source vector database for AI applications — built for billions of vectors**

Milvus is an open-source vector database purpose-built for scalable similarity search and AI applications. Originally created by Zilliz and donated to the LF AI & Data Foundation, Milvus powers production AI workloads at companies including NVIDIA, AT\&T, IBM, and Salesforce. It's the go-to choice when you need to scale to billions of vectors.

**GitHub:** [milvus-io/milvus](https://github.com/milvus-io/milvus) — 32K+ ⭐

***

## Milvus vs Qdrant — When to Choose Which

| Criteria             | Milvus                          | Qdrant               |
| -------------------- | ------------------------------- | -------------------- |
| Scale                | Billions of vectors             | Hundreds of millions |
| Architecture         | Distributed (multiple services) | Single binary        |
| Setup complexity     | Higher                          | Lower                |
| GPU index support    | ✅ Native GPU indexes            | Limited              |
| Multi-tenancy        | ✅ Partitions + aliases          | Collection-based     |
| Streaming ingestion  | ✅ Kafka/Pulsar                  | Limited              |
| Hybrid search        | ✅ Dense + sparse                | ✅                    |
| Cloud-managed option | Zilliz Cloud                    | Qdrant Cloud         |

{% hint style="success" %}
**Choose Milvus when:** You need to scale to billions of vectors, require GPU-accelerated indexing (GPU\_IVF\_FLAT), or need enterprise features like multi-tenancy, streaming ingestion, and role-based access control.
{% endhint %}

***

## Milvus Architecture

Milvus in standalone mode (single server) includes:

* **milvus** — the main service (proxy, query, data, index coordinators)
* **etcd** — metadata storage and service discovery
* **MinIO** — object storage for segment data

In distributed mode (cluster), each component scales independently.
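Once the stack is up (Step 3), you can confirm the pieces are reachable from the host. A minimal sketch, assuming the default Docker Compose ports on localhost (the compose file below does not publish etcd's port to the host):

```python
import socket
import urllib.request

# Liveness probe on the Milvus REST port
with urllib.request.urlopen("http://localhost:9091/healthz", timeout=5) as r:
    print("Milvus healthz:", r.read().decode().strip())

# TCP reachability of the published service ports
for name, port in [("Milvus gRPC", 19530), ("MinIO", 9000)]:
    with socket.socket() as s:
        s.settimeout(3)
        status = "open" if s.connect_ex(("localhost", port)) == 0 else "closed"
    print(f"{name} port {port}: {status}")
```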

***

## Prerequisites

* Clore.ai account with GPU rental
* Docker Compose (usually pre-installed)
* Basic Python knowledge
* 16GB+ RAM (32GB recommended for production)

***

## Step 1 — Rent a GPU Server on Clore.ai

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. **Recommended GPU:** RTX 4090 or A100 for GPU-accelerated indexing
3. **CPU alternative:** Any server with 32GB+ RAM for CPU-based indexing

**Minimum Requirements:**

* CPU: 8 cores
* RAM: 16GB (32GB recommended)
* Disk: 50GB SSD/NVMe
* GPU: Optional (required only for GPU index types)

{% hint style="info" %}
**GPU index types in Milvus** (GPU\_IVF\_FLAT, GPU\_IVF\_PQ) require CUDA-capable GPUs and dramatically accelerate index building for large collections. If you plan to index 10M+ vectors frequently, GPU indexing pays for itself quickly.
{% endhint %}
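Before committing to a GPU index type, it's worth confirming the container can actually see a CUDA device. A quick check, assuming PyTorch is available (it is pulled in by `sentence-transformers` in Step 4):

```python
import torch

# GPU index types need a visible CUDA device inside the container
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device visible; stick to CPU index types (HNSW, IVF_FLAT)")
```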

***

## Step 2 — Deploy Milvus Standalone

**Docker Image:**

```
milvusdb/milvus:v2.4.0
```

Milvus standalone requires etcd and MinIO. Use Docker Compose for the easiest setup.

**Ports:**

```
22
19530
```

* **Port 22:** SSH access
* **Port 19530:** Milvus SDK/gRPC port (primary)
* **Port 9091:** Milvus REST API and health check (internal; no need to expose publicly)

**Environment Variables:**

```
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
```

***

## Step 3 — Set Up with Docker Compose

SSH into your Clore.ai server and create the compose file:

```bash
ssh root@<server-ip> -p <ssh-port>

# Install Docker Compose if not present
which docker-compose || pip install docker-compose
# Or use Docker plugin:
docker compose version

# Create project directory
mkdir -p /opt/milvus && cd /opt/milvus

# Download official Milvus standalone compose file
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml \
    -O docker-compose.yml

# Review the compose file
cat docker-compose.yml
```

### Customize docker-compose.yml

```yaml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - /opt/milvus/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-13T19-46-17Z
    environment:
      MINIO_ACCESS_KEY: minioadmin   # change these defaults in production
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - /opt/milvus/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.0
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /opt/milvus/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]  # Enable GPU access (omit this block on CPU-only servers)
```

### Start Milvus

```bash
cd /opt/milvus
docker compose up -d

# Wait for services to start (~60 seconds)
sleep 60

# Check all services are healthy
docker compose ps

# Check Milvus health
curl http://localhost:9091/healthz
# Expected: {"status":"ok"}

# View logs
docker compose logs -f --tail 50 standalone
```

***

## Step 4 — Install Python Client

```bash
pip install pymilvus sentence-transformers numpy tqdm

# Verify connection
python3 << 'EOF'
from pymilvus import connections, utility

connections.connect("default", host="localhost", port="19530")
print(f"Milvus connected!")
print(f"Version: {utility.get_server_version()}")
EOF
```

***

## Step 5 — Create a Collection

In Milvus, a **collection** is similar to a database table. It has a schema with typed fields including vector fields.

```python
from pymilvus import (
    connections,
    FieldSchema,
    CollectionSchema,
    DataType,
    Collection,
    utility
)

# Connect
connections.connect("default", host="localhost", port="19530")

# Define schema
fields = [
    FieldSchema(
        name="id",
        dtype=DataType.INT64,
        is_primary=True,
        auto_id=True           # Auto-generate IDs
    ),
    FieldSchema(
        name="text",
        dtype=DataType.VARCHAR,
        max_length=2048        # Maximum text length
    ),
    FieldSchema(
        name="source",
        dtype=DataType.VARCHAR,
        max_length=256
    ),
    FieldSchema(
        name="category",
        dtype=DataType.VARCHAR,
        max_length=128
    ),
    FieldSchema(
        name="year",
        dtype=DataType.INT32
    ),
    FieldSchema(
        name="embedding",
        dtype=DataType.FLOAT_VECTOR,
        dim=384                # Dimension of your embedding model
    )
]

schema = CollectionSchema(
    fields=fields,
    description="Document embeddings for semantic search",
    enable_dynamic_field=True  # Allow adding fields not in schema
)

# Create collection
collection_name = "documents"
if utility.has_collection(collection_name):
    utility.drop_collection(collection_name)

collection = Collection(
    name=collection_name,
    schema=schema,
    using="default"
)
print(f"Collection '{collection_name}' created!")
```

***

## Step 6 — Create Index

Create an index on the vector field before loading the collection for search. Three common parameter sets are shown below; the `create_index` call uses HNSW:

```python
from pymilvus import Collection

collection = Collection("documents")

# HNSW Index (best for most use cases, low latency)
hnsw_params = {
    "metric_type": "COSINE",     # COSINE, L2, or IP (Inner Product)
    "index_type": "HNSW",
    "params": {
        "M": 16,                 # HNSW graph connectivity (8-64)
        "efConstruction": 200    # Build-time search depth
    }
}

# IVF_FLAT Index (CPU, good for large collections)
ivf_params = {
    "metric_type": "COSINE",
    "index_type": "IVF_FLAT",
    "params": {
        "nlist": 1024            # Number of clusters (sqrt of data size is typical)
    }
}

# GPU_IVF_FLAT Index (requires CUDA GPU — fastest for batch queries)
gpu_ivf_params = {
    "metric_type": "L2",
    "index_type": "GPU_IVF_FLAT",
    "params": {
        "nlist": 1024,
        "cache_dataset_on_device": True
    }
}

# Create index on the embedding field
collection.create_index(
    field_name="embedding",
    index_params=hnsw_params,
    index_name="embedding_idx"
)

# Create scalar index for filtered search
collection.create_index(field_name="category", index_name="category_idx")
collection.create_index(field_name="year", index_name="year_idx")

print("Indexes created!")
collection.load()  # Load into memory for searching
```
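Index builds on large collections run asynchronously, so you can poll progress rather than block. A small sketch using the PyMilvus `utility` helpers (run it after inserting data in Step 7; key names follow the PyMilvus 2.4 API):

```python
import time
from pymilvus import utility

# Poll until every row is covered by the index
while True:
    progress = utility.index_building_progress("documents")
    print(f"Indexed {progress['indexed_rows']}/{progress['total_rows']} rows")
    if progress["indexed_rows"] >= progress["total_rows"]:
        break
    time.sleep(5)
```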

***

## Step 7 — Insert Data

```python
from pymilvus import Collection
from sentence_transformers import SentenceTransformer
from tqdm import tqdm

collection = Collection("documents")
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

# Your documents
documents = [
    {
        "text": "Milvus is an open-source vector database for scalable AI applications.",
        "source": "documentation",
        "category": "database",
        "year": 2024
    },
    {
        "text": "HNSW provides fast approximate nearest neighbor search with high recall.",
        "source": "research",
        "category": "algorithm",
        "year": 2023
    },
    {
        "text": "GPU-accelerated indexing dramatically reduces build time for large vector collections.",
        "source": "blog",
        "category": "performance",
        "year": 2024
    },
    # Add thousands more documents here
]

def insert_batch(docs: list) -> int:
    texts = [d["text"] for d in docs]
    
    # GPU-accelerated embedding
    embeddings = model.encode(
        texts,
        batch_size=256,
        show_progress_bar=False,
        normalize_embeddings=True
    )
    
    # Row-based insert: PyMilvus 2.x accepts a list of dicts matching the schema
    rows = [
        {
            "text": d["text"],
            "source": d["source"],
            "category": d["category"],
            "year": d["year"],
            "embedding": emb.tolist()
        }
        for d, emb in zip(docs, embeddings)
    ]
    
    result = collection.insert(rows)
    return result.insert_count

# Insert in batches
BATCH_SIZE = 1000
total_inserted = 0

for i in tqdm(range(0, len(documents), BATCH_SIZE), desc="Inserting"):
    batch = documents[i:i + BATCH_SIZE]
    total_inserted += insert_batch(batch)

# Flush to ensure data is persisted and indexed
collection.flush()
print(f"Total inserted and flushed: {total_inserted}")
```
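A quick sanity check after the flush: confirm the row count and read a few rows back with a scalar filter (no vector needed). A short follow-up sketch:

```python
# num_entities counts flushed rows; very recent inserts may lag briefly
print(f"Entities in collection: {collection.num_entities:,}")

# Spot-check a few rows by scalar filter
for row in collection.query(
    expr="year >= 2023",
    output_fields=["text", "source"],
    limit=3
):
    print(row)
```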

***

## Step 8 — Search and Query

### Basic Semantic Search

```python
from pymilvus import Collection
from sentence_transformers import SentenceTransformer

collection = Collection("documents")
collection.load()

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def search(query: str, top_k: int = 10):
    query_embedding = model.encode(
        [query],
        normalize_embeddings=True
    )[0].tolist()
    
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param={
            "metric_type": "COSINE",
            "params": {"ef": 64}    # HNSW search-time parameter (ef >= top_k)
        },
        limit=top_k,
        output_fields=["text", "source", "category", "year"]
    )
    
    return results[0]

# Search
hits = search("how does vector similarity search work")
for hit in hits:
    print(f"Score: {hit.score:.4f}")
    print(f"Text: {hit.entity.get('text')[:100]}")
    print(f"Source: {hit.entity.get('source')}")
    print()
```
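`collection.search` also accepts several query vectors in one call, which is much cheaper than looping when you have many queries. A sketch reusing the model from above; `results[i]` holds the hits for the i-th query:

```python
queries = [
    "gpu accelerated index building",
    "approximate nearest neighbor recall",
]
vectors = model.encode(queries, normalize_embeddings=True).tolist()

# One round trip for all queries
results = collection.search(
    data=vectors,
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    output_fields=["text"]
)
for query, hits in zip(queries, results):
    print(f"{query!r} -> {hits[0].entity.get('text')[:60]}")
```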

### Filtered Search

```python
from pymilvus import Collection

collection = Collection("documents")

# Search with metadata filter (boolean expression)
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    expr='category == "database" and year >= 2023',  # Boolean filter
    output_fields=["text", "category", "year"]
)
```
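Besides top-k search, Milvus 2.3+ supports range search: return every hit whose similarity clears a threshold. A sketch reusing `query_embedding` from above (with COSINE, higher scores are closer, so `radius` is the lower bound):

```python
# Range search: all hits with COSINE similarity between radius and range_filter
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={
        "metric_type": "COSINE",
        "params": {
            "ef": 64,
            "radius": 0.5,        # minimum similarity to include
            "range_filter": 1.0   # maximum similarity (1.0 = identical direction)
        }
    },
    limit=100,
    output_fields=["text"]
)
```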

### Hybrid Search (Dense + Sparse)

```python
# Milvus 2.4+ supports hybrid dense+sparse search
from pymilvus import AnnSearchRequest, WeightedRanker, Collection

collection = Collection("documents")

# Dense search request (dense_embedding: query vector from your dense embedding model)
dense_req = AnnSearchRequest(
    data=[dense_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=20
)

# Sparse search request (sparse_embedding: sparse query vector; requires a sparse vector field, see the schema sketch below)
sparse_req = AnnSearchRequest(
    data=[sparse_embedding],
    anns_field="sparse_embedding",
    param={"metric_type": "IP"},
    limit=20
)

# Combine scores with a weighted ranker (use RRFRanker for reciprocal rank fusion instead)
results = collection.hybrid_search(
    [dense_req, sparse_req],
    rerank=WeightedRanker(0.7, 0.3),  # 70% dense, 30% sparse
    limit=10,
    output_fields=["text"]
)
```
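The sparse request above assumes a sparse vector field, which the Step 5 schema does not include. A sketch of the extra schema and index pieces you would add (field and index names are illustrative):

```python
from pymilvus import FieldSchema, DataType

# Additional field for the Step 5 schema (Milvus 2.4+)
sparse_field = FieldSchema(
    name="sparse_embedding",
    dtype=DataType.SPARSE_FLOAT_VECTOR  # rows are {dim_index: value} dicts
)

# Sparse vectors use an inverted index with the IP metric
sparse_index = {
    "index_type": "SPARSE_INVERTED_INDEX",
    "metric_type": "IP"
}
# collection.create_index(field_name="sparse_embedding", index_params=sparse_index)
```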

***

## Step 9 — Build a RAG Service

```bash
pip install fastapi uvicorn openai

cat > /workspace/milvus_rag.py << 'EOF'
from fastapi import FastAPI
from pydantic import BaseModel
from pymilvus import Collection, connections
from sentence_transformers import SentenceTransformer
from openai import OpenAI
import os

app = FastAPI(title="Milvus RAG API")

# Initialize at startup
connections.connect("default", host="localhost", port="19530")
collection = Collection("documents")
collection.load()
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class QueryRequest(BaseModel):
    question: str
    n_results: int = 5

@app.get("/health")
async def health():
    return {"status": "ok", "vectors": collection.num_entities}

@app.post("/search")
async def semantic_search(req: QueryRequest):
    embedding = embedder.encode(
        [req.question],
        normalize_embeddings=True
    )[0].tolist()
    
    results = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=req.n_results,
        output_fields=["text", "source", "category"]
    )
    
    return {
        "results": [
            {
                "text": hit.entity.get("text"),
                "source": hit.entity.get("source"),
                "score": hit.score
            }
            for hit in results[0]
        ]
    }

@app.post("/rag")
async def rag(req: QueryRequest):
    embedding = embedder.encode([req.question], normalize_embeddings=True)[0].tolist()
    
    hits = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=req.n_results,
        output_fields=["text", "source"]
    )[0]
    
    context = "\n\n".join([
        f"[{hit.entity.get('source')}]: {hit.entity.get('text')}"
        for hit in hits if hit.score > 0.4
    ])
    
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer based on context. Be concise."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {req.question}"}
        ]
    )
    
    return {"answer": response.choices[0].message.content, "context_used": len(hits)}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF

python3 /workspace/milvus_rag.py
```
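With the service running, you can exercise it from another shell or machine. A minimal client sketch using `requests` (install it with pip if needed; replace `<server-ip>` as in Step 3, and make sure port 8000 is exposed):

```python
import requests

BASE = "http://<server-ip>:8000"  # or http://localhost:8000 on the server itself

print(requests.get(f"{BASE}/health").json())

hits = requests.post(
    f"{BASE}/search",
    json={"question": "what is a vector database", "n_results": 3}
).json()["results"]
for hit in hits:
    print(f"{hit['score']:.3f}  {hit['text'][:80]}")

answer = requests.post(f"{BASE}/rag", json={"question": "why use GPU indexing?"}).json()
print(answer["answer"])
```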

***

## Step 10 — Monitor and Manage

```python
from pymilvus import connections, utility, Collection

connections.connect("default", host="localhost", port="19530")

# List all collections
print("Collections:", utility.list_collections())

# Collection statistics
col = Collection("documents")
print(f"Entity count: {col.num_entities:,}")
print(f"Schema: {col.schema}")

# Partition management
col.create_partition("2024_docs")
col.create_partition("2023_docs")

# Insert with partition
col.insert(data, partition_name="2024_docs")

# Search specific partition
results = col.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    partition_names=["2024_docs"]  # Only search this partition
)
```
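Cleanup follows the same pattern. A sketch of deleting entities by primary key and retiring an old partition (partitions must be released before they can be dropped):

```python
# Delete specific entities by primary key
col.delete(expr="id in [1001, 1002, 1003]")

# Retire an old partition: release, drop, then reload the collection
col.release()
col.drop_partition("2023_docs")
col.load()
```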

***

## Troubleshooting

### Services Not Starting

```bash
# Check container logs
docker compose logs etcd
docker compose logs minio
docker compose logs standalone

# Check disk space
df -h /opt/milvus

# Restart services
docker compose restart
```

### Connection Refused on 19530

```bash
# Verify Milvus is listening
netstat -tlnp | grep 19530

# Check health
curl http://localhost:9091/healthz

# Allow time for startup (90 seconds)
docker compose logs standalone | tail -20
```

### Index Build Timeout for Large Collections

```python
# Increase timeout for large index builds
from pymilvus import Collection

collection = Collection("documents")
collection.create_index(
    field_name="embedding",
    index_params=hnsw_params,
    timeout=3600  # 1 hour timeout
)
```

### High Memory Usage

```yaml
# Configure Milvus memory limits in docker-compose.yml
# Add to standalone service:
deploy:
  resources:
    limits:
      memory: 16g
```

***

## Index Type Selection Guide

| Index Type     | Best For                  | Memory     | Speed     | GPU Required |
| -------------- | ------------------------- | ---------- | --------- | ------------ |
| FLAT           | Small (<1M), exact search | High       | Slow      | No           |
| IVF\_FLAT      | Medium (1M–10M)           | Medium     | Good      | No           |
| HNSW           | Low latency, <100M        | High       | Excellent | No           |
| IVF\_SQ8       | Compressed, large         | Low        | Good      | No           |
| GPU\_IVF\_FLAT | Fast batch queries        | GPU+RAM    | Best      | Yes          |
| DISKANN        | Billion-scale             | Low (disk) | Good      | No           |
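As a rough starting point, index choice can be derived from collection size. A rule-of-thumb helper sketch (the thresholds mirror the table above, not official Milvus guidance):

```python
def pick_index_params(n_vectors: int, gpu: bool = False) -> dict:
    """Rule-of-thumb index selection; always benchmark on your own data."""
    if gpu and n_vectors >= 10_000_000:
        return {"index_type": "GPU_IVF_FLAT", "metric_type": "L2",
                "params": {"nlist": int(n_vectors ** 0.5)}}
    if n_vectors < 1_000_000:
        return {"index_type": "FLAT", "metric_type": "COSINE", "params": {}}
    if n_vectors < 100_000_000:
        return {"index_type": "HNSW", "metric_type": "COSINE",
                "params": {"M": 16, "efConstruction": 200}}
    return {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}

print(pick_index_params(5_000_000))   # -> HNSW parameters
```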

***

## Performance Benchmarks

| Collection Size | Index          | GPU      | QPS      |
| --------------- | -------------- | -------- | -------- |
| 1M vectors      | HNSW           | RTX 3090 | \~8,000  |
| 10M vectors     | IVF\_FLAT      | RTX 4090 | \~2,500  |
| 10M vectors     | GPU\_IVF\_FLAT | A100     | \~12,000 |
| 100M vectors    | DISKANN        | A100     | \~1,200  |

*Indicative figures; actual QPS depends on vector dimension, search parameters (ef/nprobe), batch size, and filtering.*

***

## Additional Resources

* [Milvus Documentation](https://milvus.io/docs)
* [Milvus GitHub](https://github.com/milvus-io/milvus)
* [PyMilvus Documentation](https://milvus.io/api-reference/pymilvus/v2.4.x/About.md)
* [Milvus Bootcamp](https://github.com/milvus-io/bootcamp) — Example applications
* [Zilliz Cloud](https://cloud.zilliz.com/) — Managed Milvus
* [Vector Database Comparison](https://milvus.io/docs/benchmark.md)
* [Attu GUI](https://github.com/zilliztech/attu) — Web UI for Milvus management

***

*Milvus on Clore.ai is the ideal solution for AI applications that need to scale beyond hundreds of millions of vectors. Combined with GPU-accelerated embedding generation, you can build world-class semantic search and RAG systems at a fraction of managed cloud costs.*

***

## Clore.ai GPU Recommendations

| Use Case                  | Recommended GPU | Est. Cost on Clore.ai |
| ------------------------- | --------------- | --------------------- |
| Development/Testing       | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production Vector Search  | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| High-throughput Embedding | RTX 4090 (24GB) | \~$0.70/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
