# Milvus

> **The most scalable open-source vector database for AI applications — built for billions of vectors**

Milvus is an open-source vector database purpose-built for scalable similarity search and AI applications. Originally created by Zilliz and donated to the LF AI & Data Foundation, Milvus powers production AI workloads at companies including NVIDIA, AT\&T, IBM, and Salesforce. It's the go-to choice when you need to scale to billions of vectors.

**GitHub:** [milvus-io/milvus](https://github.com/milvus-io/milvus) — 32K+ ⭐

***

## Milvus vs Qdrant — When to Choose Which

| Criteria             | Milvus                          | Qdrant               |
| -------------------- | ------------------------------- | -------------------- |
| Scale                | Billions of vectors             | Hundreds of millions |
| Architecture         | Distributed (multiple services) | Single binary        |
| Setup complexity     | Higher                          | Lower                |
| GPU index support    | ✅ Native GPU indexes            | Limited              |
| Multi-tenancy        | ✅ Partitions + aliases          | Collection-based     |
| Streaming ingestion  | ✅ Kafka/Pulsar                  | Limited              |
| Hybrid search        | ✅ Dense + sparse                | ✅                    |
| Cloud-managed option | Zilliz Cloud                    | Qdrant Cloud         |

{% hint style="success" %}
**Choose Milvus when:** You need to scale to billions of vectors, require GPU-accelerated indexing (GPU\_IVF\_FLAT), or need enterprise features like multi-tenancy, streaming ingestion, and role-based access control.
{% endhint %}

***

## Milvus Architecture

Milvus in standalone mode (single server) includes:

* **milvus** — the main service (proxy, query, data, index coordinators)
* **etcd** — metadata storage and service discovery
* **MinIO** — object storage for segment data

In distributed mode (cluster), each component scales independently.
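Once the stack is up (Step 3), you can confirm the pieces are reachable from the host. A minimal sketch, assuming the default Docker Compose ports on localhost (the compose file below does not publish etcd's port to the host):

```python
import socket
import urllib.request

# Liveness probe on the Milvus REST port
with urllib.request.urlopen("http://localhost:9091/healthz", timeout=5) as r:
    print("Milvus healthz:", r.read().decode().strip())

# TCP reachability of the published service ports
for name, port in [("Milvus gRPC", 19530), ("MinIO", 9000)]:
    with socket.socket() as s:
        s.settimeout(3)
        status = "open" if s.connect_ex(("localhost", port)) == 0 else "closed"
    print(f"{name} port {port}: {status}")
```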

***

## Prerequisites

* Clore.ai account with GPU rental
* Docker Compose (usually pre-installed)
* Basic Python knowledge
* 16GB+ RAM (32GB recommended for production)

***

## Step 1 — Rent a GPU Server on Clore.ai

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. **Recommended GPU:** RTX 4090 or A100 for GPU-accelerated indexing
3. **CPU alternative:** Any server with 32GB+ RAM for CPU-based indexing

**Minimum Requirements:**

* CPU: 8 cores
* RAM: 16GB (32GB recommended)
* Disk: 50GB SSD/NVMe
* GPU: Optional (required only for GPU index types)

{% hint style="info" %}
**GPU index types in Milvus** (GPU\_IVF\_FLAT, GPU\_IVF\_PQ) require CUDA-capable GPUs and dramatically accelerate index building for large collections. If you plan to index 10M+ vectors frequently, GPU indexing pays for itself quickly.
{% endhint %}
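Before committing to a GPU index type, it's worth confirming the container can actually see a CUDA device. A quick check, assuming PyTorch is available (it is pulled in by `sentence-transformers` in Step 4):

```python
import torch

# GPU index types need a visible CUDA device inside the container
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device visible; stick to CPU index types (HNSW, IVF_FLAT)")
```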

***

## Step 2 — Deploy Milvus Standalone

**Docker Image:**

```
milvusdb/milvus:v2.4.0
```

Milvus standalone requires etcd and MinIO. Use Docker Compose for the easiest setup.

**Ports:**

```
22
19530
```

* **Port 22:** SSH access
* **Port 19530:** Milvus SDK/gRPC port (primary)
* **Port 9091:** Milvus REST API and health check (internal; no need to expose publicly)

**Environment Variables:**

```
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
```

***

## Step 3 — Set Up with Docker Compose

SSH into your Clore.ai server and create the compose file:

```bash
ssh root@<server-ip> -p <ssh-port>

# Install Docker Compose if not present
which docker-compose || pip install docker-compose
# Or use Docker plugin:
docker compose version

# Create project directory
mkdir -p /opt/milvus && cd /opt/milvus

# Download official Milvus standalone compose file
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml \
    -O docker-compose.yml

# Review the compose file
cat docker-compose.yml
```

### Customize docker-compose.yml

```yaml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - /opt/milvus/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-13T19-46-17Z
    environment:
      MINIO_ACCESS_KEY: minioadmin   # change these defaults in production
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - /opt/milvus/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.0
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /opt/milvus/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]  # Enable GPU access (omit this block on CPU-only servers)
```

### Start Milvus

```bash
cd /opt/milvus
docker compose up -d

# Wait for services to start (~60 seconds)
sleep 60

# Check all services are healthy
docker compose ps

# Check Milvus health
curl http://localhost:9091/healthz
# Expected: {"status":"ok"}

# View logs
docker compose logs -f --tail 50 standalone
```

***

## Step 4 — Install Python Client

```bash
pip install pymilvus sentence-transformers numpy tqdm

# Verify connection
python3 << 'EOF'
from pymilvus import connections, utility

connections.connect("default", host="localhost", port="19530")
print(f"Milvus connected!")
print(f"Version: {utility.get_server_version()}")
EOF
```

***

## Step 5 — Create a Collection

In Milvus, a **collection** is similar to a database table. It has a schema with typed fields including vector fields.

```python
from pymilvus import (
    connections,
    FieldSchema,
    CollectionSchema,
    DataType,
    Collection,
    utility
)

# Connect
connections.connect("default", host="localhost", port="19530")

# Define schema
fields = [
    FieldSchema(
        name="id",
        dtype=DataType.INT64,
        is_primary=True,
        auto_id=True           # Auto-generate IDs
    ),
    FieldSchema(
        name="text",
        dtype=DataType.VARCHAR,
        max_length=2048        # Maximum text length
    ),
    FieldSchema(
        name="source",
        dtype=DataType.VARCHAR,
        max_length=256
    ),
    FieldSchema(
        name="category",
        dtype=DataType.VARCHAR,
        max_length=128
    ),
    FieldSchema(
        name="year",
        dtype=DataType.INT32
    ),
    FieldSchema(
        name="embedding",
        dtype=DataType.FLOAT_VECTOR,
        dim=384                # Dimension of your embedding model
    )
]

schema = CollectionSchema(
    fields=fields,
    description="Document embeddings for semantic search",
    enable_dynamic_field=True  # Allow adding fields not in schema
)

# Create collection
collection_name = "documents"
if utility.has_collection(collection_name):
    utility.drop_collection(collection_name)

collection = Collection(
    name=collection_name,
    schema=schema,
    using="default"
)
print(f"Collection '{collection_name}' created!")
```

***

## Step 6 — Create Index

Create an index on the vector field before loading the collection for search. Three common parameter sets are shown below; the `create_index` call uses HNSW:

```python
from pymilvus import Collection

collection = Collection("documents")

# HNSW Index (best for most use cases, low latency)
hnsw_params = {
    "metric_type": "COSINE",     # COSINE, L2, or IP (Inner Product)
    "index_type": "HNSW",
    "params": {
        "M": 16,                 # HNSW graph connectivity (8-64)
        "efConstruction": 200    # Build-time search depth
    }
}

# IVF_FLAT Index (CPU, good for large collections)
ivf_params = {
    "metric_type": "COSINE",
    "index_type": "IVF_FLAT",
    "params": {
        "nlist": 1024            # Number of clusters (sqrt of data size is typical)
    }
}

# GPU_IVF_FLAT Index (requires CUDA GPU — fastest for batch queries)
gpu_ivf_params = {
    "metric_type": "L2",
    "index_type": "GPU_IVF_FLAT",
    "params": {
        "nlist": 1024,
        "cache_dataset_on_device": True
    }
}

# Create index on the embedding field
collection.create_index(
    field_name="embedding",
    index_params=hnsw_params,
    index_name="embedding_idx"
)

# Create scalar index for filtered search
collection.create_index(field_name="category", index_name="category_idx")
collection.create_index(field_name="year", index_name="year_idx")

print("Indexes created!")
collection.load()  # Load into memory for searching
```
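Index builds on large collections run asynchronously, so you can poll progress rather than block. A small sketch using the PyMilvus `utility` helpers (run it after inserting data in Step 7; key names follow the PyMilvus 2.4 API):

```python
import time
from pymilvus import utility

# Poll until every row is covered by the index
while True:
    progress = utility.index_building_progress("documents")
    print(f"Indexed {progress['indexed_rows']}/{progress['total_rows']} rows")
    if progress["indexed_rows"] >= progress["total_rows"]:
        break
    time.sleep(5)
```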

***

## Step 7 — Insert Data

```python
from pymilvus import Collection
from sentence_transformers import SentenceTransformer
from tqdm import tqdm

collection = Collection("documents")
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

# Your documents
documents = [
    {
        "text": "Milvus is an open-source vector database for scalable AI applications.",
        "source": "documentation",
        "category": "database",
        "year": 2024
    },
    {
        "text": "HNSW provides fast approximate nearest neighbor search with high recall.",
        "source": "research",
        "category": "algorithm",
        "year": 2023
    },
    {
        "text": "GPU-accelerated indexing dramatically reduces build time for large vector collections.",
        "source": "blog",
        "category": "performance",
        "year": 2024
    },
    # Add thousands more documents here
]

def insert_batch(docs: list) -> int:
    texts = [d["text"] for d in docs]
    
    # GPU-accelerated embedding
    embeddings = model.encode(
        texts,
        batch_size=256,
        show_progress_bar=False,
        normalize_embeddings=True
    )
    
    # Row-based insert: PyMilvus 2.x accepts a list of dicts matching the schema
    rows = [
        {
            "text": d["text"],
            "source": d["source"],
            "category": d["category"],
            "year": d["year"],
            "embedding": emb.tolist()
        }
        for d, emb in zip(docs, embeddings)
    ]
    
    result = collection.insert(rows)
    return result.insert_count

# Insert in batches
BATCH_SIZE = 1000
total_inserted = 0

for i in tqdm(range(0, len(documents), BATCH_SIZE), desc="Inserting"):
    batch = documents[i:i + BATCH_SIZE]
    total_inserted += insert_batch(batch)

# Flush to ensure data is persisted and indexed
collection.flush()
print(f"Total inserted and flushed: {total_inserted}")
```
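A quick sanity check after the flush: confirm the row count and read a few rows back with a scalar filter (no vector needed). A short follow-up sketch:

```python
# num_entities counts flushed rows; very recent inserts may lag briefly
print(f"Entities in collection: {collection.num_entities:,}")

# Spot-check a few rows by scalar filter
for row in collection.query(
    expr="year >= 2023",
    output_fields=["text", "source"],
    limit=3
):
    print(row)
```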

***

## Step 8 — Search and Query

### Basic Semantic Search

```python
from pymilvus import Collection
from sentence_transformers import SentenceTransformer

collection = Collection("documents")
collection.load()

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def search(query: str, top_k: int = 10):
    query_embedding = model.encode(
        [query],
        normalize_embeddings=True
    )[0].tolist()
    
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param={
            "metric_type": "COSINE",
            "params": {"ef": 64}    # HNSW search-time parameter (ef >= top_k)
        },
        limit=top_k,
        output_fields=["text", "source", "category", "year"]
    )
    
    return results[0]

# Search
hits = search("how does vector similarity search work")
for hit in hits:
    print(f"Score: {hit.score:.4f}")
    print(f"Text: {hit.entity.get('text')[:100]}")
    print(f"Source: {hit.entity.get('source')}")
    print()
```
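`collection.search` also accepts several query vectors in one call, which is much cheaper than looping when you have many queries. A sketch reusing the model from above; `results[i]` holds the hits for the i-th query:

```python
queries = [
    "gpu accelerated index building",
    "approximate nearest neighbor recall",
]
vectors = model.encode(queries, normalize_embeddings=True).tolist()

# One round trip for all queries
results = collection.search(
    data=vectors,
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    output_fields=["text"]
)
for query, hits in zip(queries, results):
    print(f"{query!r} -> {hits[0].entity.get('text')[:60]}")
```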

### Filtered Search

```python
from pymilvus import Collection

collection = Collection("documents")

# Search with metadata filter (boolean expression)
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    expr='category == "database" and year >= 2023',  # Boolean filter
    output_fields=["text", "category", "year"]
)
```
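Besides top-k search, Milvus 2.3+ supports range search: return every hit whose similarity clears a threshold. A sketch reusing `query_embedding` from above (with COSINE, higher scores are closer, so `radius` is the lower bound):

```python
# Range search: all hits with COSINE similarity between radius and range_filter
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={
        "metric_type": "COSINE",
        "params": {
            "ef": 64,
            "radius": 0.5,        # minimum similarity to include
            "range_filter": 1.0   # maximum similarity (1.0 = identical direction)
        }
    },
    limit=100,
    output_fields=["text"]
)
```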

### Hybrid Search (Dense + Sparse)

```python
# Milvus 2.4+ supports hybrid dense+sparse search
from pymilvus import AnnSearchRequest, WeightedRanker, Collection

collection = Collection("documents")

# Dense search request (dense_embedding: query vector from your dense embedding model)
dense_req = AnnSearchRequest(
    data=[dense_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=20
)

# Sparse search request (sparse_embedding: sparse query vector; requires a sparse vector field, see the schema sketch below)
sparse_req = AnnSearchRequest(
    data=[sparse_embedding],
    anns_field="sparse_embedding",
    param={"metric_type": "IP"},
    limit=20
)

# Combine scores with a weighted ranker (use RRFRanker for reciprocal rank fusion instead)
results = collection.hybrid_search(
    [dense_req, sparse_req],
    rerank=WeightedRanker(0.7, 0.3),  # 70% dense, 30% sparse
    limit=10,
    output_fields=["text"]
)
```
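The sparse request above assumes a sparse vector field, which the Step 5 schema does not include. A sketch of the extra schema and index pieces you would add (field and index names are illustrative):

```python
from pymilvus import FieldSchema, DataType

# Additional field for the Step 5 schema (Milvus 2.4+)
sparse_field = FieldSchema(
    name="sparse_embedding",
    dtype=DataType.SPARSE_FLOAT_VECTOR  # rows are {dim_index: value} dicts
)

# Sparse vectors use an inverted index with the IP metric
sparse_index = {
    "index_type": "SPARSE_INVERTED_INDEX",
    "metric_type": "IP"
}
# collection.create_index(field_name="sparse_embedding", index_params=sparse_index)
```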

***

## Step 9 — Build a RAG Service

```bash
pip install fastapi uvicorn openai

cat > /workspace/milvus_rag.py << 'EOF'
from fastapi import FastAPI
from pydantic import BaseModel
from pymilvus import Collection, connections
from sentence_transformers import SentenceTransformer
from openai import OpenAI
import os

app = FastAPI(title="Milvus RAG API")

# Initialize at startup
connections.connect("default", host="localhost", port="19530")
collection = Collection("documents")
collection.load()
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class QueryRequest(BaseModel):
    question: str
    n_results: int = 5

@app.get("/health")
async def health():
    return {"status": "ok", "vectors": collection.num_entities}

@app.post("/search")
async def semantic_search(req: QueryRequest):
    embedding = embedder.encode(
        [req.question],
        normalize_embeddings=True
    )[0].tolist()
    
    results = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=req.n_results,
        output_fields=["text", "source", "category"]
    )
    
    return {
        "results": [
            {
                "text": hit.entity.get("text"),
                "source": hit.entity.get("source"),
                "score": hit.score
            }
            for hit in results[0]
        ]
    }

@app.post("/rag")
async def rag(req: QueryRequest):
    embedding = embedder.encode([req.question], normalize_embeddings=True)[0].tolist()
    
    hits = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=req.n_results,
        output_fields=["text", "source"]
    )[0]
    
    context = "\n\n".join([
        f"[{hit.entity.get('source')}]: {hit.entity.get('text')}"
        for hit in hits if hit.score > 0.4
    ])
    
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer based on context. Be concise."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {req.question}"}
        ]
    )
    
    return {"answer": response.choices[0].message.content, "context_used": len(hits)}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF

python3 /workspace/milvus_rag.py
```
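With the service running, you can exercise it from another shell or machine. A minimal client sketch using `requests` (install it with pip if needed; replace `<server-ip>` as in Step 3, and make sure port 8000 is exposed):

```python
import requests

BASE = "http://<server-ip>:8000"  # or http://localhost:8000 on the server itself

print(requests.get(f"{BASE}/health").json())

hits = requests.post(
    f"{BASE}/search",
    json={"question": "what is a vector database", "n_results": 3}
).json()["results"]
for hit in hits:
    print(f"{hit['score']:.3f}  {hit['text'][:80]}")

answer = requests.post(f"{BASE}/rag", json={"question": "why use GPU indexing?"}).json()
print(answer["answer"])
```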

***

## Step 10 — Monitor and Manage

```python
from pymilvus import connections, utility, Collection

connections.connect("default", host="localhost", port="19530")

# List all collections
print("Collections:", utility.list_collections())

# Collection statistics
col = Collection("documents")
print(f"Entity count: {col.num_entities:,}")
print(f"Schema: {col.schema}")

# Partition management
col.create_partition("2024_docs")
col.create_partition("2023_docs")

# Insert with partition
col.insert(data, partition_name="2024_docs")

# Search specific partition
results = col.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    partition_names=["2024_docs"]  # Only search this partition
)
```
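Cleanup follows the same pattern. A sketch of deleting entities by primary key and retiring an old partition (partitions must be released before they can be dropped):

```python
# Delete specific entities by primary key
col.delete(expr="id in [1001, 1002, 1003]")

# Retire an old partition: release, drop, then reload the collection
col.release()
col.drop_partition("2023_docs")
col.load()
```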

***

## Troubleshooting

### Services Not Starting

```bash
# Check container logs
docker compose logs etcd
docker compose logs minio
docker compose logs standalone

# Check disk space
df -h /opt/milvus

# Restart services
docker compose restart
```

### Connection Refused on 19530

```bash
# Verify Milvus is listening
netstat -tlnp | grep 19530

# Check health
curl http://localhost:9091/healthz

# Allow time for startup (90 seconds)
docker compose logs standalone | tail -20
```

### Index Build Timeout for Large Collections

```python
# Increase timeout for large index builds
from pymilvus import Collection

collection = Collection("documents")
collection.create_index(
    field_name="embedding",
    index_params=hnsw_params,
    timeout=3600  # 1 hour timeout
)
```

### High Memory Usage

```yaml
# Configure Milvus memory limits in docker-compose.yml
# Add to standalone service:
deploy:
  resources:
    limits:
      memory: 16g
```

***

## Index Type Selection Guide

| Index Type     | Best For                  | Memory     | Speed     | GPU Required |
| -------------- | ------------------------- | ---------- | --------- | ------------ |
| FLAT           | Small (<1M), exact search | High       | Slow      | No           |
| IVF\_FLAT      | Medium (1M–10M)           | Medium     | Good      | No           |
| HNSW           | Low latency, <100M        | High       | Excellent | No           |
| IVF\_SQ8       | Compressed, large         | Low        | Good      | No           |
| GPU\_IVF\_FLAT | Fast batch queries        | GPU+RAM    | Best      | Yes          |
| DISKANN        | Billion-scale             | Low (disk) | Good      | No           |
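As a rough starting point, index choice can be derived from collection size. A rule-of-thumb helper sketch (the thresholds mirror the table above, not official Milvus guidance):

```python
def pick_index_params(n_vectors: int, gpu: bool = False) -> dict:
    """Rule-of-thumb index selection; always benchmark on your own data."""
    if gpu and n_vectors >= 10_000_000:
        return {"index_type": "GPU_IVF_FLAT", "metric_type": "L2",
                "params": {"nlist": int(n_vectors ** 0.5)}}
    if n_vectors < 1_000_000:
        return {"index_type": "FLAT", "metric_type": "COSINE", "params": {}}
    if n_vectors < 100_000_000:
        return {"index_type": "HNSW", "metric_type": "COSINE",
                "params": {"M": 16, "efConstruction": 200}}
    return {"index_type": "DISKANN", "metric_type": "COSINE", "params": {}}

print(pick_index_params(5_000_000))   # -> HNSW parameters
```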

***

## Performance Benchmarks

| Collection Size | Index          | GPU      | QPS      |
| --------------- | -------------- | -------- | -------- |
| 1M vectors      | HNSW           | RTX 3090 | \~8,000  |
| 10M vectors     | IVF\_FLAT      | RTX 4090 | \~2,500  |
| 10M vectors     | GPU\_IVF\_FLAT | A100     | \~12,000 |
| 100M vectors    | DISKANN        | A100     | \~1,200  |

*Indicative figures; actual QPS depends on vector dimension, search parameters (ef/nprobe), batch size, and filtering.*

***

## Additional Resources

* [Milvus Documentation](https://milvus.io/docs)
* [Milvus GitHub](https://github.com/milvus-io/milvus)
* [PyMilvus Documentation](https://milvus.io/api-reference/pymilvus/v2.4.x/About.md)
* [Milvus Bootcamp](https://github.com/milvus-io/bootcamp) — Example applications
* [Zilliz Cloud](https://cloud.zilliz.com/) — Managed Milvus
* [Vector Database Comparison](https://milvus.io/docs/benchmark.md)
* [Attu GUI](https://github.com/zilliztech/attu) — Web UI for Milvus management

***

*Milvus on Clore.ai is the ideal solution for AI applications that need to scale beyond hundreds of millions of vectors. Combined with GPU-accelerated embedding generation, you can build world-class semantic search and RAG systems at a fraction of managed cloud costs.*

***

## Clore.ai GPU Recommendations

| Use Case                  | Recommended GPU | Est. Cost on Clore.ai |
| ------------------------- | --------------- | --------------------- |
| Development/Testing       | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production Vector Search  | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| High-throughput Embedding | RTX 4090 (24GB) | \~$0.70/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
