# Milvus

> **The most scalable open-source vector database for AI applications — built for billions of vectors**

Milvus is an open-source vector database purpose-built for scalable similarity search and AI applications. Originally created by Zilliz and donated to the LF AI & Data Foundation, Milvus powers production AI workloads at companies including NVIDIA, AT\&T, IBM, and Salesforce. It's the go-to choice when you need to scale to billions of vectors.

**GitHub:** [milvus-io/milvus](https://github.com/milvus-io/milvus) — 32K+ ⭐

***

## Milvus vs Qdrant — When to Choose Which

| Criteria             | Milvus                          | Qdrant               |
| -------------------- | ------------------------------- | -------------------- |
| Scale                | Billions of vectors             | Hundreds of millions |
| Architecture         | Distributed (multiple services) | Single binary        |
| Setup complexity     | Higher                          | Lower                |
| GPU index support    | ✅ Native GPU indexes            | Limited              |
| Multi-tenancy        | ✅ Partitions + aliases          | Collection-based     |
| Streaming ingestion  | ✅ Kafka/Pulsar                  | Limited              |
| Hybrid search        | ✅ Dense + sparse                | ✅                    |
| Cloud-managed option | Zilliz Cloud                    | Qdrant Cloud         |

{% hint style="success" %}
**Choose Milvus when:** You need to scale to billions of vectors, require GPU-accelerated indexing (GPU\_IVF\_FLAT), or need enterprise features like multi-tenancy, streaming ingestion, and role-based access control.
{% endhint %}

***

## Milvus Architecture

Milvus in standalone mode (single server) includes:

* **milvus** — the main service (proxy, query, data, index coordinators)
* **etcd** — metadata storage and service discovery
* **MinIO** — object storage for segment data

In distributed mode (cluster), each component scales independently.
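
Once deployed (Step 3), the three standalone components run as separate containers. Assuming the default container names from the official compose file, you can confirm with:

```bash
# The three standalone components as containers
docker ps --filter name=milvus \
    --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}'
```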

***

## Prerequisites

* Clore.ai account with GPU rental
* Docker Compose (usually pre-installed)
* Basic Python knowledge
* 16GB+ RAM (32GB recommended for production)

***

## Step 1 — Rent a GPU Server on Clore.ai

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. **Recommended GPU:** RTX 4090 or A100 for GPU-accelerated indexing
3. **CPU alternative:** Any server with 32GB+ RAM for CPU-based indexing

**Minimum Requirements:**

* CPU: 8 cores
* RAM: 16GB (32GB recommended)
* Disk: 50GB SSD/NVMe
* GPU: Optional (required only for GPU index types)

{% hint style="info" %}
**GPU index types in Milvus** (GPU\_IVF\_FLAT, GPU\_IVF\_PQ) require CUDA-capable GPUs and dramatically accelerate index building for large collections. If you plan to index 10M+ vectors frequently, GPU indexing pays for itself quickly.
{% endhint %}

***

## Step 2 — Deploy Milvus Standalone

**Docker Image:**

```
milvusdb/milvus:v2.4.0
```

Milvus standalone requires etcd and MinIO. Use Docker Compose for the easiest setup.

**Ports (to expose when configuring the rental):**

```
22
19530
```

* **Port 19530:** Milvus SDK/gRPC port (primary)
* **Port 9091:** Milvus REST API and health check (internal)

**Environment Variables:**

```
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
```

***

## Step 3 — Set Up with Docker Compose

SSH into your Clore.ai server and create the compose file:

```bash
ssh root@<server-ip> -p <ssh-port>

# Install Docker Compose if not present
which docker-compose || pip install docker-compose
# Or use Docker plugin:
docker compose version

# Create project directory
mkdir -p /opt/milvus && cd /opt/milvus

# Download official Milvus standalone compose file
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml \
    -O docker-compose.yml

# Review the compose file
cat docker-compose.yml
```

### Customize docker-compose.yml

```yaml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - /opt/milvus/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-13T19-46-17Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - /opt/milvus/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.0  # use milvusdb/milvus:v2.4.0-gpu for GPU index types
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /opt/milvus/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]  # GPU access (needed only for GPU index types)
```

### Start Milvus

```bash
cd /opt/milvus
docker compose up -d

# Wait for services to start (~60 seconds)
sleep 60

# Check all services are healthy
docker compose ps

# Check Milvus health
curl http://localhost:9091/healthz
# Expected: {"status":"ok"}

# View logs
docker compose logs -f standalone --tail 50
```

***

## Step 4 — Install Python Client

```bash
pip install pymilvus sentence-transformers numpy tqdm

# Verify connection
python3 << 'EOF'
from pymilvus import connections, utility

connections.connect("default", host="localhost", port="19530")
print(f"Milvus connected!")
print(f"Version: {utility.get_server_version()}")
EOF
```

***

## Step 5 — Create a Collection

In Milvus, a **collection** is similar to a database table. It has a schema with typed fields including vector fields.

```python
from pymilvus import (
    connections,
    FieldSchema,
    CollectionSchema,
    DataType,
    Collection,
    utility
)

# Connect
connections.connect("default", host="localhost", port="19530")

# Define schema
fields = [
    FieldSchema(
        name="id",
        dtype=DataType.INT64,
        is_primary=True,
        auto_id=True           # Auto-generate IDs
    ),
    FieldSchema(
        name="text",
        dtype=DataType.VARCHAR,
        max_length=2048        # Maximum text length
    ),
    FieldSchema(
        name="source",
        dtype=DataType.VARCHAR,
        max_length=256
    ),
    FieldSchema(
        name="category",
        dtype=DataType.VARCHAR,
        max_length=128
    ),
    FieldSchema(
        name="year",
        dtype=DataType.INT32
    ),
    FieldSchema(
        name="embedding",
        dtype=DataType.FLOAT_VECTOR,
        dim=384                # Dimension of your embedding model
    )
]

schema = CollectionSchema(
    fields=fields,
    description="Document embeddings for semantic search",
    enable_dynamic_field=True  # Allow adding fields not in schema
)

# Create collection
collection_name = "documents"
if utility.has_collection(collection_name):
    utility.drop_collection(collection_name)

collection = Collection(
    name=collection_name,
    schema=schema,
    using="default"
)
print(f"Collection '{collection_name}' created!")
```

***

## Step 6 — Create Index

Before a collection can be loaded into memory and searched, create an appropriate index on the vector field:

```python
from pymilvus import Collection

collection = Collection("documents")

# HNSW Index (best for most use cases, low latency)
hnsw_params = {
    "metric_type": "COSINE",     # COSINE, L2, or IP (Inner Product)
    "index_type": "HNSW",
    "params": {
        "M": 16,                 # HNSW graph connectivity (8-64)
        "efConstruction": 200    # Build-time search depth
    }
}

# IVF_FLAT Index (CPU, good for large collections)
ivf_params = {
    "metric_type": "COSINE",
    "index_type": "IVF_FLAT",
    "params": {
        "nlist": 1024            # Number of clusters (sqrt of data size is typical)
    }
}

# GPU_IVF_FLAT Index (requires CUDA GPU — fastest for batch queries)
gpu_ivf_params = {
    "metric_type": "L2",
    "index_type": "GPU_IVF_FLAT",
    "params": {
        "nlist": 1024,
        "cache_dataset_on_device": True
    }
}

# Create index on the embedding field
collection.create_index(
    field_name="embedding",
    index_params=hnsw_params,
    index_name="embedding_idx"
)

# Create scalar index for filtered search
collection.create_index(field_name="category", index_name="category_idx")
collection.create_index(field_name="year", index_name="year_idx")

print("Indexes created!")
collection.load()  # Load into memory for searching
```
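
Index builds run asynchronously, so for large collections it is worth polling progress before calling `load()`. A minimal sketch using pymilvus utilities:

```python
from pymilvus import utility

# Poll build progress (rows indexed vs. total)
progress = utility.index_building_progress("documents")
print(f"Indexed {progress['indexed_rows']}/{progress['total_rows']} rows")

# Or block until the build completes
utility.wait_for_index_building_complete("documents")
```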

***

## Step 7 — Insert Data

```python
from pymilvus import Collection
from sentence_transformers import SentenceTransformer

collection = Collection("documents")
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

# Your documents
documents = [
    {
        "text": "Milvus is an open-source vector database for scalable AI applications.",
        "source": "documentation",
        "category": "database",
        "year": 2024
    },
    {
        "text": "HNSW provides fast approximate nearest neighbor search with high recall.",
        "source": "research",
        "category": "algorithm",
        "year": 2023
    },
    {
        "text": "GPU-accelerated indexing dramatically reduces build time for large vector collections.",
        "source": "blog",
        "category": "performance",
        "year": 2024
    },
    # Add thousands more documents here
]

def insert_batch(docs: list) -> int:
    texts = [d["text"] for d in docs]
    
    # GPU-accelerated embedding
    embeddings = model.encode(
        texts,
        batch_size=256,
        show_progress_bar=False,
        normalize_embeddings=True
    )
    
    # Row-based insert: one dict per entity, matching the schema fields
    # (the auto_id primary key is omitted — Milvus generates it)
    rows = [
        {
            "text": d["text"],
            "source": d["source"],
            "category": d["category"],
            "year": d["year"],
            "embedding": emb.tolist()
        }
        for d, emb in zip(docs, embeddings)
    ]
    
    result = collection.insert(rows)
    return result.insert_count

# Insert in batches
BATCH_SIZE = 1000
total_inserted = 0

for i in range(0, len(documents), BATCH_SIZE):
    batch = documents[i:i + BATCH_SIZE]
    count = insert_batch(batch)
    total_inserted += count
    print(f"Inserted {total_inserted}/{len(documents)} documents")

# Flush to ensure data is persisted and indexed
collection.flush()
print(f"Total inserted and flushed: {total_inserted}")
```
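
Deletes use the same boolean expression syntax as search filters. A short sketch (the expressions are illustrative):

```python
# Delete specific entities by primary key
collection.delete(expr="id in [100, 101, 102]")

# Milvus 2.4 also accepts richer filter expressions for deletes
collection.delete(expr="year < 2023")

collection.flush()  # persist the deletion
```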

***

## Step 8 — Search and Query

### Basic Semantic Search

```python
from pymilvus import Collection
from sentence_transformers import SentenceTransformer

collection = Collection("documents")
collection.load()

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def search(query: str, top_k: int = 10):
    query_embedding = model.encode(
        [query],
        normalize_embeddings=True
    )[0].tolist()
    
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param={
            "metric_type": "COSINE",
            "params": {"ef": 64}    # HNSW search-time parameter (ef >= top_k)
        },
        limit=top_k,
        output_fields=["text", "source", "category", "year"]
    )
    
    return results[0]

# Search
hits = search("how does vector similarity search work")
for hit in hits:
    print(f"Score: {hit.score:.4f}")
    print(f"Text: {hit.entity.get('text')[:100]}")
    print(f"Source: {hit.entity.get('source')}")
    print()
```
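
When you want every hit above a similarity threshold rather than a fixed top-k, Milvus (2.3+) supports range search via `radius` and `range_filter`. A sketch reusing the query embedding from `search()` above; for COSINE, `radius` is the lower similarity bound:

```python
# Range search: all hits with COSINE similarity in (0.5, 1.0]
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={
        "metric_type": "COSINE",
        "params": {
            "ef": 64,
            "radius": 0.5,        # lower similarity bound (exclusive)
            "range_filter": 1.0   # upper similarity bound
        }
    },
    limit=100,
    output_fields=["text"]
)
```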

### Filtered Search

```python
from pymilvus import Collection

collection = Collection("documents")

# Search with a metadata filter (boolean expression);
# query_embedding is computed as in the basic search example above
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    expr='category == "database" and year >= 2023',  # Boolean filter
    output_fields=["text", "category", "year"]
)
```
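
For pure metadata lookups with no vector involved, `collection.query` evaluates the same expression syntax directly:

```python
# Scalar-only query — no embedding required
rows = collection.query(
    expr='category == "database" and year >= 2023',
    output_fields=["text", "source", "year"],
    limit=100
)
for row in rows:
    print(row["year"], row["text"][:80])
```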

### Hybrid Search (Dense + Sparse)

```python
# Milvus 2.4+ supports hybrid dense+sparse search
from pymilvus import AnnSearchRequest, WeightedRanker, Collection

collection = Collection("documents")

# Dense search request
dense_req = AnnSearchRequest(
    data=[dense_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=20
)

# Sparse search request (requires sparse vector field)
sparse_req = AnnSearchRequest(
    data=[sparse_embedding],
    anns_field="sparse_embedding",
    param={"metric_type": "IP"},
    limit=20
)

# Combine results with weighted score fusion
results = collection.hybrid_search(
    [dense_req, sparse_req],
    rerank=WeightedRanker(0.7, 0.3),  # 70% dense, 30% sparse
    limit=10,
    output_fields=["text"]
)
```
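
If you want true Reciprocal Rank Fusion rather than weighted score fusion, pymilvus 2.4 also provides `RRFRanker`:

```python
from pymilvus import RRFRanker

# Rank-based fusion — no score weights to tune
results = collection.hybrid_search(
    [dense_req, sparse_req],
    rerank=RRFRanker(k=60),  # k dampens the influence of lower ranks (default 60)
    limit=10,
    output_fields=["text"]
)
```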

***

## Step 9 — Build a RAG Service

```bash
pip install fastapi uvicorn openai

cat > /workspace/milvus_rag.py << 'EOF'
from fastapi import FastAPI
from pydantic import BaseModel
from pymilvus import Collection, connections
from sentence_transformers import SentenceTransformer
from openai import OpenAI
import os

app = FastAPI(title="Milvus RAG API")

# Initialize at startup
connections.connect("default", host="localhost", port="19530")
collection = Collection("documents")
collection.load()
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class QueryRequest(BaseModel):
    question: str
    n_results: int = 5

@app.get("/health")
async def health():
    return {"status": "ok", "vectors": collection.num_entities}

@app.post("/search")
async def semantic_search(req: QueryRequest):
    embedding = embedder.encode(
        [req.question],
        normalize_embeddings=True
    )[0].tolist()
    
    results = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=req.n_results,
        output_fields=["text", "source", "category"]
    )
    
    return {
        "results": [
            {
                "text": hit.entity.get("text"),
                "source": hit.entity.get("source"),
                "score": hit.score
            }
            for hit in results[0]
        ]
    }

@app.post("/rag")
async def rag(req: QueryRequest):
    embedding = embedder.encode([req.question], normalize_embeddings=True)[0].tolist()
    
    hits = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=req.n_results,
        output_fields=["text", "source"]
    )[0]
    
    context = "\n\n".join([
        f"[{hit.entity.get('source')}]: {hit.entity.get('text')}"
        for hit in hits if hit.score > 0.4
    ])
    
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer based on context. Be concise."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {req.question}"}
        ]
    )
    
    return {"answer": response.choices[0].message.content, "context_used": len(hits)}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF

python3 /workspace/milvus_rag.py
```
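
With the service running, smoke-test the endpoints from another shell (port 8000 as configured above):

```bash
# Health check
curl http://localhost:8000/health

# Semantic search
curl -X POST http://localhost:8000/search \
    -H "Content-Type: application/json" \
    -d '{"question": "how does vector search scale", "n_results": 3}'

# Full RAG answer (requires OPENAI_API_KEY in the service environment)
curl -X POST http://localhost:8000/rag \
    -H "Content-Type: application/json" \
    -d '{"question": "when should I use GPU indexes?"}'
```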

***

## Step 10 — Monitor and Manage

```python
from pymilvus import connections, utility, Collection

connections.connect("default", host="localhost", port="19530")

# List all collections
print("Collections:", utility.list_collections())

# Collection statistics
col = Collection("documents")
print(f"Entity count: {col.num_entities:,}")
print(f"Schema: {col.schema}")

# Partition management
col.create_partition("2024_docs")
col.create_partition("2023_docs")

# Insert into a specific partition (rows prepared as in Step 7)
col.insert(rows, partition_name="2024_docs")

# Search specific partition
results = col.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    partition_names=["2024_docs"]  # Only search this partition
)
```
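
Milvus also exposes Prometheus-format metrics on the same port as the health check, which makes it straightforward to wire into existing monitoring:

```bash
# Scrape endpoint for Prometheus (same port as /healthz)
curl -s http://localhost:9091/metrics | head -20
```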

***

## Troubleshooting

### Services Not Starting

```bash
# Check container logs
docker compose logs etcd
docker compose logs minio
docker compose logs standalone

# Check disk space
df -h /opt/milvus

# Restart services
docker compose restart
```

### Connection Refused on 19530

```bash
# Verify Milvus is listening
netstat -tlnp | grep 19530

# Check health
curl http://localhost:9091/healthz

# Allow time for startup (90 seconds)
docker compose logs standalone | tail -20
```

### Index Build Timeout for Large Collections

```python
# Increase timeout for large index builds
from pymilvus import Collection

collection = Collection("documents")
collection.create_index(
    field_name="embedding",
    index_params=hnsw_params,
    timeout=3600  # 1 hour timeout
)
```

### High Memory Usage

```yaml
# Configure Milvus memory limits in docker-compose.yml
# by adding a limit to the standalone service:
deploy:
  resources:
    limits:
      memory: 16g
```
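
Loaded collections are the dominant memory consumer. Releasing collections you are not actively querying frees RAM immediately:

```python
from pymilvus import Collection, connections

connections.connect("default", host="localhost", port="19530")

# Release an idle collection from query-node memory
col = Collection("documents")
col.release()

# Load it again before the next search
col.load()
```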

***

## Index Type Selection Guide

| Index Type     | Best For                  | Memory     | Speed     | GPU Required |
| -------------- | ------------------------- | ---------- | --------- | ------------ |
| FLAT           | Small (<1M), exact search | High       | Slow      | No           |
| IVF\_FLAT      | Medium (1M–10M)           | Medium     | Good      | No           |
| HNSW           | Low latency, <100M        | High       | Excellent | No           |
| IVF\_SQ8       | Compressed, large         | Low        | Good      | No           |
| GPU\_IVF\_FLAT | Fast batch queries        | GPU+RAM    | Best      | Yes          |
| DISKANN        | Billion-scale             | Low (disk) | Good      | No           |
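
As a sketch, a DISKANN index keeps the graph on NVMe with only a cache in RAM; availability depends on your Milvus build, and `search_list` (≥ limit) is the main query-time knob:

```python
# DISKANN: billion-scale index backed by local NVMe
diskann_params = {
    "metric_type": "COSINE",
    "index_type": "DISKANN",
    "params": {}  # build parameters are largely automatic
}
collection.create_index(field_name="embedding", index_params=diskann_params)
collection.load()

# search_list controls the candidate list size at query time
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"search_list": 100}},
    limit=10,
)
```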

***

## Performance Benchmarks

| Collection Size | Index          | GPU      | QPS      |
| --------------- | -------------- | -------- | -------- |
| 1M vectors      | HNSW           | RTX 3090 | \~8,000  |
| 10M vectors     | IVF\_FLAT      | RTX 4090 | \~2,500  |
| 10M vectors     | GPU\_IVF\_FLAT | A100     | \~12,000 |
| 100M vectors    | DISKANN        | A100     | \~1,200  |

***

## Additional Resources

* [Milvus Documentation](https://milvus.io/docs)
* [Milvus GitHub](https://github.com/milvus-io/milvus)
* [PyMilvus Documentation](https://milvus.io/api-reference/pymilvus/v2.4.x/About.md)
* [Milvus Bootcamp](https://github.com/milvus-io/bootcamp) — Example applications
* [Zilliz Cloud](https://cloud.zilliz.com/) — Managed Milvus
* [Vector Database Comparison](https://milvus.io/docs/benchmark.md)
* [Attu GUI](https://github.com/zilliztech/attu) — Web UI for Milvus management

***

*Milvus on Clore.ai is the ideal solution for AI applications that need to scale beyond hundreds of millions of vectors. Combined with GPU-accelerated embedding generation, you can build world-class semantic search and RAG systems at a fraction of managed cloud costs.*

***

## Clore.ai GPU Recommendations

| Use Case                  | Recommended GPU | Est. Cost on Clore.ai |
| ------------------------- | --------------- | --------------------- |
| Development/Testing       | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production Vector Search  | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| High-throughput Embedding | RTX 4090 (24GB) | \~$0.70/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.

