# Milvus

> **The most scalable open-source vector database for AI applications, built for billions of vectors**

Milvus is an open-source vector database purpose-built for scalable similarity search and AI applications. Originally created by Zilliz and donated to the LF AI & Data Foundation, Milvus powers production AI workloads at companies including NVIDIA, AT\&T, IBM, and Salesforce. It is a strong choice when you need to scale to billions of vectors.

**GitHub:** [milvus-io/milvus](https://github.com/milvus-io/milvus) — 32K+ ⭐

***

## Milvus vs. Qdrant: When to Choose Which

| Criterion            | Milvus                          | Qdrant               |
| -------------------- | ------------------------------- | -------------------- |
| Scale                | Billions of vectors             | Hundreds of millions |
| Architecture         | Distributed (multiple services) | Single binary        |
| Setup complexity     | Higher                          | Lower                |
| GPU index support    | ✅ Native GPU indexes            | Limited              |
| Multi-tenancy        | ✅ Partitions + aliases          | Collection-based     |
| Streaming ingestion  | ✅ Kafka/Pulsar                  | Limited              |
| Hybrid search        | ✅ Dense + sparse                | ✅                    |
| Managed cloud option | Zilliz Cloud                    | Qdrant Cloud         |

{% hint style="success" %}
**Choose Milvus when:** you need to scale to billions of vectors, want GPU-accelerated indexing (GPU\_IVF\_FLAT), or need enterprise features such as multi-tenancy, streaming ingestion, and role-based access control.
{% endhint %}

***

## Milvus Architecture

In standalone mode (single server), Milvus consists of:

* **milvus**: the main service (proxy, query, data, and index coordinators)
* **etcd**: metadata storage and service discovery
* **MinIO**: object storage for segment data

In distributed mode (cluster), each component scales independently.

***

## Prerequisites

* A Clore.ai account with a GPU rental
* Docker Compose (usually pre-installed)
* Basic Python knowledge
* 16GB+ RAM (32GB recommended for production)

***

## Step 1: Rent a GPU Server on Clore.ai

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. **Recommended GPU:** RTX 4090 or A100 for GPU-accelerated indexing
3. **CPU option:** any server with 32GB+ RAM for CPU-based indexing

**Minimum requirements:**

* CPU: 8 cores
* RAM: 16GB (32GB recommended)
* Disk: 50GB SSD/NVMe
* GPU: optional (required only for GPU index types)

{% hint style="info" %}
**GPU index types in Milvus** (GPU\_IVF\_FLAT, GPU\_IVF\_PQ) require a CUDA-capable GPU and significantly speed up index builds for large collections. If you plan to index 10M+ vectors regularly, GPU indexing pays for itself quickly.
{% endhint %}
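
To judge whether the 16GB or 32GB RAM figure fits a given workload, a rough sizing sketch can help. It assumes float32 vectors (4 bytes per dimension) and counts only the raw vector data; index structures and scalar fields add to this. The helper names are hypothetical.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors, excluding index overhead and scalar fields."""
    return num_vectors * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / (1024 ** 3)


# Example: raw size of 384-dimensional float32 vectors at several scales
for n in (1_000_000, 10_000_000, 100_000_000):
    size = raw_vector_bytes(n, dim=384)
    print(f"{n:>11,} vectors x 384 dims: {to_gib(size):.2f} GiB raw")
```

Treat the result as a lower bound: an in-memory index such as HNSW stores graph links on top of the vectors.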

***

## Step 2: Deploy Milvus Standalone

**Docker image:**

```
milvusdb/milvus:v2.4.0
```

Milvus standalone requires etcd and MinIO. Docker Compose is the easiest way to run all three together.

**Ports:**

```
22
19530
```

* **Port 19530:** Milvus SDK/gRPC port (primary)
* **Port 9091:** Milvus REST API and health check (internal)

**Environment variables:**

```
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
```

***

## Step 3: Set Up with Docker Compose

SSH into your Clore.ai server and create the compose file:

```bash
ssh root@<server-ip> -p <ssh-port>

# Install Docker Compose if it is missing
which docker-compose || pip install docker-compose
# Or use the Docker plugin:
docker compose version

# Create the project directory
mkdir -p /opt/milvus && cd /opt/milvus

# Download the official Milvus standalone compose file
wget https://github.com/milvus-io/milvus/releases/download/v2.4.0/milvus-standalone-docker-compose.yml \
    -O docker-compose.yml

# Review the compose file
cat docker-compose.yml
```

### Customize docker-compose.yml

```yaml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - /opt/milvus/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-13T19-46-17Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - /opt/milvus/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.0
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - /opt/milvus/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]  # enable GPU access
```

### Start Milvus

```bash
cd /opt/milvus
docker compose up -d

# Wait for the services to start (~60 seconds)
sleep 60

# Check that all services are healthy
docker compose ps

# Check Milvus health
curl http://localhost:9091/healthz
# Expected: an HTTP 200 response (the body is typically a short OK status)

# Tail the logs
docker compose logs -f standalone --tail 50
```
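
A fixed `sleep 60` can be too short on a slow disk or needlessly long on a fast one. As a minimal sketch, the wait can instead poll the health endpoint until it answers. The function names, the 180-second timeout, and the 5-second interval are assumptions chosen for illustration; only the `http://localhost:9091/healthz` URL comes from the steps above.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:9091/healthz") -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a non-2xx response all count as "not ready"
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 180.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Usage (run on the server once `docker compose up -d` has returned):
#   ready = wait_until_healthy(http_probe)
```

Passing the probe in as a callable keeps the retry loop independent of HTTP, so the same loop can wrap a pymilvus connection attempt if that is a better readiness signal for your setup.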

***

## Step 4: Install the Python Client

```bash
pip install pymilvus sentence-transformers numpy tqdm

# Verify the connection
python3 << 'EOF'
from pymilvus import connections, utility

connections.connect("default", host="localhost", port="19530")
print("Milvus connected!")
print(f"Version: {utility.get_server_version()}")
EOF
```

***

## Step 5: Create a Collection

In Milvus, a **collection** is analogous to a database table. It has a schema of typed fields, including vector fields.

```python
from pymilvus import (
    connections,
    FieldSchema,
    CollectionSchema,
    DataType,
    Collection,
    utility
)

# Connect
connections.connect("default", host="localhost", port="19530")

# Define the schema
fields = [
    FieldSchema(
        name="id",
        dtype=DataType.INT64,
        is_primary=True,
        auto_id=True           # auto-generate IDs
    ),
    FieldSchema(
        name="text",
        dtype=DataType.VARCHAR,
        max_length=2048        # maximum text length
    ),
    FieldSchema(
        name="source",
        dtype=DataType.VARCHAR,
        max_length=256
    ),
    FieldSchema(
        name="category",
        dtype=DataType.VARCHAR,
        max_length=128
    ),
    FieldSchema(
        name="year",
        dtype=DataType.INT32
    ),
    FieldSchema(
        name="embedding",
        dtype=DataType.FLOAT_VECTOR,
        dim=384                # dimension of your embedding model
    )
]

schema = CollectionSchema(
    fields=fields,
    description="Document embeddings for semantic search",
    enable_dynamic_field=True  # allow inserting fields that are not in the schema
)

# Create the collection
# WARNING: an existing collection with the same name is dropped first
collection_name = "documents"
if utility.has_collection(collection_name):
    utility.drop_collection(collection_name)

collection = Collection(
    name=collection_name,
    schema=schema,
    using="default"
)
print(f"Collection '{collection_name}' created!")
```
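
Rows that violate the schema are rejected at insert time, so it can be convenient to check them on the client first. The sketch below mirrors the limits declared in the schema above (`max_length` values and `dim=384`) in plain Python; it does not call pymilvus. The helper name is hypothetical, and measuring string length in UTF-8 bytes is an assumption about how the limit is counted.

```python
from typing import List

# Limits copied from the schema above
VARCHAR_LIMITS = {"text": 2048, "source": 256, "category": 128}
EMBEDDING_DIM = 384


def validate_row(row: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the row looks valid."""
    problems = []
    for field, limit in VARCHAR_LIMITS.items():
        value = row.get(field)
        if not isinstance(value, str):
            problems.append(f"{field}: expected a string")
        elif len(value.encode("utf-8")) > limit:
            problems.append(f"{field}: longer than {limit} bytes")
    if not isinstance(row.get("year"), int):
        problems.append("year: expected an integer")
    embedding = row.get("embedding")
    if not isinstance(embedding, list) or len(embedding) != EMBEDDING_DIM:
        problems.append(f"embedding: expected a list of {EMBEDDING_DIM} floats")
    return problems


# Usage: skip or log rows for which validate_row(row) is non-empty before inserting
```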

***

## Step 6: Create Indexes

Create an appropriate index before loading the collection for search:

```python
from pymilvus import Collection

collection = Collection("documents")

# HNSW index (a solid default for most use cases, low latency)
hnsw_params = {
    "metric_type": "COSINE",     # COSINE, L2, or IP (inner product)
    "index_type": "HNSW",
    "params": {
        "M": 16,                 # HNSW graph connectivity (8-64)
        "efConstruction": 200    # build-time search depth
    }
}

# IVF_FLAT index (CPU, good for large collections)
ivf_params = {
    "metric_type": "COSINE",
    "index_type": "IVF_FLAT",
    "params": {
        "nlist": 1024            # number of clusters (commonly around sqrt of the dataset size)
    }
}

# GPU_IVF_FLAT index (requires a CUDA GPU; fastest for batch queries)
gpu_ivf_params = {
    "metric_type": "L2",
    "index_type": "GPU_IVF_FLAT",
    "params": {
        "nlist": 1024,
        "cache_dataset_on_device": True
    }
}

# Create the index on the embedding field
collection.create_index(
    field_name="embedding",
    index_params=hnsw_params,
    index_name="embedding_idx"
)

# Create scalar indexes for filtered search
collection.create_index(field_name="category", index_name="category_idx")
collection.create_index(field_name="year", index_name="year_idx")

print("Indexes created!")
collection.load()  # load into memory for search
```
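
The `nlist` comment above mentions the square root of the dataset size as a common starting point. A minimal sketch of that rule of thumb follows; the function name is hypothetical, and the upper bound of 65536 is an assumption about the accepted range that you should check against the documentation for your Milvus version.

```python
import math


def suggest_nlist(num_vectors: int, min_nlist: int = 1, max_nlist: int = 65536) -> int:
    """Square-root heuristic for the IVF nlist parameter, clamped to a valid range."""
    if num_vectors < 1:
        raise ValueError("num_vectors must be positive")
    return max(min_nlist, min(max_nlist, math.isqrt(num_vectors)))


# Example: starting values for a few collection sizes
for n in (100_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} vectors: nlist around {suggest_nlist(n)}")
```

This is only a starting value. Larger `nlist` values make each cluster smaller, which speeds up search per probed cluster but usually requires probing more clusters to keep recall.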

***

## Step 7: Insert Data

```python
from pymilvus import Collection
from sentence_transformers import SentenceTransformer

# Assumes the connection opened in Step 5 is still active in this session
collection = Collection("documents")
# Use device="cpu" if the server has no CUDA GPU
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

# Your documents
documents = [
    {
        "text": "Milvus is an open-source vector database for scalable AI applications.",
        "source": "documentation",
        "category": "database",
        "year": 2024
    },
    {
        "text": "HNSW provides fast approximate nearest neighbor search with high recall.",
        "source": "research",
        "category": "algorithm",
        "year": 2023
    },
    {
        "text": "GPU-accelerated indexing dramatically reduces build time for large vector collections.",
        "source": "blog",
        "category": "performance",
        "year": 2024
    },
    # Add thousands more documents here
]

def insert_batch(docs: list) -> int:
    texts = [d["text"] for d in docs]

    # GPU-accelerated embedding
    embeddings = model.encode(
        texts,
        batch_size=256,
        show_progress_bar=False,
        normalize_embeddings=True
    )

    # Row-based insert: one dict per entity
    # (a single dict would be read as one row, not as a set of columns)
    rows = [
        {
            "text": d["text"],
            "source": d["source"],
            "category": d["category"],
            "year": d["year"],
            "embedding": emb.tolist()
        }
        for d, emb in zip(docs, embeddings)
    ]

    result = collection.insert(rows)
    return result.insert_count

# Insert in batches
BATCH_SIZE = 1000
total_inserted = 0

for i in range(0, len(documents), BATCH_SIZE):
    batch = documents[i:i + BATCH_SIZE]
    count = insert_batch(batch)
    total_inserted += count
    print(f"Inserted {total_inserted}/{len(documents)} documents")

# Flush so the inserted data is sealed and persisted
collection.flush()
print(f"Total inserted and flushed: {total_inserted}")
```

***

## Step 8: Search and Query

### Basic Semantic Search

```python
from pymilvus import Collection
from sentence_transformers import SentenceTransformer

collection = Collection("documents")
collection.load()

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

def search(query: str, top_k: int = 10):
    query_embedding = model.encode(
        [query],
        normalize_embeddings=True
    )[0].tolist()
    
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param={
            "metric_type": "COSINE",
            "params": {"ef": 64}    # HNSW search-time parameter (ef >= top_k)
        },
        limit=top_k,
        output_fields=["text", "source", "category", "year"]
    )
    
    return results[0]

# Search
hits = search("how does vector similarity search work")
for hit in hits:
    print(f"Score: {hit.score:.4f}")
    print(f"Text: {hit.entity.get('text')[:100]}")
    print(f"Source: {hit.entity.get('source')}")
    print()
```
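
The examples embed text with `normalize_embeddings=True` and search with the `COSINE` metric. As a small pure-Python sketch of what that score means: for unit-length vectors, cosine similarity equals the inner product, and a brute-force search simply ranks every stored vector by that score. The helper names are hypothetical, and this is the exact computation that an index such as HNSW approximates, not how Milvus executes a query.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Exact nearest neighbors by cosine similarity: (index, score), best first."""
    q = normalize(query)
    scored = [(i, dot(q, normalize(v))) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Example: rank three toy 2-D vectors against a query
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for index, score in top_k([2.0, 0.1], toy_vectors, k=2):
    print(index, round(score, 4))
```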

### Filtered Search

```python
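from pymilvus import Collection
from sentence_transformers import SentenceTransformer

collection = Collection("documents")
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

# Embed the query text (same model and normalization as at insert time)
query_embedding = model.encode(
    ["vector database for AI"],
    normalize_embeddings=True
)[0].tolist()

# Search with a metadata filter (boolean expression)
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    expr='category == "database" and year >= 2023',  # boolean filter
    output_fields=["text", "category", "year"]
)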
```
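
When the filter values come from user input, building the `expr` string by hand invites quoting mistakes. The sketch below assembles the same kind of expression from Python values. The function name is hypothetical, it covers only the `category` and `year` fields used in this guide, and it assumes that JSON-style backslash escaping of quotes is acceptable inside a Milvus string literal.

```python
import json
from typing import Optional


def build_filter(category: Optional[str] = None, min_year: Optional[int] = None) -> str:
    """Build a boolean filter expression; returns an empty string when no filter is set."""
    clauses = []
    if category is not None:
        # json.dumps adds the surrounding double quotes and escapes embedded ones
        clauses.append(f"category == {json.dumps(category, ensure_ascii=False)}")
    if min_year is not None:
        clauses.append(f"year >= {int(min_year)}")
    return " and ".join(clauses)


print(build_filter("database", 2023))
```

Pass the result as `expr` only when it is non-empty; otherwise omit the argument and run an unfiltered search.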

### Hybrid Search (Dense + Sparse)

```python
# Milvus 2.4+ supports hybrid dense + sparse search
from pymilvus import AnnSearchRequest, WeightedRanker, Collection

collection = Collection("documents")

# dense_embedding and sparse_embedding are placeholders for query vectors you
# compute yourself. The collection also needs a "sparse_embedding" field of type
# SPARSE_FLOAT_VECTOR, which the Step 5 schema does not define.

# Dense search request
dense_req = AnnSearchRequest(
    data=[dense_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=20
)

# Sparse search request (requires a sparse vector field)
sparse_req = AnnSearchRequest(
    data=[sparse_embedding],
    anns_field="sparse_embedding",
    param={"metric_type": "IP"},
    limit=20
)

# Combine both result sets with a weighted ranker
# (use RRFRanker instead if you want Reciprocal Rank Fusion)
results = collection.hybrid_search(
    [dense_req, sparse_req],
    rerank=WeightedRanker(0.7, 0.3),  # 70% dense, 30% sparse
    limit=10,
    output_fields=["text"]
)
```
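
To make the two fusion strategies concrete, here is a simplified pure-Python sketch. Weighted fusion combines scores, while Reciprocal Rank Fusion (RRF) combines rank positions and ignores the raw scores. Both functions are illustrations written for this guide, not the Milvus implementation: the weighted version assumes the input scores are already on a comparable 0-to-1 scale, and `k=60` is a commonly used RRF constant chosen here as an assumption.

```python
from typing import Dict, List, Tuple


def weighted_fusion(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    w_dense: float = 0.7,
    w_sparse: float = 0.3,
) -> List[Tuple[str, float]]:
    """Weighted sum of per-document scores; a missing score counts as 0."""
    ids = set(dense) | set(sparse)
    fused = {i: w_dense * dense.get(i, 0.0) + w_sparse * sparse.get(i, 0.0) for i in ids}
    # Sort by score descending, then by id for a stable order
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


def rrf_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Example: document "b" appears in both result sets and ends up first in both fusions
dense_scores = {"a": 0.9, "b": 0.5}
sparse_scores = {"b": 1.0, "c": 0.8}
print([doc for doc, _ in weighted_fusion(dense_scores, sparse_scores)])
print([doc for doc, _ in rrf_fusion([["a", "b"], ["b", "c"]])])
```

Weighted fusion suits cases where you trust the score scales and want to tune the dense/sparse balance; rank-based fusion is less sensitive to scores that are not directly comparable.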

***

## Step 9: Build a RAG Service

```bash
pip install fastapi uvicorn openai

cat > /workspace/milvus_rag.py << 'EOF'
from fastapi import FastAPI
from pydantic import BaseModel
from pymilvus import Collection, connections
from sentence_transformers import SentenceTransformer
from openai import OpenAI
import os

app = FastAPI(title="Milvus RAG API")

# Initialize at startup
connections.connect("default", host="localhost", port="19530")
collection = Collection("documents")
collection.load()
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class QueryRequest(BaseModel):
    question: str
    n_results: int = 5

@app.get("/health")
async def health():
    return {"status": "ok", "vectors": collection.num_entities}

@app.post("/search")
async def semantic_search(req: QueryRequest):
    embedding = embedder.encode(
        [req.question],
        normalize_embeddings=True
    )[0].tolist()
    
    results = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=req.n_results,
        output_fields=["text", "source", "category"]
    )
    
    return {
        "results": [
            {
                "text": hit.entity.get("text"),
                "source": hit.entity.get("source"),
                "score": hit.score
            }
            for hit in results[0]
        ]
    }

@app.post("/rag")
async def rag(req: QueryRequest):
    embedding = embedder.encode([req.question], normalize_embeddings=True)[0].tolist()
    
    hits = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=req.n_results,
        output_fields=["text", "source"]
    )[0]
    
    # Keep only hits above the similarity threshold
    relevant = [hit for hit in hits if hit.score > 0.4]
    context = "\n\n".join([
        f"[{hit.entity.get('source')}]: {hit.entity.get('text')}"
        for hit in relevant
    ])
    
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer based on context. Be concise."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {req.question}"}
        ]
    )
    
    return {"answer": response.choices[0].message.content, "context_used": len(relevant)}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF

python3 /workspace/milvus_rag.py
```
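
The context-building step of the `/rag` endpoint is easy to test in isolation if it is written as a pure function. The sketch below does that with plain dictionaries standing in for Milvus hits; the function name and the dictionary shape are assumptions made for this example, and the 0.4 threshold is the one used in the service above.

```python
from typing import Dict, List, Tuple


def build_context(hits: List[Dict], min_score: float = 0.4) -> Tuple[str, int]:
    """Join the hits above the score threshold into a prompt context.

    Returns the context string and the number of hits that were used.
    """
    used = [h for h in hits if h["score"] > min_score]
    context = "\n\n".join(f"[{h['source']}]: {h['text']}" for h in used)
    return context, len(used)


# Example with two stand-in hits; only the first clears the threshold
sample_hits = [
    {"text": "Milvus is a vector database.", "source": "documentation", "score": 0.82},
    {"text": "Unrelated note.", "source": "blog", "score": 0.12},
]
context, used = build_context(sample_hits)
print(used)
print(context)
```

If no hit clears the threshold, the context is empty; a service may prefer to return a "no relevant documents" answer in that case instead of calling the LLM.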

***

## Step 10: Monitor and Manage

```python
from pymilvus import connections, utility, Collection

connections.connect("default", host="localhost", port="19530")

# List all collections
print("Collections:", utility.list_collections())

# Collection statistics
col = Collection("documents")
print(f"Entity count: {col.num_entities:,}")
print(f"Schema: {col.schema}")

# Partition management
col.create_partition("2024_docs")
col.create_partition("2023_docs")

# Insert into a partition
# (`data` is a placeholder for rows prepared as in Step 7)
col.insert(data, partition_name="2024_docs")

# Search a specific partition
# (`query_vec` is a placeholder for a query embedding, as in Step 8)
results = col.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=10,
    partition_names=["2024_docs"]  # search only this partition
)
```
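
With year-based partitions like the ones above, rows need to be routed to the right partition before insertion. A minimal sketch of that grouping step follows; the function name is hypothetical, and the `<year>_docs` naming simply follows the partition names used in this guide.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_partition(rows: List[dict]) -> Dict[str, List[dict]]:
    """Group rows by their target partition name, derived from the 'year' field."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        buckets[f"{row['year']}_docs"].append(row)
    return dict(buckets)


# Example: two rows go to 2024_docs, one to 2023_docs
rows = [
    {"text": "a", "year": 2024},
    {"text": "b", "year": 2023},
    {"text": "c", "year": 2024},
]
for partition, members in group_by_partition(rows).items():
    print(partition, len(members))
```

Each bucket can then be passed to `insert` with the matching `partition_name`, after making sure the partition exists.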

***

## Troubleshooting

### Services Not Starting

```bash
# Check container logs
docker compose logs etcd
docker compose logs minio
docker compose logs standalone

# Check disk space
df -h /opt/milvus

# Restart the services
docker compose restart
```

### Connection Refused on 19530

```bash
# Confirm that Milvus is listening
netstat -tlnp | grep 19530

# Run the health check
curl http://localhost:9091/healthz

# Allow time for startup (about 90 seconds), then inspect the latest logs
docker compose logs standalone | tail -20
```

### Index Build Timeout for Large Collections

```python
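# Increase the timeout for large index builds
from pymilvus import Collection

collection = Collection("documents")

# Same HNSW parameters as in Step 6
hnsw_params = {
    "metric_type": "COSINE",
    "index_type": "HNSW",
    "params": {"M": 16, "efConstruction": 200}
}

collection.create_index(
    field_name="embedding",
    index_params=hnsw_params,
    timeout=3600  # 1-hour timeout
)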
```

### High Memory Usage

```yaml
# Configure a Milvus memory limit in docker-compose.yml
# Add this to the standalone service:
deploy:
  resources:
    limits:
      memory: 16g

***

## Index Type Selection Guide

| Index type     | Best for                  | Memory     | Speed     | GPU required |
| -------------- | ------------------------- | ---------- | --------- | ------------ |
| FLAT           | Small (<1M), exact search | High       | Slow      | No           |
| IVF\_FLAT      | Medium (1M–10M)           | Medium     | Good      | No           |
| HNSW           | Low latency, <100M        | High       | Excellent | No           |
| IVF\_SQ8       | Compressed, large         | Low        | Good      | No           |
| GPU\_IVF\_FLAT | Fast batch queries        | GPU+RAM    | Best      | Yes          |
| DISKANN        | Billion-scale             | Low (disk) | Good      | No           |
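
The table can be read as a small decision procedure. The sketch below encodes one possible reading of it; the function name, the parameter names, and the order in which the rules are checked are assumptions, and the 100M cut-over to DISKANN is an interpretation of the "<100M" and "billion-scale" rows rather than a documented threshold.

```python
def choose_index(
    num_vectors: int,
    exact: bool = False,
    gpu_batch: bool = False,
    memory_constrained: bool = False,
    low_latency: bool = True,
) -> str:
    """Pick an index type name following the selection table (one possible reading)."""
    if exact and num_vectors < 1_000_000:
        return "FLAT"            # small collection, exact results required
    if gpu_batch:
        return "GPU_IVF_FLAT"    # CUDA GPU available and queries arrive in batches
    if num_vectors >= 100_000_000:
        return "DISKANN"         # too large to keep a graph index in RAM comfortably
    if memory_constrained:
        return "IVF_SQ8"         # compressed vectors, lower memory
    if low_latency:
        return "HNSW"            # in-memory graph, lowest latency
    return "IVF_FLAT"            # balanced CPU default


print(choose_index(5_000_000))
print(choose_index(2_000_000_000))
```

Benchmark the chosen index on your own data before committing to it, since recall and latency depend heavily on the embedding distribution and query pattern.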

***

## Performance Benchmarks

| Collection size | Index          | GPU      | QPS      |
| --------------- | -------------- | -------- | -------- |
| 1M vectors      | HNSW           | RTX 3090 | \~8,000  |
| 10M vectors     | IVF\_FLAT      | RTX 4090 | \~2,500  |
| 10M vectors     | GPU\_IVF\_FLAT | A100     | \~12,000 |
| 100M vectors    | DISKANN        | A100     | \~1,200  |

***

## Additional Resources

* [Milvus documentation](https://milvus.io/docs)
* [Milvus GitHub](https://github.com/milvus-io/milvus)
* [PyMilvus documentation](https://milvus.io/api-reference/pymilvus/v2.4.x/About.md)
* [Milvus Bootcamp](https://github.com/milvus-io/bootcamp): example applications
* [Zilliz Cloud](https://cloud.zilliz.com/): managed Milvus
* [Vector database comparison](https://milvus.io/docs/benchmark.md)
* [Attu GUI](https://github.com/zilliztech/attu): web UI for managing Milvus

***

*Milvus on Clore.ai is well suited to AI applications that need to scale beyond hundreds of millions of vectors. Combined with GPU-accelerated embedding generation, it lets you build high-quality semantic search and RAG systems at a fraction of the cost of managed cloud services.*

***

## Clore.ai GPU Recommendations

| Use case                  | Recommended GPU | Estimated cost on Clore.ai |
| ------------------------- | --------------- | -------------------------- |
| Development/testing       | RTX 3090 (24GB) | \~$0.12/gpu/hr             |
| Production vector search  | RTX 3090 (24GB) | \~$0.12/gpu/hr             |
| High-throughput embedding | RTX 4090 (24GB) | \~$0.70/gpu/hr             |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse the available GPUs and rent by the hour, with no commitment and full root access.

