# LLaVA

Chat with images using LLaVA, an open-source alternative to GPT-4V.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Go to the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set ports (TCP for SSH, HTTP for web UI)
   * Add environment variables if needed
   * Enter the startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment

### Access Your Server

* Find the connection details in **My Orders**
* Web interfaces: use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is LLaVA?

LLaVA (Large Language and Vision Assistant) can:

* Understand and describe images
* Answer questions about visual content
* Analyze charts, diagrams, and screenshots
* Perform OCR and document understanding

## Model Variants

| Model         | Size  | VRAM   | Quality |
| ------------- | ----- | ------ | ------- |
| LLaVA-1.5-7B  | 7B    | 8GB    | Good    |
| LLaVA-1.5-13B | 13B   | 16GB   | Better  |
| LLaVA-1.6-34B | 34B   | 40GB   | Best    |
| LLaVA-NeXT    | 7-34B | 8-40GB | Latest  |
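
When matching a model to a rented GPU, the table can be treated as a simple lookup. The sketch below is illustrative only: the VRAM figures are copied from the table, and `MODEL_VRAM_GB` and `largest_model_for` are hypothetical names, not part of LLaVA or CLORE.AI.

```python
# VRAM requirements (GB) copied from the table above.
# LLaVA-NeXT is left out because the table lists it as a range (8-40GB).
MODEL_VRAM_GB = {
    "LLaVA-1.5-7B": 8,
    "LLaVA-1.5-13B": 16,
    "LLaVA-1.6-34B": 40,
}

def largest_model_for(vram_gb):
    """Return the largest listed model that fits in vram_gb, or None."""
    fitting = [(need, name) for name, need in MODEL_VRAM_GB.items() if need <= vram_gb]
    return max(fitting)[1] if fitting else None

# A 24 GB card such as an RTX 3090 or RTX 4090
print(largest_model_for(24))
```

Treat the result as a starting point: the figures are the guide's approximate requirements, and real usage also depends on image size and generation length.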

## Quick Deploy

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
8000/http
```

**Command:**

```bash
pip install llava torch transformers accelerate gradio && \
python -m llava.serve.cli --model-path liuhaotian/llava-v1.5-7b --load-4bit
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to the **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.
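
To avoid hand-editing every example, the `http_pub` value can be turned into a base URL once and reused. This is a minimal sketch; `service_url` is a hypothetical helper, and it assumes only that the service is reached over HTTPS as described above.

```python
def service_url(http_pub, path=""):
    """Build an HTTPS URL from the http_pub value shown in My Orders."""
    host = http_pub.strip()
    # Accept values pasted with or without a scheme.
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    host = host.rstrip("/")
    return f"https://{host}/{path.lstrip('/')}" if path else f"https://{host}"

print(service_url("abc123.clorecloud.net", "/api/generate"))
```

The hostname here is the placeholder from the steps above; substitute the value from your own order.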

## Installation

```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
pip install flash-attn --no-build-isolation
```

## Basic Usage

### Python API

```python
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model
from PIL import Image

model_path = "liuhaotian/llava-v1.5-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)

# Simple inference
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "Describe this image in detail",
    "conv_mode": None,
    "image_file": "photo.jpg",
    "sep": ",",
    "temperature": 0.2,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

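# In the upstream LLaVA repo, eval_model prints the generated text itself
# and does not return it, so there is no value to capture here.
eval_model(args)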
```
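
The `type('Args', (), {...})()` idiom above only creates an object whose attributes hold the settings. As a more readable alternative, the same object can be sketched with `types.SimpleNamespace` from the standard library. This assumes `eval_model` only reads attributes from `args`, as the example above implies; the hard-coded `model_name` is an assumption about what `get_model_name_from_path` returns for this path.

```python
from types import SimpleNamespace

model_path = "liuhaotian/llava-v1.5-7b"

# Same fields as the Args object above, built with the standard library.
args = SimpleNamespace(
    model_path=model_path,
    model_base=None,
    model_name="llava-v1.5-7b",  # assumed to match get_model_name_from_path(model_path)
    query="Describe this image in detail",
    conv_mode=None,
    image_file="photo.jpg",
    sep=",",
    temperature=0.2,
    top_p=None,
    num_beams=1,
    max_new_tokens=512,
)

print(args.image_file, args.max_new_tokens)
```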

### Using Transformers

```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the image
image = Image.open("photo.jpg")

# Build the conversation
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"}
        ]
    }
]

prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
# Keyword arguments avoid depending on the positional order of text and images
inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")

output = model.generate(**inputs, max_new_tokens=200)
response = processor.decode(output[0], skip_special_tokens=True)
print(response)
```
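
The decoded string from the example above contains the prompt as well as the answer. The Gradio, API server, and batch examples on this page strip the prompt by splitting on `[/INST]`; the sketch below isolates that step. `extract_reply` is a hypothetical helper, and the sample string is made up to resemble the Mistral-style template rather than captured from a real run.

```python
def extract_reply(decoded, marker="[/INST]"):
    """Return the text after the last instruction marker, or the whole string if absent."""
    return decoded.split(marker)[-1].strip()

# Hypothetical decoded output, for illustration only
sample = "[INST] \nWhat is shown in this image? [/INST] A cat sitting on a sofa."
print(extract_reply(sample))
```

Other LLaVA checkpoints use different chat templates, so the marker would need to change with the model.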

## Ollama Integration (Recommended)

The easiest way to run LLaVA on CLORE.AI:

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the LLaVA model
ollama pull llava:7b

# Run with an image (CLI)
ollama run llava:7b "Describe this image: /path/to/image.jpg"
```

### LLaVA API via Ollama

{% hint style="warning" %}
**Important:** LLaVA's vision features work **only** through the `/api/generate` endpoint with the `images` parameter. The `/api/chat` and OpenAI-compatible endpoints do **not** support images with LLaVA.
{% endhint %}

#### Working Method: /api/generate

```bash
# First, encode the image as base64
BASE64_IMAGE=$(base64 -i photo.jpg | tr -d '\n')

# Send the vision request
curl https://your-http-pub.clorecloud.net/api/generate -d "{
  \"model\": \"llava:7b\",
  \"prompt\": \"What do you see in this image? Describe it in detail.\",
  \"images\": [\"$BASE64_IMAGE\"],
  \"stream\": false
}"
```

Response:

```json
{
  "model": "llava:7b",
  "response": "This image shows a beautiful sunset over mountains...",
  "done": true
}
```
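
The response above is a single JSON object because the request sets `"stream": false`. Without that flag, Ollama is understood to stream newline-delimited JSON objects, each carrying a fragment in `response` and ending with one where `done` is `true`. The sketch below joins such fragments; `join_stream` is a hypothetical helper and the sample chunks are invented for illustration.

```python
import json

def join_stream(lines):
    """Concatenate the "response" fragments from newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Hypothetical chunks, shaped like the response fields shown above
sample = [
    '{"model": "llava:7b", "response": "This image ", "done": false}',
    '{"model": "llava:7b", "response": "shows a sunset.", "done": false}',
    '{"model": "llava:7b", "response": "", "done": true}',
]
print(join_stream(sample))
```

With `requests`, the lines could come from `response.iter_lines()` on a request made with `stream=True`.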

#### Not Working: /api/chat (returns null for vision)

```bash
# This does NOT work for vision queries:
curl https://your-http-pub.clorecloud.net/api/chat -d '{
  "model": "llava:7b",
  "messages": [{"role": "user", "content": "describe", "images": ["..."]}],
  "stream": false
}'
# Returns null for image-related responses
```

### Python with Ollama

```python
import requests
import base64

def encode_image(image_path):
    """Read an image file and return it as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Use /api/generate for vision (NOT /api/chat!)
response = requests.post(
    "https://your-http-pub.clorecloud.net/api/generate",
    json={
        "model": "llava:7b",
        "prompt": "What do you see in this image?",
        "images": [encode_image("photo.jpg")],
        "stream": False
    }
)

print(response.json()["response"])
```

### Complete Working Example

```python
import requests
import base64

def analyze_image(ollama_url, image_path, question):
    """Analyze an image with LLaVA via Ollama."""
    # Encode the image
    with open(image_path, "rb") as f:
        image_base64 = base64.b64encode(f.read()).decode()

    # Use /api/generate for vision (the only working endpoint)
    response = requests.post(
        f"{ollama_url}/api/generate",
        json={
            "model": "llava:7b",
            "prompt": question,
            "images": [image_base64],
            "stream": False
        }
    )
    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return response.json()["response"]

# Usage
url = "https://your-http-pub.clorecloud.net"
result = analyze_image(url, "photo.jpg", "Describe this image in detail")
print(result)
```

## Use Cases

### Image Description

```python
prompt = "Describe this image in detail, including colors, objects, and atmosphere."
```

### OCR / Text Extraction

```python
prompt = "Extract all text visible in this image. Format it clearly."
```

### Chart Analysis

```python
prompt = "Analyze this chart. What are the key trends and insights?"
```

### Code from Screenshot

```python
prompt = "Extract the code shown in this screenshot. Provide only the code."
```

### Object Detection

```python
prompt = "List all objects visible in this image with their approximate locations."
```
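
When several of these use cases share one script, the prompts can be kept in a single table and selected by task. A minimal sketch follows; the prompt texts are this page's own, while the keys, `PROMPTS`, and `prompt_for` are hypothetical names chosen for illustration.

```python
# Prompt templates for the use cases in this section.
PROMPTS = {
    "describe": "Describe this image in detail, including colors, objects, and atmosphere.",
    "ocr": "Extract all text visible in this image. Format it clearly.",
    "chart": "Analyze this chart. What are the key trends and insights?",
    "code": "Extract the code shown in this screenshot. Provide only the code.",
    "objects": "List all objects visible in this image with their approximate locations.",
}

def prompt_for(task):
    """Look up a prompt by task key, with a readable error for unknown keys."""
    try:
        return PROMPTS[task]
    except KeyError:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(PROMPTS)}") from None

print(prompt_for("ocr"))
```

The returned string can be passed as the `prompt` field of an `/api/generate` request or as the text part of a Transformers conversation.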

## Gradio Interface

```python
import gradio as gr
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto"
)

def analyze_image(image, question):
    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question}
            ]
        }
    ]

    prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")

    output = model.generate(**inputs, max_new_tokens=500)
    response = processor.decode(output[0], skip_special_tokens=True)

    # Extract the assistant's reply
    return response.split("[/INST]")[-1].strip()

demo = gr.Interface(
    fn=analyze_image,
    inputs=[
        gr.Image(type="pil", label="Image"),
        gr.Textbox(label="Question", value="Describe this image in detail")
    ],
    outputs=gr.Textbox(label="Response"),
    title="LLaVA Vision Assistant"
)

demo.launch(server_name="0.0.0.0", server_port=8000)
```

## API Server

```python
from fastapi import FastAPI, UploadFile, File, Form
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image
import io

app = FastAPI()

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto"
)

@app.post("/analyze")
async def analyze(
    image: UploadFile = File(...),
    question: str = Form(default="Describe this image")
):
    img = Image.open(io.BytesIO(await image.read()))

    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question}
            ]
        }
    ]

    prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
    inputs = processor(images=img, text=prompt, return_tensors="pt").to("cuda")

    # max_new_tokens=500 mirrors the Gradio example; adjust as needed
    output = model.generate(**inputs, max_new_tokens=500)
    response = processor.decode(output[0], skip_special_tokens=True)

    return {"response": response.split("[/INST]")[-1].strip()}

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
```

## Batch Processing

```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
from PIL import Image
import os
import json

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto"
)

def analyze_image(image_path, question):
    image = Image.open(image_path)

    conversation = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": question}
        ]}
    ]

    prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")

    output = model.generate(**inputs, max_new_tokens=300)
    return processor.decode(output[0], skip_special_tokens=True).split("[/INST]")[-1].strip()

# Process a folder of images
image_folder = "./images"
results = []

for filename in os.listdir(image_folder):
    if filename.endswith(('.jpg', '.png', '.jpeg')):
        path = os.path.join(image_folder, filename)
        description = analyze_image(path, "Describe this image briefly")
        results.append({"file": filename, "description": description})
        print(f"{filename}: {description[:100]}...")

# Save the results
with open("descriptions.json", "w") as f:
    json.dump(results, f, indent=2)
```
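
The `endswith(('.jpg', '.png', '.jpeg'))` check in the loop above is case-sensitive, so a file named `PHOTO.JPG` would be skipped. One possible alternative, sketched here with `pathlib`, matches suffixes case-insensitively and returns files in a stable order. `list_images` and `IMAGE_SUFFIXES` are hypothetical names, and the temporary folder exists only to demonstrate the behavior.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def list_images(folder):
    """Return image paths in folder, sorted, matching suffixes case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

# Demonstration with throwaway files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.PNG", "a.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    print([p.name for p in list_images(tmp)])
```

In the batch loop, `for path in list_images(image_folder):` could replace the `os.listdir` and `endswith` pair.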

## Memory Optimization

### 4-bit Quantization

```python
from transformers import BitsAndBytesConfig, LlavaNextForConditionalGeneration
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    quantization_config=quantization_config,
    device_map="auto"
)
```

### CPU Offload

```python
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
    offload_folder="offload"
)
```
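
To see roughly how much the options in this section can save, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation only: it covers weights alone and ignores activations, the KV cache, the vision encoder's working memory, and quantization overhead. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8

for bits in (16, 4):
    print(f"7B at {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
```

By this estimate a 7B model needs about 14 GB for float16 weights and about 3.5 GB at 4-bit, which is why quantization is the first thing to try on smaller cards.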

## Performance

| Model         | GPU      | Tokens/sec |
| ------------- | -------- | ---------- |
| LLaVA-1.5-7B  | RTX 3090 | \~30       |
| LLaVA-1.5-7B  | RTX 4090 | \~45       |
| LLaVA-1.6-7B  | RTX 4090 | \~40       |
| LLaVA-1.5-13B | A100     | \~35       |
## Troubleshooting

### Out of Memory

```python
# Use 4-bit quantization
# Or use a smaller model (7B instead of 13B)
# Or process smaller images
image = image.resize((336, 336))
```
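
A fixed square resize such as `image.resize((336, 336))` distorts images that are not square. A possible refinement is to compute a target size that keeps the aspect ratio, as sketched below. `fit_within` is a hypothetical helper, and the default of 336 is taken from this page's resize example rather than from any model requirement.

```python
def fit_within(width, height, max_side=336):
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within(1920, 1080))
```

With Pillow, the result can be applied as `image = image.resize(fit_within(*image.size))`.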

### Slow Generation

* Use flash attention
* Reduce `max_new_tokens`
* Use a quantized model

### Poor Quality

* Use a larger model
* Write better prompts with context
* Use higher-resolution images

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use the **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers
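
For session lengths not listed in the table, the cost is simply the hourly rate times the number of hours, optionally reduced by a spot discount. The sketch below uses the approximate hourly rates from the table; `session_cost` is a hypothetical helper, and the 30% discount is only the low end of the range quoted above, not a guaranteed price.

```python
# Approximate hourly rates (USD) copied from the table above.
HOURLY_USD = {
    "RTX 3060": 0.03,
    "RTX 3090": 0.06,
    "RTX 4090": 0.10,
    "A100 40GB": 0.17,
    "A100 80GB": 0.25,
}

def session_cost(gpu, hours, spot_discount=0.0):
    """Estimated cost in USD; spot_discount is a fraction such as 0.3 for 30%."""
    return round(HOURLY_USD[gpu] * hours * (1 - spot_discount), 2)

print(session_cost("RTX 4090", 4))                       # on-demand, 4 hours
print(session_cost("RTX 4090", 24, spot_discount=0.3))   # spot, one day
```

The table's daily and session figures are rounded, so they will not match these products exactly.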

## Next Steps

* Ollama LLMs - Run LLaVA with Ollama
* vLLM Inference - Production serving

