# Segment Anything

Используйте SAM от Meta для точной сегментации изображений на GPU.

{% hint style="success" %}
Все примеры можно запускать на GPU-серверах, арендуемых через [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Аренда на CLORE.AI

1. Посетите [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Отфильтруйте по типу GPU, объему VRAM и цене
3. Выберите **On-Demand** (фиксированная ставка) или **Spot** (цена по ставке)
4. Настройте ваш заказ:
   * Выберите Docker-образ
   * Установите порты (TCP для SSH, HTTP для веб-интерфейсов)
   * Добавьте переменные окружения при необходимости
   * Введите команду запуска
5. Выберите способ оплаты: **CLORE**, **BTC**, или **USDT/USDC**
6. Создайте заказ и дождитесь развертывания

### Доступ к вашему серверу

* Найдите данные для подключения в **Моих заказах**
* Веб-интерфейсы: используйте URL HTTP-порта
* SSH: `ssh -p <port> root@<proxy-address>`

## Что такое SAM?

Модель Segment Anything (SAM) может:

* Сегментировать любые объекты на изображениях
* Работать с подсказками (точки, рамки, текст)
* Генерировать автоматические маски
* Обрабатывать любые типы изображений

## Варианты моделей

| Модель           | VRAM  | Качество | Скорость |
| ---------------- | ----- | -------- | -------- |
| SAM-H (огромный) | 8GB   | Лучшее   | Медленно |
| SAM-L (большой)  | 6 ГБ  | Отлично  | Средне   |
| SAM-B (базовый)  | 4 ГБ  | Хорошо   | Быстро   |
| SAM2             | 8 ГБ+ | Лучшее   | Средне   |

## Быстрое развертывание

**Docker-образ:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Порты:**

```
22/tcp
7860/http
```

**Команда:**

```bash
pip install segment-anything gradio opencv-python && \
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth && \
python -c "
import gradio as gr
import numpy as np
from segment_anything import sam_model_registry, SamPredictor
import cv2

sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h_4b8939.pth').cuda()
predictor = SamPredictor(sam)

def segment(image, evt: gr.SelectData):
    predictor.set_image(image)
    point = np.array([[evt.index[0], evt.index[1]]])
    masks, _, _ = predictor.predict(point_coords=point, point_labels=np.array([1]))
    mask = masks[0]
    colored = np.zeros_like(image)
    colored[mask] = [255, 0, 0]
    result = cv2.addWeighted(image, 0.7, colored, 0.3, 0)
    return result

demo = gr.Interface(fn=segment, inputs=gr.Image(), outputs=gr.Image(), title='Click to Segment')
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Доступ к вашему сервису

После развертывания найдите ваш `http_pub` URL в **Моих заказах**:

1. Перейдите на **Моих заказах** страницу
2. Нажмите на ваш заказ
3. Найдите `http_pub` URL (например, `abc123.clorecloud.net`)

Используйте `https://YOUR_HTTP_PUB_URL` вместо `localhost` в примерах ниже.

## Установка

```bash
pip install segment-anything opencv-python
```

### Загрузка моделей

```bash

# SAM-H (лучшее качество)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# SAM-L (сбалансированный)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth

# SAM-B (быстрый)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```

## Python API

### Базовая сегментация с точками

```python
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np

# Загрузить модель
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")

predictor = SamPredictor(sam)

# Загрузить изображение
image = cv2.imread("photo.jpg")
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Установить изображение
predictor.set_image(image_rgb)

# Сегментировать с подсказкой в виде точки
input_point = np.array([[500, 375]])  # координаты x, y
input_label = np.array([1])  # 1 = передний план, 0 = фон

masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True
)

# Получить лучшую маску
best_mask = masks[np.argmax(scores)]

# Сохранить маску
cv2.imwrite("mask.png", best_mask.astype(np.uint8) * 255)
```

### Подсказка с рамкой

```python

# Сегментировать с ограничивающей рамкой
input_box = np.array([100, 100, 400, 400])  # x1, y1, x2, y2

masks, scores, _ = predictor.predict(
    box=input_box,
    multimask_output=False
)
```

### Несколько точек

```python

# Несколько точек переднего/заднего плана
input_points = np.array([
    [500, 375],   # Точка 1
    [550, 400],   # Точка 2
    [100, 100],   # Точка фона
])
input_labels = np.array([1, 1, 0])  # 1=передний план, 0=фон

masks, scores, _ = predictor.predict(
    point_coords=input_points,
    point_labels=input_labels,
    multimask_output=True
)
```

### Комбинированная рамка + точка

```python
masks, scores, _ = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=input_box,
    multimask_output=False
)
```

## Автоматическая генерация масок

Сгенерировать все возможные маски:

```python
from segment_anything import SamAutomaticMaskGenerator
import cv2

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")

mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=32,
    pred_iou_thresh=0.86,
    stability_score_thresh=0.92,
    crop_n_layers=1,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=100
)

image = cv2.imread("photo.jpg")
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

masks = mask_generator.generate(image_rgb)

# Каждая маска содержит:

# - 'segmentation': бинарная маска

# - 'area': площадь маски в пикселях

# - 'bbox': ограничивающая рамка

# - 'predicted_iou': оценка качества

# - 'stability_score': показатель стабильности

print(f"Found {len(masks)} masks")
```

### Визуализация всех масок

```python
import matplotlib.pyplot as plt

def show_masks(image, masks):
    plt.figure(figsize=(20, 20))
    plt.imshow(image)

    sorted_masks = sorted(masks, key=lambda x: x['area'], reverse=True)

    for mask in sorted_masks:
        m = mask['segmentation']
        color = np.random.random(3)
        colored = np.zeros((*m.shape, 4))
        colored[m] = [*color, 0.5]
        plt.imshow(colored)

    plt.axis('off')
    plt.savefig('all_masks.png')

show_masks(image_rgb, masks)
```

## SAM 2 (последняя версия)

```bash
pip install sam2
```

```python
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=points,
        point_labels=labels
    )
```

## Удаление фона

```python
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)

def remove_background(image_path, point):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor.set_image(image_rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([point]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[np.argmax(scores)]

    # Создать изображение RGBA
    result = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
    result[:, :, 3] = best_mask.astype(np.uint8) * 255

    return result

# Кликните по объекту, который нужно оставить
result = remove_background("photo.jpg", [400, 300])
cv2.imwrite("no_background.png", result)
```

## Извлечение объекта

```python
def extract_object(image_path, point):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor.set_image(image_rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([point]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[np.argmax(scores)]

    # Получить ограничивающую рамку
    rows = np.any(best_mask, axis=1)
    cols = np.any(best_mask, axis=0)
    y1, y2 = np.where(rows)[0][[0, -1]]
    x1, x2 = np.where(cols)[0][[0, -1]]

    # Обрезка
    cropped = image[y1:y2+1, x1:x2+1]
    mask_cropped = best_mask[y1:y2+1, x1:x2+1]

    # Применить маску
    result = cv2.cvtColor(cropped, cv2.COLOR_BGR2BGRA)
    result[:, :, 3] = mask_cropped.astype(np.uint8) * 255

    return result
```

## Пакетная обработка

```python
import os
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
import cv2
import json

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")

mask_generator = SamAutomaticMaskGenerator(sam)

input_dir = "./images"
output_dir = "./segmented"
os.makedirs(output_dir, exist_ok=True)

for filename in os.listdir(input_dir):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        image = cv2.imread(os.path.join(input_dir, filename))
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        masks = mask_generator.generate(image_rgb)

        # Сохранить маски в формате JSON
        mask_data = []
        for i, mask in enumerate(masks):
            mask_data.append({
                'id': i,
                'area': int(mask['area']),
                'bbox': mask['bbox'],
                'score': float(mask['predicted_iou'])
            })

            # Сохранить отдельную маску
            cv2.imwrite(
                os.path.join(output_dir, f"{filename}_mask_{i}.png"),
                mask['segmentation'].astype(np.uint8) * 255
            )

        with open(os.path.join(output_dir, f"{filename}_masks.json"), 'w') as f:
            json.dump(mask_data, f)
```

## API-сервер

```python
from fastapi import FastAPI, UploadFile
from fastapi.responses import Response
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np
import json

app = FastAPI()

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)

@app.post("/segment")
async def segment(file: UploadFile, x: int, y: int):
    contents = await file.read()
    nparr = np.frombuffer(contents, np.uint8)
    image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor.set_image(image_rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[np.argmax(scores)]

    _, encoded = cv2.imencode('.png', best_mask.astype(np.uint8) * 255)
    return Response(content=encoded.tobytes(), media_type="image/png")
```

## Интеграция со Stable Diffusion

Используйте маски SAM для инпейнтинга:

```python

# Сгенерировать маску с помощью SAM
predictor.set_image(image)
masks, scores, _ = predictor.predict(point_coords=point, point_labels=label)
mask = masks[np.argmax(scores)]

# Использовать для инпейнтинга в SD
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.to("cuda")

result = pipe(
    prompt="a red sports car",
    image=image,
    mask_image=mask
).images[0]
```

## Производительность

| Модель | Размер изображения | GPU      | Время   |
| ------ | ------------------ | -------- | ------- |
| SAM-H  | 1024x1024          | RTX 3090 | \~0.5 с |
| SAM-L  | 1024x1024          | RTX 3090 | \~0.3 с |
| SAM-B  | 1024x1024          | RTX 3090 | \~0.2s  |
| SAM2   | 1024x1024          | RTX 4090 | \~0.3 с |

## Оптимизация памяти

```python

# Для ограниченной видеопамяти
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # Используйте меньшую модель

# Или уменьшите количество точек для автоматической генерации
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=16,  # Уменьшить с 32
)
```

## Устранение неполадок

### Ошибка CUDA: недостаточно памяти

* Используйте SAM-B вместо SAM-H
* Уменьшите размер изображения перед обработкой
* Очистить кэш: `torch.cuda.empty_cache()`

### Плохая сегментация

* Добавьте больше точек (передний план + фон)
* Используйте подсказку-рамку для лучшего направления
* Попробуйте multimask\_output=True и выберите лучшую

## Оценка стоимости

Типичные ставки на маркетплейсе CLORE.AI (по состоянию на 2024):

| GPU       | Почасовая ставка | Дневная ставка | Сессия 4 часа |
| --------- | ---------------- | -------------- | ------------- |
| RTX 3060  | \~$0.03          | \~$0.70        | \~$0.12       |
| RTX 3090  | \~$0.06          | \~$1.50        | \~$0.25       |
| RTX 4090  | \~$0.10          | \~$2.30        | \~$0.40       |
| A100 40GB | \~$0.17          | \~$4.00        | \~$0.70       |
| A100 80GB | \~$0.25          | \~$6.00        | \~$1.00       |

*Цены варьируются в зависимости от провайдера и спроса. Проверьте* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *для текущих тарифов.*

**Экономьте деньги:**

* Используйте **Spot** рынок для гибких рабочих нагрузок (часто на 30–50% дешевле)
* Платите с помощью **CLORE** токенов
* Сравнивайте цены у разных провайдеров

## Дальнейшие шаги

* Инпейтинг в Stable Diffusion
* [ControlNet Руководство](/guides/guides_v2-ru/obrabotka-izobrazhenii/controlnet-advanced.md)
* [Апскейл с помощью Real-ESRGAN](/guides/guides_v2-ru/obrabotka-izobrazhenii/real-esrgan-upscaling.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/guides_v2-ru/obrabotka-izobrazhenii/segment-anything.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.