# Segment Anything

Precise image segmentation with Meta's SAM on a GPU.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

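## Renting on CLORE.AI

1. Visit the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter the startup command
5. Select a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment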

### Accessing Your Server

* Find connection details under **My Orders**
* Web interfaces: use the HTTP port's URL
* SSH: `ssh -p <port> root@<proxy-address>`

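## What is SAM?

The Segment Anything Model (SAM) can:

* Segment any object in an image
* Accept point and box prompts (text prompts were explored in the SAM paper but are not part of the released `segment_anything` package)
* Generate masks automatically
* Handle any type of image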

## Model Variants

| Model         | VRAM | Quality | Speed  |
| ------------- | ---- | ------- | ------ |
| SAM-H (huge)  | 8GB  | Best    | Slow   |
| SAM-L (large) | 6GB  | Great   | Medium |
| SAM-B (base)  | 4GB  | Good    | Fast   |
| SAM2          | 8GB+ | Best    | Medium |
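As a rough aid for choosing a checkpoint, the table above can be turned into a lookup. This is a minimal sketch: `pick_sam_variant` and `VRAM_GB` are hypothetical names, and the VRAM figures are the approximate values from the table, not measured limits.

```python
# Approximate VRAM needs (GB) copied from the table above; hypothetical helper.
VRAM_GB = {"vit_h": 8, "vit_l": 6, "vit_b": 4}

def pick_sam_variant(available_gb: float) -> str:
    """Return the largest SAM variant whose listed VRAM fits, or raise if none does."""
    for name in ("vit_h", "vit_l", "vit_b"):  # ordered best quality first
        if available_gb >= VRAM_GB[name]:
            return name
    raise ValueError(f"{available_gb} GB is below the smallest listed requirement")

print(pick_sam_variant(6))  # vit_l
```

The returned key matches the names used with `sam_model_registry`, so it can be passed straight to the model loader.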

## Quick Deploy

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install segment-anything gradio opencv-python && \
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth && \
python -c "
import gradio as gr
import numpy as np
from segment_anything import sam_model_registry, SamPredictor
import cv2

sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h_4b8939.pth').cuda()
predictor = SamPredictor(sam)

def segment(image, evt: gr.SelectData):
    # evt.index holds the (x, y) pixel that was clicked
    predictor.set_image(image)
    point = np.array([[evt.index[0], evt.index[1]]])
    masks, _, _ = predictor.predict(point_coords=point, point_labels=np.array([1]))
    mask = masks[0]
    # Blend a red overlay onto the selected region
    colored = np.zeros_like(image)
    colored[mask] = [255, 0, 0]
    result = cv2.addWeighted(image, 0.7, colored, 0.3, 0)
    return result

# Click events need gr.Blocks; the select listener passes the click position to segment()
with gr.Blocks(title='Click to Segment') as demo:
    inp = gr.Image(type='numpy', label='Click an object')
    out = gr.Image(label='Result')
    inp.select(segment, inputs=inp, outputs=out)
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

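## Accessing Your Service

After deployment, find your `http_pub` URL under **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Find the `http_pub` URL (for example, `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.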

## Installation

```bash
pip install segment-anything opencv-python
```

### Download Models

```bash
# SAM-H (best quality)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# SAM-L (balanced)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth

# SAM-B (fast)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```
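The same downloads can be scripted from Python so that a checkpoint is fetched only when it is missing. This is a sketch using the standard library; `checkpoint_url` and `ensure_checkpoint` are hypothetical helper names, and the file names are the ones from the `wget` commands above.

```python
import os
import urllib.request

# Checkpoint file names as listed in the wget commands above.
BASE_URL = "https://dl.fbaipublicfiles.com/segment_anything/"
CHECKPOINTS = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}

def checkpoint_url(model_type: str) -> str:
    """Return the download URL for a model type such as 'vit_b'."""
    return BASE_URL + CHECKPOINTS[model_type]

def ensure_checkpoint(model_type: str, directory: str = ".") -> str:
    """Download the checkpoint only if it is not already on disk; return its path."""
    path = os.path.join(directory, CHECKPOINTS[model_type])
    if not os.path.exists(path):
        urllib.request.urlretrieve(checkpoint_url(model_type), path)
    return path
```

A call such as `ensure_checkpoint("vit_b")` returns a path that can be passed as the `checkpoint` argument when loading the model.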

## Python API

### Basic Segmentation with Points

```python
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np

# Load the model
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")

predictor = SamPredictor(sam)

# Load the image (OpenCV reads BGR; SAM expects RGB)
image = cv2.imread("photo.jpg")
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Set the image (the embedding is computed once and reused for every prompt)
predictor.set_image(image_rgb)

# Segment with a point prompt
input_point = np.array([[500, 375]])  # x, y coordinates
input_label = np.array([1])  # 1 = foreground, 0 = background

masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True
)

# Pick the highest-scoring mask
best_mask = masks[np.argmax(scores)]

# Save the mask
cv2.imwrite("mask.png", best_mask.astype(np.uint8) * 255)
```

### Box Prompts

```python
# Segment with a bounding box (reuses `predictor` from the previous example)
input_box = np.array([100, 100, 400, 400])  # x1, y1, x2, y2

masks, scores, _ = predictor.predict(
    box=input_box,
    multimask_output=False
)
```
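Box prompts use corner coordinates (`x1, y1, x2, y2`), while many tools report boxes as `x, y, width, height`; to my knowledge the `bbox` field returned by `SamAutomaticMaskGenerator` uses that second layout. A minimal conversion sketch, where `xywh_to_xyxy` is a hypothetical helper name:

```python
def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

print(xywh_to_xyxy([100, 100, 300, 300]))  # [100, 100, 400, 400]
```

Wrap the result in `np.array(...)` before passing it as the `box` argument.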

### Multiple Points

```python
# Several foreground/background points
input_points = np.array([
    [500, 375],   # point 1
    [550, 400],   # point 2
    [100, 100],   # background point
])
input_labels = np.array([1, 1, 0])  # 1 = foreground, 0 = background

masks, scores, _ = predictor.predict(
    point_coords=input_points,
    point_labels=input_labels,
    multimask_output=True
)
```
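When clicks arrive as two separate lists, keeping coordinates and labels aligned by hand is error-prone. The sketch below builds both arrays together; `build_point_prompts` is a hypothetical helper, not part of `segment_anything`.

```python
import numpy as np

def build_point_prompts(foreground, background=()):
    """Stack (x, y) points into the coords/labels arrays used by predict()."""
    fg, bg = list(foreground), list(background)
    if not fg and not bg:
        raise ValueError("at least one point is required")
    coords = np.array(fg + bg, dtype=np.float32).reshape(-1, 2)
    labels = np.array([1] * len(fg) + [0] * len(bg), dtype=np.int32)
    return coords, labels

coords, labels = build_point_prompts([(500, 375), (550, 400)], [(100, 100)])
print(coords.shape, labels.tolist())  # (3, 2) [1, 1, 0]
```

The two arrays can then be passed as `point_coords` and `point_labels`.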

### Combining Box + Points

```python
masks, scores, _ = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=input_box,
    multimask_output=False
)
```

## Automatic Mask Generation

Generate masks for everything in the image:

```python
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
import cv2

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")

mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=32,
    pred_iou_thresh=0.86,
    stability_score_thresh=0.92,
    crop_n_layers=1,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=100
)

image = cv2.imread("photo.jpg")
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

masks = mask_generator.generate(image_rgb)

# Each mask is a dict containing:
# - 'segmentation': binary mask
# - 'area': mask area in pixels
# - 'bbox': bounding box
# - 'predicted_iou': quality score
# - 'stability_score': stability score

print(f"Found {len(masks)} masks")
```
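Automatic generation often returns many small or low-confidence masks. The sketch below filters the list using the `area` and `predicted_iou` fields described above; `filter_masks` is a hypothetical helper and the thresholds are illustrative, not recommended values.

```python
def filter_masks(masks, min_area=500, min_iou=0.9):
    """Keep masks that are large enough and confidently predicted, largest first."""
    kept = [m for m in masks if m["area"] >= min_area and m["predicted_iou"] >= min_iou]
    return sorted(kept, key=lambda m: m["area"], reverse=True)

# Stand-in records with only the fields the filter reads
sample = [
    {"area": 1200, "predicted_iou": 0.95},
    {"area": 80, "predicted_iou": 0.99},
    {"area": 5000, "predicted_iou": 0.91},
    {"area": 3000, "predicted_iou": 0.70},
]
print([m["area"] for m in filter_masks(sample)])  # [5000, 1200]
```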

### Visualizing All Masks

```python
import matplotlib.pyplot as plt
import numpy as np

def show_masks(image, masks):
    plt.figure(figsize=(20, 20))
    plt.imshow(image)

    sorted_masks = sorted(masks, key=lambda x: x['area'], reverse=True)

    for mask in sorted_masks:
        m = mask['segmentation']
        color = np.random.random(3)
        colored = np.zeros((*m.shape, 4))
        colored[m] = [*color, 0.5]
        plt.imshow(colored)

    plt.axis('off')
    plt.savefig('all_masks.png')

show_masks(image_rgb, masks)
```

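## SAM 2 (Latest Version)

```bash
pip install sam2
```

```python
import numpy as np
import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

# `image` is an RGB array, for example image_rgb from the examples above
points = np.array([[500, 375]])  # x, y coordinates
labels = np.array([1])           # 1 = foreground

with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=points,
        point_labels=labels
    )
```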

## Background Removal

```python
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)

def remove_background(image_path, point):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor.set_image(image_rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([point]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[np.argmax(scores)]

    # Build a BGRA image whose alpha channel is the mask
    result = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
    result[:, :, 3] = best_mask.astype(np.uint8) * 255

    return result

# Click on the object to keep
result = remove_background("photo.jpg", [400, 300])
cv2.imwrite("no_background.png", result)
```

## Object Extraction

```python
# Reuses `predictor`, cv2, and np from the previous example
def extract_object(image_path, point):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor.set_image(image_rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([point]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[np.argmax(scores)]

    # Find the bounding box of the mask
    rows = np.any(best_mask, axis=1)
    cols = np.any(best_mask, axis=0)
    y1, y2 = np.where(rows)[0][[0, -1]]
    x1, x2 = np.where(cols)[0][[0, -1]]

    # Crop the image and the mask to that box
    cropped = image[y1:y2+1, x1:x2+1]
    mask_cropped = best_mask[y1:y2+1, x1:x2+1]

    # Apply the mask as the alpha channel
    result = cv2.cvtColor(cropped, cv2.COLOR_BGR2BGRA)
    result[:, :, 3] = mask_cropped.astype(np.uint8) * 255

    return result
```
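A tight crop can clip soft edges or shadows, so it is sometimes useful to pad the bounding box before cropping. A minimal sketch, assuming inclusive pixel coordinates like the ones computed above; `pad_box` and its `margin` default are hypothetical.

```python
def pad_box(x1, y1, x2, y2, width, height, margin=10):
    """Expand an inclusive pixel box by `margin` on each side, clamped to the image."""
    return (
        max(x1 - margin, 0),
        max(y1 - margin, 0),
        min(x2 + margin, width - 1),
        min(y2 + margin, height - 1),
    )

print(pad_box(5, 20, 95, 60, width=100, height=80))  # (0, 10, 99, 70)
```

The padded values can replace `x1, y1, x2, y2` in the slicing step, with `width` and `height` taken from `image.shape[1]` and `image.shape[0]`.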

## Batch Processing

```python
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
import cv2
import json
import os
import numpy as np

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")

mask_generator = SamAutomaticMaskGenerator(sam)

input_dir = "./images"  # assumed input folder; point this at your own images
output_dir = "./segmented"
os.makedirs(output_dir, exist_ok=True)

for filename in os.listdir(input_dir):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        image = cv2.imread(os.path.join(input_dir, filename))
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        masks = mask_generator.generate(image_rgb)

        # Collect mask metadata to save as JSON
        mask_data = []
        for i, mask in enumerate(masks):
            mask_data.append({
                'id': i,
                'area': int(mask['area']),
                'bbox': mask['bbox'],
                'score': float(mask['predicted_iou'])
            })

            # Save each mask as its own PNG
            cv2.imwrite(
                os.path.join(output_dir, f"{filename}_mask_{i}.png"),
                mask['segmentation'].astype(np.uint8) * 255
            )

        with open(os.path.join(output_dir, f"{filename}_masks.json"), 'w') as f:
            json.dump(mask_data, f)
```
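After a batch run, the per-image JSON files can be summarized without loading any mask images. This sketch assumes the `<filename>_masks.json` naming and the `area` field written by the loop above; `summarize_masks` is a hypothetical helper.

```python
import glob
import json
import os

def summarize_masks(output_dir):
    """Return {json file name: (mask count, total mask area)} for each *_masks.json file."""
    summary = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "*_masks.json"))):
        with open(path) as f:
            records = json.load(f)
        summary[os.path.basename(path)] = (len(records), sum(r["area"] for r in records))
    return summary
```

For example, `summarize_masks("./segmented")` gives a quick view of which images produced unusually few or unusually many masks.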

## API Server

```python
from fastapi import FastAPI, UploadFile
from fastapi.responses import Response
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np
import json

app = FastAPI()

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)

@app.post("/segment")
async def segment(file: UploadFile, x: int, y: int):
    contents = await file.read()
    nparr = np.frombuffer(contents, np.uint8)
    image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor.set_image(image_rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[np.argmax(scores)]

    _, encoded = cv2.imencode('.png', best_mask.astype(np.uint8) * 255)
    return Response(content=encoded.tobytes(), media_type="image/png")
```
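A client for the endpoint above might look like the following sketch. It assumes the third-party `requests` package and a server reachable at your `http_pub` URL; `segment_url` and `request_mask` are hypothetical helper names. Because `x` and `y` are plain `int` parameters in the handler, FastAPI reads them from the query string, while the image travels as a multipart form field named `file`.

```python
from urllib.parse import urlencode

def segment_url(base_url: str, x: int, y: int) -> str:
    """Build the /segment URL; x and y travel as query parameters."""
    return f"{base_url.rstrip('/')}/segment?{urlencode({'x': x, 'y': y})}"

def request_mask(base_url: str, image_path: str, x: int, y: int, out_path: str = "mask.png") -> str:
    """Upload an image, save the returned PNG mask, and return its path."""
    import requests  # third-party: pip install requests

    with open(image_path, "rb") as f:
        response = requests.post(segment_url(base_url, x, y), files={"file": f}, timeout=120)
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)
    return out_path

print(segment_url("https://YOUR_HTTP_PUB_URL", 400, 300))
# https://YOUR_HTTP_PUB_URL/segment?x=400&y=300
```

On the server side, FastAPI needs the `python-multipart` package to parse file uploads. Assuming the server code is saved as `server.py` (a hypothetical file name), it could be started with `uvicorn server:app --host 0.0.0.0 --port 7860`.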

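## Integration with Stable Diffusion

Use SAM masks for inpainting:

```python
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Generate a mask with SAM (`image` is an RGB array; `point` and `label` as in the examples above)
predictor.set_image(image)
masks, scores, _ = predictor.predict(point_coords=point, point_labels=label)
mask = masks[np.argmax(scores)]

# The pipeline expects PIL images; white mask pixels are the ones that get repainted
init_image = Image.fromarray(image)
mask_image = Image.fromarray(mask.astype(np.uint8) * 255)

# Use the mask in Stable Diffusion inpainting
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.to("cuda")

result = pipe(
    prompt="a red sports car",
    image=init_image,
    mask_image=mask_image
).images[0]
```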
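Stable Diffusion pipelines in `diffusers` require a width and height divisible by 8, and this inpainting checkpoint is usually run at around 512 pixels. The sketch below picks a working size for the image and mask; `sd_friendly_size` is a hypothetical helper and the 512 target is an assumption you may want to change.

```python
def sd_friendly_size(width, height, target=512, multiple=8):
    """Scale so the longer side is `target`, then round both sides down to a multiple of 8."""
    longest = max(width, height)

    def snap(value):
        return max(multiple, value // multiple * multiple)

    # Integer arithmetic avoids floating-point rounding surprises
    return snap(width * target // longest), snap(height * target // longest)

print(sd_friendly_size(1920, 1080))  # (512, 288)
```

Resize the image and the mask to the same size before calling the pipeline, using nearest-neighbor interpolation for the mask so it stays binary.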

## Performance

| Model | GPU      | Time   |
| ----- | -------- | ------ |
| SAM-H | RTX 4090 | \~0.5s |
| SAM-L | RTX 4090 | \~0.3s |
| SAM-B | RTX 4090 | \~0.2s |
| SAM2  | RTX 4090 | \~0.3s |
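Timings like these depend on the GPU, the driver, and the image, so it is worth measuring on your own server. A minimal timing sketch using only the standard library; `time_call` is a hypothetical helper.

```python
import time

def time_call(fn, repeats=5, warmup=1):
    """Return the mean wall-clock seconds of fn() over `repeats` runs after warm-up."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats
```

For example, `time_call(lambda: predictor.set_image(image_rgb))` times the image encoder. CUDA work is asynchronous, so for accurate GPU numbers call `torch.cuda.synchronize()` at the end of the timed function.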

## Memory Optimization

```python
# For limited VRAM, use a smaller model
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# Or reduce the number of points used for automatic generation
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=16,  # reduced from 32
)
```
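The effect of `points_per_side` is quadratic, because the generator samples a square grid of point prompts. A small arithmetic sketch, assuming no extra crop layers are used; `grid_prompt_count` is a hypothetical helper.

```python
def grid_prompt_count(points_per_side: int) -> int:
    """Number of point prompts in a points_per_side x points_per_side grid."""
    return points_per_side ** 2

for n in (32, 16):
    print(n, grid_prompt_count(n))
# 32 1024
# 16 256
```

Halving `points_per_side` therefore cuts the number of prompts to a quarter, at the cost of possibly missing small objects.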

## Troubleshooting

### CUDA Out of Memory

* Use SAM-B instead of SAM-H
* Downscale the image before processing
* Clear the cache: `torch.cuda.empty_cache()`
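For the downscaling step, a target size that preserves the aspect ratio can be computed as in this sketch. `downscale_size` is a hypothetical helper, and the 1024-pixel default is an assumption chosen to match the resolution SAM resizes its input to internally.

```python
def downscale_size(width, height, max_side=1024):
    """Return (width, height) scaled so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough
    return (
        max(1, round(width * max_side / longest)),
        max(1, round(height * max_side / longest)),
    )

print(downscale_size(4000, 3000))  # (1024, 768)
```

The result is in `(width, height)` order, which is what `cv2.resize` expects for its size argument, for example `cv2.resize(image, downscale_size(w, h), interpolation=cv2.INTER_AREA)` with `h, w = image.shape[:2]`.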

### Poor Segmentation Quality

* Add more points (foreground + background)
* Use a box prompt for better guidance
* Try `multimask_output=True` and pick the best result

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly rate | Daily rate | 4-hour session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use the **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE**
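Session costs are roughly the hourly rate multiplied by the rental time. A small arithmetic sketch for budgeting; `session_cost` is a hypothetical helper, and the rates in the example are approximate figures, not quotes.

```python
def session_cost(hourly_rate_usd: float, hours: float) -> float:
    """Estimated cost in USD of renting one GPU for `hours` at a fixed hourly rate."""
    return round(hourly_rate_usd * hours, 2)

print(session_cost(0.10, 4))   # 0.4
print(session_cost(0.25, 24))  # 6.0
```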

## Next Steps

* Stable Diffusion Inpainting
* [ControlNet Guide](/guides/guides_v2-zh/tu-xiang-chu-li/controlnet-advanced.md)
* [Real-ESRGAN Upscaling](/guides/guides_v2-zh/tu-xiang-chu-li/real-esrgan-upscaling.md)

