# SAM2 Video

Track and segment any object in video with Meta's SAM2.1, an enhanced version of SAM2 with improved video accuracy.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

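## Renting on CLORE.AI

1. Visit the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set ports (TCP for SSH, HTTP for web interfaces)
   * Add environment variables if needed
   * Enter the startup command
5. Choose a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment

### Accessing Your Server

* Find connection details under **My Orders**
* Web interface: use the URL of the HTTP port
* SSH: `ssh -p <port> root@<proxy-address>`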

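## What Is SAM2?

SAM2 (Segment Anything Model 2) from Meta AI provides:

* Real-time video object segmentation
* Click-to-track for any object
* Consistent tracking through occlusions
* Memory-efficient video processing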

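## What's New in SAM2.1

SAM2.1 brings notable improvements over the original SAM2:

* **Improved video accuracy**: better tracking under occlusion and fast motion
* **Enhanced memory module**: more consistent long-range tracking
* **New checkpoints**: the `sam2.1_hiera_*` family, with better performance
* **Official pip package**: install with `pip install sam-2` (no manual build required)
* **Faster inference**: optimized CUDA kernels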

## Resources

* **GitHub:** [facebookresearch/sam2](https://github.com/facebookresearch/sam2)
* **Paper:** [SAM2 paper](https://arxiv.org/abs/2408.00714)
* **Demo:** [SAM2 demo](https://sam2.metademolab.com/)
* **Model weights:** [SAM2.1 checkpoints](https://github.com/facebookresearch/sam2#model-checkpoints)

## Recommended Hardware

| Component | Minimum       | Recommended   | Best          |
| --------- | ------------- | ------------- | ------------- |
| GPU       | RTX 3060 12GB | RTX 4080 16GB | RTX 4090 24GB |
| VRAM      | 8GB           | 16GB          | 24GB          |
| CPU       | 4 cores       | 8 cores       | 16 cores      |
| RAM       | 16GB          | 32GB          | 64GB          |
| Storage   | 30GB SSD      | 50GB NVMe     | 100GB NVMe    |
| Network   | 100 Mbps      | 500 Mbps      | 1 Gbps        |

## Quick Deploy on CLORE.AI

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
cd /workspace && \
pip install sam-2 && \
python -c "from sam2.build_sam import build_sam2; print('SAM2.1 ready!')"
```

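## Accessing Your Service

After deployment, find your `http_pub` URL under **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Look for the `http_pub` URL (for example, `abc123.clorecloud.net`)

In the examples below, use `https://YOUR_HTTP_PUB_URL` instead of `localhost`.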

## Installation

```bash
# Official pip package (recommended for SAM2.1)
pip install sam-2

# Download all four SAM2.1 checkpoints into ./checkpoints
mkdir -p checkpoints
for size in tiny small base_plus large; do
    wget -nc -P checkpoints \
        "https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_${size}.pt"
done

# Or download only the large checkpoint
wget -nc -P checkpoints https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt
```
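
Before loading a model, it can help to confirm that the checkpoint files are present and not truncated. The sketch below is a hypothetical helper (`find_missing_checkpoints` is not part of SAM2) that checks for the four file names used in the download URLs above. The 1 MB minimum size is an arbitrary assumption, used only to catch empty or partial downloads.

```python
from pathlib import Path

# File names taken from the download URLs above.
CHECKPOINT_NAMES = [
    "sam2.1_hiera_tiny.pt",
    "sam2.1_hiera_small.pt",
    "sam2.1_hiera_base_plus.pt",
    "sam2.1_hiera_large.pt",
]


def find_missing_checkpoints(directory, names=CHECKPOINT_NAMES, min_bytes=1_000_000):
    """Return the names that are absent from `directory` or smaller than `min_bytes`."""
    root = Path(directory)
    missing = []
    for name in names:
        path = root / name
        # A file that is missing or suspiciously small is reported as missing.
        if not path.is_file() or path.stat().st_size < min_bytes:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_checkpoints("checkpoints")
    print("Missing or incomplete:", missing or "none")
```

If you only downloaded the large checkpoint, pass `names=["sam2.1_hiera_large.pt"]` so the other variants are not reported.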

### Alternative: From Source (for Development)

```bash
git clone https://github.com/facebookresearch/sam2.git
cd sam2
pip install -e ".[demo]"

# Download the SAM2.1 checkpoints
cd checkpoints
bash download_ckpts.sh
```

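## What You Can Build

### Video Editing

* Remove objects from video
* Replace backgrounds seamlessly
* Create video masks for compositing

### Sports Analytics

* Track players during a match
* Analyze motion trajectories
* Generate highlight reels

### Medical Imaging

* Segment organs in CT/MRI video
* Track cell movement under a microscope
* Measure growth over time

### Surveillance and Security

* Track objects across cameras
* Count people and vehicles
* Detect anomalies

### Creative Projects

* Rotoscoping for VFX
* Interactive video installations
* AR/VR content creation

## Basic Usage

### Image Segmentation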

```python
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from PIL import Image
import numpy as np

# Load the SAM2.1 model (higher accuracy than SAM2)
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

sam2 = build_sam2(model_cfg, checkpoint, device="cuda")
predictor = SAM2ImagePredictor(sam2)

# Load the image as an RGB array
image = np.array(Image.open("image.jpg").convert("RGB"))
predictor.set_image(image)

# Segment with a point prompt
point_coords = np.array([[500, 375]])  # x, y coordinates
point_labels = np.array([1])  # 1 = foreground

masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True
)

# Pick the highest-scoring mask and cast it to boolean so it can be used for indexing
best_mask = masks[scores.argmax()].astype(bool)
```

### Video Object Tracking

```python
import torch
from sam2.build_sam import build_sam2_video_predictor
import numpy as np

# Initialize the SAM2.1 video predictor (improved tracking accuracy)
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

predictor = build_sam2_video_predictor(model_cfg, checkpoint, device="cuda")

# Initialize with a video
video_path = "./video_frames"  # directory containing the frame images
inference_state = predictor.init_state(video_path=video_path)

# Add a point on the first frame
predictor.reset_state(inference_state)
frame_idx = 0
obj_id = 1  # ID of the object to track

points = np.array([[400, 300]], dtype=np.float32)
labels = np.array([1], dtype=np.int32)

# Add the object to track
_, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
    inference_state=inference_state,
    frame_idx=frame_idx,
    obj_id=obj_id,
    points=points,
    labels=labels
)

# Propagate through the video and store one boolean mask per object per frame
video_segments = {}
for out_frame_idx, out_obj_ids, out_mask_logits in predictor.propagate_in_video(inference_state):
    video_segments[out_frame_idx] = {
        out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
        for i, out_obj_id in enumerate(out_obj_ids)
    }
```
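
The tracking example assumes `./video_frames` already contains one JPEG per frame. One common way to produce such a directory is ffmpeg. The sketch below only builds the command so you can inspect it before running it; the helper names `build_ffmpeg_command` and `extract_frames` and the file paths are illustrative assumptions, and ffmpeg must be installed on the server.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_file, frames_dir, quality=2):
    """Build an ffmpeg command that writes frames as 00000.jpg, 00001.jpg, ..."""
    pattern = str(Path(frames_dir) / "%05d.jpg")
    return [
        "ffmpeg",
        "-i", str(video_file),
        "-q:v", str(quality),   # JPEG quality scale (lower means higher quality)
        "-start_number", "0",   # name the first frame 00000.jpg
        pattern,
    ]


def extract_frames(video_file, frames_dir):
    """Create the output directory and run ffmpeg (requires ffmpeg on PATH)."""
    Path(frames_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_file, frames_dir), check=True)


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(build_ffmpeg_command("input.mp4", "./video_frames")))
```

Zero-padded, five-digit names match the `{frame_idx:05d}.jpg` pattern used when exporting masks later in this guide.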

## Multi-Object Tracking

```python
import torch
from sam2.build_sam import build_sam2_video_predictor
import numpy as np

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)

video_path = "./video_frames"
inference_state = predictor.init_state(video_path=video_path)

# Objects to track: one positive click per object
objects_to_track = [
    {"id": 1, "point": [200, 150], "frame": 0},  # person 1
    {"id": 2, "point": [400, 200], "frame": 0},  # person 2
    {"id": 3, "point": [600, 300], "frame": 0},  # ball
]

for obj in objects_to_track:
    predictor.add_new_points_or_box(
        inference_state=inference_state,
        frame_idx=obj["frame"],
        obj_id=obj["id"],
        points=np.array([obj["point"]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32)
    )

# Propagate all objects through the video
all_masks = {}
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(inference_state):
    all_masks[frame_idx] = {}
    for i, obj_id in enumerate(obj_ids):
        all_masks[frame_idx][obj_id] = (mask_logits[i] > 0.0).cpu().numpy()
```
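
Once `all_masks` is filled, a quick per-object summary can show where tracking may have failed. The sketch below assumes the `{frame_idx: {obj_id: boolean mask}}` layout produced above. The helper names are hypothetical, and treating an empty mask as "lost" is a simplifying assumption, since an object can also be legitimately out of view.

```python
import numpy as np


def mask_areas(all_masks):
    """Return {frame_idx: {obj_id: pixel_count}} for boolean masks."""
    return {
        frame_idx: {
            obj_id: int(np.count_nonzero(mask))
            for obj_id, mask in per_frame.items()
        }
        for frame_idx, per_frame in all_masks.items()
    }


def frames_where_lost(all_masks, obj_id):
    """List the frames in which `obj_id` has an empty (or absent) mask."""
    areas = mask_areas(all_masks)
    return sorted(
        frame_idx
        for frame_idx, per_frame in areas.items()
        if per_frame.get(obj_id, 0) == 0
    )
```

Frames returned by `frames_where_lost` are natural candidates for adding a correction prompt.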

## Box Prompt Segmentation

```python
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
import numpy as np
from PIL import Image

sam2 = build_sam2(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)
predictor = SAM2ImagePredictor(sam2)

image = np.array(Image.open("image.jpg"))
predictor.set_image(image)

# Segment with a bounding box
box = np.array([100, 100, 400, 400])  # x1, y1, x2, y2

masks, scores, _ = predictor.predict(
    box=box,
    multimask_output=False
)
```
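
The box prompt above uses corner coordinates (`x1, y1, x2, y2`). Many annotation tools and detectors emit `x, y, width, height` instead, so a small conversion step is often needed. The helper below is a hypothetical sketch, not part of SAM2; clamping to the image bounds is an assumption about what you want for boxes that spill over the edge.

```python
def xywh_to_xyxy(x, y, w, h, image_width, image_height):
    """Convert a top-left (x, y, w, h) box to [x1, y1, x2, y2], clamped to the image."""
    x1 = max(0, min(x, image_width - 1))
    y1 = max(0, min(y, image_height - 1))
    x2 = max(0, min(x + w, image_width - 1))
    y2 = max(0, min(y + h, image_height - 1))
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box has no area inside the image")
    return [x1, y1, x2, y2]


if __name__ == "__main__":
    # Same box as in the example above, expressed as x, y, width, height.
    print(xywh_to_xyxy(100, 100, 300, 300, 1920, 1080))  # [100, 100, 400, 400]
```

The result can be wrapped with `np.array(...)` and passed as the `box` argument.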

## Gradio Interface

```python
# Requires Gradio: pip install gradio
import gradio as gr
import numpy as np
from PIL import Image
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

sam2 = build_sam2(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)
predictor = SAM2ImagePredictor(sam2)

def segment_image(image, x, y):
    # Work on an RGB copy of the uploaded image
    rgb = np.array(image.convert("RGB"))
    predictor.set_image(rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    # Cast to boolean so the mask can be used for indexing
    best_mask = masks[scores.argmax()].astype(bool)

    # Blend a red overlay onto the segmented region
    overlay = rgb.astype(np.float32)
    overlay[best_mask] = overlay[best_mask] * 0.5 + np.array([255, 0, 0]) * 0.5

    return Image.fromarray(overlay.astype(np.uint8))

demo = gr.Interface(
    fn=segment_image,
    inputs=[
        gr.Image(type="pil", label="Input Image"),
        gr.Number(label="X coordinate"),
        gr.Number(label="Y coordinate")
    ],
    outputs=gr.Image(label="Segmented Image"),
    title="SAM2 - Segment Anything",
    description="Enter the coordinates of a point on the object to segment it. Runs on a CLORE.AI GPU server."
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## Exporting Masks as Video

```python
import cv2
import numpy as np
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)

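# ... (tracking code from above, which fills `video_segments`)

# Read the first frame to get the output size
# (assumes frames are named 00000.jpg, 00001.jpg, ...)
first_frame = cv2.imread("./video_frames/00000.jpg")
height, width = first_frame.shape[:2]

# Export as video
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output_masks.mp4', fourcc, 30.0, (width, height))

for frame_idx in sorted(video_segments.keys()):
    frame = cv2.imread(f"./video_frames/{frame_idx:05d}.jpg")

    # Apply the mask overlay (OpenCV uses BGR: object 1 is green, others are red)
    for obj_id, mask in video_segments[frame_idx].items():
        color = [0, 255, 0] if obj_id == 1 else [0, 0, 255]
        frame[mask.squeeze()] = frame[mask.squeeze()] * 0.5 + np.array(color) * 0.5

    out.write(frame.astype(np.uint8))

out.release()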
```

## Performance

| Task               | Resolution    | GPU           | Speed |
| ------------------ | ------------- | ------------- | ----- |
| Image segmentation | Not specified | RTX 4090      | 50 ms |
| Image segmentation | 512x512       | RTX 4090      | 30 ms |
| Video (per frame)  | 720p          | Not specified | 45 ms |
| Video (per frame)  | 1080p         | Not specified | 35 ms |
## Model Variants (SAM2.1)

SAM2.1 introduces new `sam2.1_hiera_*` checkpoints with improved video tracking accuracy:

| Model                     | Parameters | VRAM     | Speed      | Quality  | Checkpoint                   |
| ------------------------- | ---------- | -------- | ---------- | -------- | ---------------------------- |
| sam2.1\_hiera\_tiny       | 38M        | 4GB      | Fastest    | Good     | sam2.1\_hiera\_tiny.pt       |
| sam2.1\_hiera\_small      | 46M        | 5GB      | Fast       | Better   | sam2.1\_hiera\_small.pt      |
| sam2.1\_hiera\_base\_plus | 80M        | 8GB      | Medium     | Great    | sam2.1\_hiera\_base\_plus.pt |
| **sam2.1\_hiera\_large**  | **224M**   | **12GB** | **Slower** | **Best** | **sam2.1\_hiera\_large.pt**  |

> **Note:** On video benchmarks, SAM2.1 models consistently outperform their SAM2 counterparts on fast-moving objects and long occlusions.
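
The VRAM column can be turned into a simple selection rule. The sketch below copies the four VRAM figures from the table and returns the largest variant that fits. `pick_variant` and the `headroom_gb` parameter are hypothetical. Actual memory use also depends on resolution and video length, so treat the result as a starting point.

```python
# (model name, approximate VRAM in GB) copied from the table above, smallest first.
VARIANTS = [
    ("sam2.1_hiera_tiny", 4),
    ("sam2.1_hiera_small", 5),
    ("sam2.1_hiera_base_plus", 8),
    ("sam2.1_hiera_large", 12),
]


def pick_variant(available_vram_gb, headroom_gb=0):
    """Return the largest variant whose VRAM figure plus headroom fits, or None."""
    best = None
    for name, vram_gb in VARIANTS:
        # VARIANTS is sorted ascending, so the last match is the largest that fits.
        if vram_gb + headroom_gb <= available_vram_gb:
            best = name
    return best


if __name__ == "__main__":
    print(pick_variant(24))                 # sam2.1_hiera_large
    print(pick_variant(8, headroom_gb=2))   # sam2.1_hiera_small
```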

## Troubleshooting

### Out of Memory

**Problem:** Long videos cause CUDA out-of-memory errors

**Solution:**

```python

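import torch

# Process the video in chunks
chunk_size = 100  # frames per chunk
total_frames = 1000  # placeholder: replace with the number of frames in your video

for start_frame in range(0, total_frames, chunk_size):
    end_frame = min(start_frame + chunk_size, total_frames)
    # Process frames start_frame .. end_frame - 1 here...
    torch.cuda.empty_cache()  # release cached GPU memory between chunks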
```
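
The chunk boundaries in the loop above can be factored into a small generator, which makes the edge case (a final, shorter chunk) easy to check in isolation. `chunk_ranges` is a hypothetical helper. How you re-prompt the tracked object at the start of each chunk depends on your pipeline and is not shown here.

```python
def chunk_ranges(total_frames, chunk_size):
    """Yield (start, end) pairs covering range(total_frames); `end` is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_frames, chunk_size):
        yield start, min(start + chunk_size, total_frames)


if __name__ == "__main__":
    print(list(chunk_ranges(250, 100)))  # [(0, 100), (100, 200), (200, 250)]
```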

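### Tracking Lost

**Problem:** Object tracking fails partway through the video

**Solution:**

* Add correction points when tracking drifts
* Use a box prompt for the initial segmentation to get a better result
* Choose a clearer initial frame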

```python

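# Add a correction point on the frame where tracking drifted
# (lost_frame, new_x, and new_y are placeholders for your own values)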
predictor.add_new_points_or_box(
    inference_state=inference_state,
    frame_idx=lost_frame,
    obj_id=obj_id,
    points=np.array([[new_x, new_y]], dtype=np.float32),
    labels=np.array([1], dtype=np.int32)
)
```

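### Slow Processing

**Problem:** Video processing is too slow

**Solution:**

* Use a smaller model variant (tiny/small)
* Reduce the video resolution
* Enable half precision (fp16)
* Process on an A100 GPU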

```python

# Use a smaller SAM2.1 model for higher speed
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml",
    "./checkpoints/sam2.1_hiera_tiny.pt",
    device="cuda"
)
```

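### Poor Mask Quality

**Problem:** Segmentation edges are rough

**Solution:**

* Use a larger model (large instead of tiny)
* Add more point prompts
* Combine point prompts with a box prompt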

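## More Troubleshooting Tips

### Inaccurate Segmentation

* Click more precisely on the target object
* Add multiple positive and negative points
* Use a box prompt for large objects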
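
Combining positive and negative clicks means packing them into one coordinate array and one label array, where label `1` marks foreground and `0` marks background. The helper below is a hypothetical sketch of that packing; the click coordinates in the usage example are made up.

```python
import numpy as np


def build_point_prompts(positive, negative=()):
    """Pack clicks into (point_coords, point_labels): 1 = foreground, 0 = background."""
    positive = list(positive)
    negative = list(negative)
    points = positive + negative
    if not points:
        raise ValueError("at least one point is required")
    coords = np.array(points, dtype=np.float32).reshape(-1, 2)
    labels = np.array([1] * len(positive) + [0] * len(negative), dtype=np.int32)
    return coords, labels


if __name__ == "__main__":
    coords, labels = build_point_prompts(
        positive=[(500, 375), (520, 400)],  # clicks on the object
        negative=[(100, 50)],               # click on the background
    )
    print(coords.shape, labels.tolist())  # (3, 2) [1, 1, 0]
```

The two arrays can be passed as `point_coords` and `point_labels` to the image predictor, or as `points` and `labels` to the video predictor.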

### Video Memory Errors

* Process fewer frames at a time
* Reduce the video resolution
* Use streaming mode for long videos

### Tracking Lost

* Add more prompts when the object's appearance changes
* Use the memory bank feature
* Check whether the object is occluded

### Slow Processing

* SAM2 is compute-intensive
* Use an A100 for long videos
* Consider skipping frames
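
Frame skipping can be as simple as keeping every Nth frame and reusing the most recent mask for the frames in between. The sketch below shows only the index arithmetic. Both helper names are hypothetical, and reusing the previous mask is an assumption that works best for slow-moving objects.

```python
def strided_frame_indices(total_frames, stride):
    """Return the frame indices kept when processing every `stride`-th frame."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return list(range(0, total_frames, stride))


def nearest_processed_frame(frame_idx, stride):
    """Map any frame index to the most recent processed frame at or before it."""
    return (frame_idx // stride) * stride


if __name__ == "__main__":
    print(strided_frame_indices(10, 3))   # [0, 3, 6, 9]
    print(nearest_processed_frame(7, 3))  # 6
```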

## Cost Estimate

Typical rates on the CLORE.AI Marketplace (as of 2024):

| GPU           | Hourly rate | Daily rate | 4-hour session |
| ------------- | ----------- | ---------- | -------------- |
| RTX 3060      | \~$0.03     | \~$0.70    | \~$0.12        |
| Not specified | \~$0.06     | \~$1.50    | \~$0.25        |
| Not specified | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB     | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB     | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Ways to save:**

* Use the **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE**
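
To turn an hourly rate into a rough job cost, multiply the processing time by the rate. The sketch below is a back-of-the-envelope estimate only: `estimate_cost` is a hypothetical helper, the numbers in the usage example are placeholders, and it assumes cost scales linearly with time, which may not match how a given order is actually billed.

```python
def estimate_cost(hourly_rate_usd, frames, seconds_per_frame, overhead_minutes=0.0):
    """Return (hours, cost_usd) for processing `frames` at a fixed per-frame time."""
    hours = frames * seconds_per_frame / 3600 + overhead_minutes / 60
    return hours, hours * hourly_rate_usd


if __name__ == "__main__":
    # Placeholder inputs: 36,000 frames at 45 ms each, plus 3 minutes of setup,
    # on a server rented at $0.10 per hour.
    hours, cost = estimate_cost(0.10, 36_000, 0.045, overhead_minutes=3)
    print(f"{hours:.2f} h, about ${cost:.2f}")
```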

## Related Guides

* [GroundingDINO](/guides/guides_v2-zh/shi-jue-mo-xing/groundingdino.md) - automatically detect objects to segment
* [Florence-2](/guides/guides_v2-zh/shi-jue-mo-xing/florence2.md) - vision-language understanding
* [Depth Anything](/guides/guides_v2-zh/tu-xiang-chu-li/depth-anything.md) - depth estimation

