# SAM2 Video

Track and segment any object through video with Meta's SAM2.1 — the improved version of SAM2 with enhanced video accuracy.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}


## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create order and wait for deployment

### Access Your Server

* Find connection details in **My Orders**
* Web interfaces: Use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is SAM2?

SAM2 (Segment Anything Model 2) by Meta AI enables:

* Real-time video object segmentation
* Click-to-track any object
* Consistent tracking through occlusions
* Memory-efficient video processing

## What's New in SAM2.1

SAM2.1 brings significant improvements over the original SAM2:

* **Improved video accuracy** — Better tracking through occlusions and fast motion
* **Enhanced memory module** — More consistent long-range tracking
* **New checkpoints** — `sam2.1_hiera_*` series with better performance
* **Official pip package** — Install with `pip install sam-2` (no manual build required)
* **Faster inference** — Optimized CUDA kernels

## Resources

* **GitHub:** [facebookresearch/sam2](https://github.com/facebookresearch/sam2)
* **Paper:** [SAM2 Paper](https://arxiv.org/abs/2408.00714)
* **Demo:** [SAM2 Demo](https://sam2.metademolab.com/)
* **Model Weights:** [SAM2.1 Checkpoints](https://github.com/facebookresearch/sam2#model-checkpoints)

## Recommended Hardware

| Component | Minimum       | Recommended   | Optimal       |
| --------- | ------------- | ------------- | ------------- |
| GPU       | RTX 3060 12GB | RTX 4080 16GB | RTX 4090 24GB |
| VRAM      | 8GB           | 16GB          | 24GB          |
| CPU       | 4 cores       | 8 cores       | 16 cores      |
| RAM       | 16GB          | 32GB          | 64GB          |
| Storage   | 30GB SSD      | 50GB NVMe     | 100GB NVMe    |
| Internet  | 100 Mbps      | 500 Mbps      | 1 Gbps        |

## Quick Deploy on CLORE.AI

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
cd /workspace && \
pip install sam-2 && \
python -c "from sam2.build_sam import build_sam2; print('SAM2.1 ready!')"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

## Installation

```bash
# Official pip package (recommended for SAM2.1)
pip install sam-2

# Download SAM2.1 checkpoints
mkdir -p checkpoints && cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_tiny.pt
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_small.pt
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_base_plus.pt
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt

# Only the checkpoint you plan to load is required;
# sam2.1_hiera_large.pt is the one used in the examples below.
```
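After downloading, it's worth confirming all four files landed where the loading code expects them. A minimal check, assuming the `checkpoints/` directory and filenames from the commands above (the helper name is ours, not part of SAM2):

```python
from pathlib import Path

# SAM2.1 checkpoint filenames used throughout this guide
CKPT_NAMES = [
    "sam2.1_hiera_tiny.pt",
    "sam2.1_hiera_small.pt",
    "sam2.1_hiera_base_plus.pt",
    "sam2.1_hiera_large.pt",
]

def missing_checkpoints(ckpt_dir: str) -> list:
    """Return the checkpoint filenames not yet present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in CKPT_NAMES if not (root / name).is_file()]
```

Run `print(missing_checkpoints("checkpoints"))` before starting a long job; an empty list means all four checkpoints are in place.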

### Alternative: From Source (for development)

```bash
git clone https://github.com/facebookresearch/sam2.git
cd sam2
pip install -e ".[demo]"

# Download SAM2.1 checkpoints
cd checkpoints
bash download_ckpts.sh
```

## What You Can Create

### Video Editing

* Remove objects from videos
* Replace backgrounds seamlessly
* Create video masks for compositing

### Sports Analysis

* Track players through games
* Analyze movement patterns
* Generate highlight reels

### Medical Imaging

* Segment organs in CT/MRI videos
* Track cell movement in microscopy
* Measure growth over time

### Surveillance & Security

* Track objects across cameras
* Count people/vehicles
* Anomaly detection

### Creative Projects

* Rotoscoping for VFX
* Interactive video installations
* AR/VR content creation

## Basic Usage

### Image Segmentation

```python
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from PIL import Image
import numpy as np

# Load SAM2.1 model (improved accuracy over SAM2)
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

sam2 = build_sam2(model_cfg, checkpoint, device="cuda")
predictor = SAM2ImagePredictor(sam2)

# Load image
image = np.array(Image.open("image.jpg"))
predictor.set_image(image)

# Segment with point prompt
point_coords = np.array([[500, 375]])  # x, y coordinates
point_labels = np.array([1])  # 1 = foreground

masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True
)

# Get best mask
best_mask = masks[scores.argmax()]
```
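To visualize the result, the chosen mask can be alpha-blended over the image. A small numpy-only sketch (the 50% red overlay is an arbitrary choice, not a SAM2 convention):

```python
import numpy as np

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a solid color into the masked region of an RGB uint8 image."""
    out = image.astype(np.float32).copy()
    m = mask.astype(bool)  # predict() masks are boolean; cast defensively
    out[m] = out[m] * (1 - alpha) + np.array(color, dtype=np.float32) * alpha
    return out.astype(np.uint8)
```

For example, `overlay_mask(image, best_mask)` returns an image you can save or display directly.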

### Video Object Tracking

```python
import torch
from sam2.build_sam import build_sam2_video_predictor
import numpy as np

# Initialize SAM2.1 video predictor (improved tracking accuracy)
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

predictor = build_sam2_video_predictor(model_cfg, checkpoint, device="cuda")

# Initialize with video
video_path = "./video_frames"  # Directory with frame images
inference_state = predictor.init_state(video_path=video_path)

# Add point on first frame
predictor.reset_state(inference_state)
frame_idx = 0
obj_id = 1  # Object ID for tracking

points = np.array([[400, 300]], dtype=np.float32)
labels = np.array([1], dtype=np.int32)

# Add object to track
_, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
    inference_state=inference_state,
    frame_idx=frame_idx,
    obj_id=obj_id,
    points=points,
    labels=labels
)

# Propagate through video
video_segments = {}
for out_frame_idx, out_obj_ids, out_mask_logits in predictor.propagate_in_video(inference_state):
    video_segments[out_frame_idx] = {
        obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
        for i, obj_id in enumerate(out_obj_ids)
    }
```
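Note that `init_state` above takes a *directory* of numbered JPEG frames, not a video file. A sketch of the naming convention this guide assumes (the `ffmpeg` command in the comment is one common way to produce such frames; `frame_name` is our helper, not part of SAM2):

```python
# One common way to extract zero-padded JPEG frames, assuming ffmpeg
# is installed:
#   ffmpeg -i input.mp4 -q:v 2 -start_number 0 video_frames/%05d.jpg

def frame_name(idx: int) -> str:
    """Zero-padded frame filename pattern used throughout this guide."""
    return f"{idx:05d}.jpg"
```

Keeping this naming consistent matters: the mask-export code later in this guide reads frames back as `./video_frames/{frame_idx:05d}.jpg`.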

## Multi-Object Tracking

```python
import torch
from sam2.build_sam import build_sam2_video_predictor
import numpy as np

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)

video_path = "./video_frames"
inference_state = predictor.init_state(video_path=video_path)

# Track multiple objects
objects_to_track = [
    {"id": 1, "point": [200, 150], "frame": 0},  # Person 1
    {"id": 2, "point": [400, 200], "frame": 0},  # Person 2
    {"id": 3, "point": [600, 300], "frame": 0},  # Ball
]

for obj in objects_to_track:
    predictor.add_new_points_or_box(
        inference_state=inference_state,
        frame_idx=obj["frame"],
        obj_id=obj["id"],
        points=np.array([obj["point"]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32)
    )

# Propagate all objects
all_masks = {}
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(inference_state):
    all_masks[frame_idx] = {}
    for i, obj_id in enumerate(obj_ids):
        all_masks[frame_idx][obj_id] = (mask_logits[i] > 0.0).cpu().numpy()
```
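Once `all_masks` is filled, downstream analysis is plain numpy. For example, per-frame pixel areas per object, which makes it easy to spot a track collapsing to zero (the helper is ours, not a SAM2 API):

```python
import numpy as np

def mask_areas(all_masks):
    """Map frame_idx -> {obj_id: pixel count} from boolean mask arrays."""
    return {
        frame_idx: {obj_id: int(np.count_nonzero(m)) for obj_id, m in objs.items()}
        for frame_idx, objs in all_masks.items()
    }
```

An object whose area suddenly drops to 0 has likely been lost or occluded; that's the frame where you'd add a correction prompt.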

## Box Prompt Segmentation

```python
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
import numpy as np
from PIL import Image

sam2 = build_sam2(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)
predictor = SAM2ImagePredictor(sam2)

image = np.array(Image.open("image.jpg"))
predictor.set_image(image)

# Segment with bounding box
box = np.array([100, 100, 400, 400])  # x1, y1, x2, y2

masks, scores, _ = predictor.predict(
    box=box,
    multimask_output=False
)
```
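Detectors often emit boxes as (x, y, width, height), while the `box` prompt above is (x1, y1, x2, y2). A small conversion helper (our own, not part of SAM2):

```python
import numpy as np

def xywh_to_xyxy(box):
    """Convert an (x, y, w, h) box to the (x1, y1, x2, y2) format SAM2 expects."""
    x, y, w, h = box
    return np.array([x, y, x + w, y + h])
```

For instance, `xywh_to_xyxy([100, 100, 300, 300])` yields the `[100, 100, 400, 400]` box used in the example above.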

## Gradio Interface

```python
import gradio as gr
import numpy as np
from PIL import Image
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

sam2 = build_sam2(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)
predictor = SAM2ImagePredictor(sam2)

def segment_image(image, x, y):
    predictor.set_image(np.array(image))

    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[scores.argmax()]

    # Create overlay
    overlay = np.array(image).copy()
    overlay[best_mask] = overlay[best_mask] * 0.5 + np.array([255, 0, 0]) * 0.5

    return Image.fromarray(overlay.astype(np.uint8))

demo = gr.Interface(
    fn=segment_image,
    inputs=[
        gr.Image(type="pil", label="Input Image"),
        gr.Number(label="X coordinate"),
        gr.Number(label="Y coordinate")
    ],
    outputs=gr.Image(label="Segmented Image"),
    title="SAM2 - Segment Anything",
    description="Click coordinates to segment objects. Running on CLORE.AI GPU servers."
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## Export Masks as Video

```python
import cv2
import numpy as np
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)

# ... (tracking code from above)

# Export to video (frame size read from the first frame)
first_frame = cv2.imread("./video_frames/00000.jpg")
height, width = first_frame.shape[:2]
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output_masks.mp4', fourcc, 30.0, (width, height))

for frame_idx in sorted(video_segments.keys()):
    frame = cv2.imread(f"./video_frames/{frame_idx:05d}.jpg")

    # Apply mask overlay
    for obj_id, mask in video_segments[frame_idx].items():
        color = [0, 255, 0] if obj_id == 1 else [0, 0, 255]
        frame[mask.squeeze()] = frame[mask.squeeze()] * 0.5 + np.array(color) * 0.5

    out.write(frame.astype(np.uint8))

out.release()
```
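If you need raw mattes for compositing rather than a colored overlay, each boolean mask can be written as an 8-bit black/white PNG (a minimal sketch using PIL, which the earlier examples already import):

```python
import numpy as np
from PIL import Image

def save_mask_png(mask, path):
    """Write a boolean mask as a black/white 8-bit PNG matte."""
    Image.fromarray(mask.squeeze().astype(np.uint8) * 255).save(path)
```

Compositing tools can then use these PNGs directly as luma mattes.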

## Performance

| Task               | Resolution | GPU      | Speed |
| ------------------ | ---------- | -------- | ----- |
| Image segmentation | 1024x1024  | RTX 3090 | 50ms  |
| Image segmentation | 1024x1024  | RTX 4090 | 30ms  |
| Video (per frame)  | 720p       | RTX 4090 | 45ms  |
| Video (per frame)  | 1080p      | A100     | 35ms  |

## Model Variants (SAM2.1)

SAM2.1 introduces new `sam2.1_hiera_*` checkpoints with improved video tracking accuracy:

| Model                     | Parameters | VRAM     | Speed      | Quality  | Checkpoint                   |
| ------------------------- | ---------- | -------- | ---------- | -------- | ---------------------------- |
| sam2.1\_hiera\_tiny       | 38M        | 4GB      | Fastest    | Good     | sam2.1\_hiera\_tiny.pt       |
| sam2.1\_hiera\_small      | 46M        | 5GB      | Fast       | Better   | sam2.1\_hiera\_small.pt      |
| sam2.1\_hiera\_base\_plus | 80M        | 8GB      | Medium     | Great    | sam2.1\_hiera\_base\_plus.pt |
| **sam2.1\_hiera\_large**  | **224M**   | **12GB** | **Slower** | **Best** | **sam2.1\_hiera\_large.pt**  |

> **Note:** SAM2.1 models consistently outperform their SAM2 counterparts on video benchmarks, especially for fast-moving objects and long occlusions.

## Common Problems & Solutions

### Out of Memory

**Problem:** CUDA out of memory on long videos

**Solutions:**

```python
import torch

# Process in chunks
chunk_size = 100  # frames per chunk

for start_frame in range(0, total_frames, chunk_size):
    end_frame = min(start_frame + chunk_size, total_frames)
    # Process chunk...
    torch.cuda.empty_cache()  # Clear memory between chunks
```
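The chunk boundaries in the loop above can also be computed up front, which makes them easy to log or sanity-check before a long run (the helper is our own sketch):

```python
def chunk_ranges(total_frames, chunk_size):
    """Yield (start, end) frame ranges covering [0, total_frames)."""
    for start in range(0, total_frames, chunk_size):
        yield start, min(start + chunk_size, total_frames)
```

For a 250-frame clip with `chunk_size=100`, this yields `(0, 100)`, `(100, 200)`, `(200, 250)`.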

### Tracking Lost

**Problem:** Object tracking fails mid-video

**Solutions:**

* Add correction points when tracking drifts
* Use box prompts for better initial segmentation
* Choose clearer initial frames

```python
# Add correction point
predictor.add_new_points_or_box(
    inference_state=inference_state,
    frame_idx=lost_frame,
    obj_id=obj_id,
    points=np.array([[new_x, new_y]], dtype=np.float32),
    labels=np.array([1], dtype=np.int32)
)
```

### Slow Processing

**Problem:** Video processing is too slow

**Solutions:**

* Use smaller model variant (tiny/small)
* Reduce video resolution
* Enable half-precision (fp16)
* Process on A100 GPU

```python
# Use smaller SAM2.1 model for speed
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml",
    "./checkpoints/sam2.1_hiera_tiny.pt",
    device="cuda"
)
```

### Poor Mask Quality

**Problem:** Segmentation edges are rough

**Solutions:**

* Use larger model (large instead of tiny)
* Add more point prompts
* Combine point and box prompts

## Troubleshooting

### Segmentation inaccurate

* Click more precisely on target object
* Add multiple positive/negative points
* Use box prompt for large objects

### Video memory error

* Process fewer frames at once
* Reduce video resolution
* Use streaming mode for long videos

### Tracking lost

* Add more prompts when object changes
* Use memory bank feature
* Check object isn't occluded

### Slow processing

* SAM2 is compute-heavy by design; switch to the tiny/small checkpoint
* Use A100 for long videos
* Consider frame skipping

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers

## Next Steps

* [GroundingDINO](https://docs.clore.ai/guides/vision-models/groundingdino) - Auto-detect objects to segment
* [Florence-2](https://docs.clore.ai/guides/vision-models/florence2) - Vision-language understanding
* [Depth Anything](https://docs.clore.ai/guides/image-processing/depth-anything) - Depth estimation
