# SAM2 Video

Track and segment any object through video with Meta's SAM2.1 — the improved version of SAM2 with enhanced video accuracy.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}


## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create order and wait for deployment

### Access Your Server

* Find connection details in **My Orders**
* Web interfaces: Use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is SAM2?

SAM2 (Segment Anything Model 2) by Meta AI enables:

* Real-time video object segmentation
* Click-to-track any object
* Consistent tracking through occlusions
* Memory-efficient video processing

## What's New in SAM2.1

SAM2.1 brings significant improvements over the original SAM2:

* **Improved video accuracy** — Better tracking through occlusions and fast motion
* **Enhanced memory module** — More consistent long-range tracking
* **New checkpoints** — `sam2.1_hiera_*` series with better performance
* **Official pip package** — Install with `pip install sam-2` (no manual build required)
* **Faster inference** — Optimized CUDA kernels

## Resources

* **GitHub:** [facebookresearch/sam2](https://github.com/facebookresearch/sam2)
* **Paper:** [SAM2 Paper](https://arxiv.org/abs/2408.00714)
* **Demo:** [SAM2 Demo](https://sam2.metademolab.com/)
* **Model Weights:** [SAM2.1 Checkpoints](https://github.com/facebookresearch/sam2#model-checkpoints)

## Recommended Hardware

| Component | Minimum       | Recommended   | Optimal       |
| --------- | ------------- | ------------- | ------------- |
| GPU       | RTX 3060 12GB | RTX 4080 16GB | RTX 4090 24GB |
| VRAM      | 8GB           | 16GB          | 24GB          |
| CPU       | 4 cores       | 8 cores       | 16 cores      |
| RAM       | 16GB          | 32GB          | 64GB          |
| Storage   | 30GB SSD      | 50GB NVMe     | 100GB NVMe    |
| Internet  | 100 Mbps      | 500 Mbps      | 1 Gbps        |

## Quick Deploy on CLORE.AI

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
cd /workspace && \
pip install sam-2 && \
python -c "from sam2.build_sam import build_sam2; print('SAM2.1 ready!')"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

## Installation

```bash
# Official pip package (recommended for SAM2.1)
pip install sam-2

# Download SAM2.1 checkpoints
mkdir -p checkpoints && cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_tiny.pt
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_small.pt
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_base_plus.pt
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt

# Only the checkpoint you plan to load is required;
# sam2.1_hiera_large.pt is the one used in the examples below.
```
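After downloading, it's worth confirming all four files landed where the loading code expects them. A minimal check, assuming the `checkpoints/` directory and filenames from the commands above (the helper name is ours, not part of SAM2):

```python
from pathlib import Path

# SAM2.1 checkpoint filenames used throughout this guide
CKPT_NAMES = [
    "sam2.1_hiera_tiny.pt",
    "sam2.1_hiera_small.pt",
    "sam2.1_hiera_base_plus.pt",
    "sam2.1_hiera_large.pt",
]

def missing_checkpoints(ckpt_dir: str) -> list:
    """Return the checkpoint filenames not yet present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in CKPT_NAMES if not (root / name).is_file()]
```

Run `print(missing_checkpoints("checkpoints"))` before starting a long job; an empty list means all four checkpoints are in place.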

### Alternative: From Source (for development)

```bash
git clone https://github.com/facebookresearch/sam2.git
cd sam2
pip install -e ".[demo]"

# Download SAM2.1 checkpoints
cd checkpoints
bash download_ckpts.sh
```

## What You Can Create

### Video Editing

* Remove objects from videos
* Replace backgrounds seamlessly
* Create video masks for compositing

### Sports Analysis

* Track players through games
* Analyze movement patterns
* Generate highlight reels

### Medical Imaging

* Segment organs in CT/MRI videos
* Track cell movement in microscopy
* Measure growth over time

### Surveillance & Security

* Track objects across cameras
* Count people/vehicles
* Anomaly detection

### Creative Projects

* Rotoscoping for VFX
* Interactive video installations
* AR/VR content creation

## Basic Usage

### Image Segmentation

```python
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from PIL import Image
import numpy as np

# Load SAM2.1 model (improved accuracy over SAM2)
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

sam2 = build_sam2(model_cfg, checkpoint, device="cuda")
predictor = SAM2ImagePredictor(sam2)

# Load image
image = np.array(Image.open("image.jpg"))
predictor.set_image(image)

# Segment with point prompt
point_coords = np.array([[500, 375]])  # x, y coordinates
point_labels = np.array([1])  # 1 = foreground

masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True
)

# Get best mask
best_mask = masks[scores.argmax()]
```
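To visualize the result, the chosen mask can be alpha-blended over the image. A small numpy-only sketch (the 50% red overlay is an arbitrary choice, not a SAM2 convention):

```python
import numpy as np

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a solid color into the masked region of an RGB uint8 image."""
    out = image.astype(np.float32).copy()
    m = mask.astype(bool)  # predict() masks are boolean; cast defensively
    out[m] = out[m] * (1 - alpha) + np.array(color, dtype=np.float32) * alpha
    return out.astype(np.uint8)
```

For example, `overlay_mask(image, best_mask)` returns an image you can save or display directly.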

### Video Object Tracking

```python
import torch
from sam2.build_sam import build_sam2_video_predictor
import numpy as np

# Initialize SAM2.1 video predictor (improved tracking accuracy)
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

predictor = build_sam2_video_predictor(model_cfg, checkpoint, device="cuda")

# Initialize with video
video_path = "./video_frames"  # Directory with frame images
inference_state = predictor.init_state(video_path=video_path)

# Add point on first frame
predictor.reset_state(inference_state)
frame_idx = 0
obj_id = 1  # Object ID for tracking

points = np.array([[400, 300]], dtype=np.float32)
labels = np.array([1], dtype=np.int32)

# Add object to track
_, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
    inference_state=inference_state,
    frame_idx=frame_idx,
    obj_id=obj_id,
    points=points,
    labels=labels
)

# Propagate through video
video_segments = {}
for out_frame_idx, out_obj_ids, out_mask_logits in predictor.propagate_in_video(inference_state):
    video_segments[out_frame_idx] = {
        obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
        for i, obj_id in enumerate(out_obj_ids)
    }
```
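Note that `init_state` above takes a *directory* of numbered JPEG frames, not a video file. A sketch of the naming convention this guide assumes (the `ffmpeg` command in the comment is one common way to produce such frames; `frame_name` is our helper, not part of SAM2):

```python
# One common way to extract zero-padded JPEG frames, assuming ffmpeg
# is installed:
#   ffmpeg -i input.mp4 -q:v 2 -start_number 0 video_frames/%05d.jpg

def frame_name(idx: int) -> str:
    """Zero-padded frame filename pattern used throughout this guide."""
    return f"{idx:05d}.jpg"
```

Keeping this naming consistent matters: the mask-export code later in this guide reads frames back as `./video_frames/{frame_idx:05d}.jpg`.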

## Multi-Object Tracking

```python
import torch
from sam2.build_sam import build_sam2_video_predictor
import numpy as np

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)

video_path = "./video_frames"
inference_state = predictor.init_state(video_path=video_path)

# Track multiple objects
objects_to_track = [
    {"id": 1, "point": [200, 150], "frame": 0},  # Person 1
    {"id": 2, "point": [400, 200], "frame": 0},  # Person 2
    {"id": 3, "point": [600, 300], "frame": 0},  # Ball
]

for obj in objects_to_track:
    predictor.add_new_points_or_box(
        inference_state=inference_state,
        frame_idx=obj["frame"],
        obj_id=obj["id"],
        points=np.array([obj["point"]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32)
    )

# Propagate all objects
all_masks = {}
for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(inference_state):
    all_masks[frame_idx] = {}
    for i, obj_id in enumerate(obj_ids):
        all_masks[frame_idx][obj_id] = (mask_logits[i] > 0.0).cpu().numpy()
```
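Once `all_masks` is filled, downstream analysis is plain numpy. For example, per-frame pixel areas per object, which makes it easy to spot a track collapsing to zero (the helper is ours, not a SAM2 API):

```python
import numpy as np

def mask_areas(all_masks):
    """Map frame_idx -> {obj_id: pixel count} from boolean mask arrays."""
    return {
        frame_idx: {obj_id: int(np.count_nonzero(m)) for obj_id, m in objs.items()}
        for frame_idx, objs in all_masks.items()
    }
```

An object whose area suddenly drops to 0 has likely been lost or occluded; that's the frame where you'd add a correction prompt.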

## Box Prompt Segmentation

```python
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
import numpy as np
from PIL import Image

sam2 = build_sam2(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)
predictor = SAM2ImagePredictor(sam2)

image = np.array(Image.open("image.jpg"))
predictor.set_image(image)

# Segment with bounding box
box = np.array([100, 100, 400, 400])  # x1, y1, x2, y2

masks, scores, _ = predictor.predict(
    box=box,
    multimask_output=False
)
```
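Detectors often emit boxes as (x, y, width, height), while the `box` prompt above is (x1, y1, x2, y2). A small conversion helper (our own, not part of SAM2):

```python
import numpy as np

def xywh_to_xyxy(box):
    """Convert an (x, y, w, h) box to the (x1, y1, x2, y2) format SAM2 expects."""
    x, y, w, h = box
    return np.array([x, y, x + w, y + h])
```

For instance, `xywh_to_xyxy([100, 100, 300, 300])` yields the `[100, 100, 400, 400]` box used in the example above.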

## Gradio Interface

```python
import gradio as gr
import numpy as np
from PIL import Image
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

sam2 = build_sam2(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)
predictor = SAM2ImagePredictor(sam2)

def segment_image(image, x, y):
    predictor.set_image(np.array(image))

    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[scores.argmax()]

    # Create overlay
    overlay = np.array(image).copy()
    overlay[best_mask] = overlay[best_mask] * 0.5 + np.array([255, 0, 0]) * 0.5

    return Image.fromarray(overlay.astype(np.uint8))

demo = gr.Interface(
    fn=segment_image,
    inputs=[
        gr.Image(type="pil", label="Input Image"),
        gr.Number(label="X coordinate"),
        gr.Number(label="Y coordinate")
    ],
    outputs=gr.Image(label="Segmented Image"),
    title="SAM2 - Segment Anything",
    description="Click coordinates to segment objects. Running on CLORE.AI GPU servers."
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## Export Masks as Video

```python
import cv2
import numpy as np
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "./checkpoints/sam2.1_hiera_large.pt",
    device="cuda"
)

# ... (tracking code from above)

# Export to video (frame size read from the first frame)
first_frame = cv2.imread("./video_frames/00000.jpg")
height, width = first_frame.shape[:2]
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output_masks.mp4', fourcc, 30.0, (width, height))

for frame_idx in sorted(video_segments.keys()):
    frame = cv2.imread(f"./video_frames/{frame_idx:05d}.jpg")

    # Apply mask overlay
    for obj_id, mask in video_segments[frame_idx].items():
        color = [0, 255, 0] if obj_id == 1 else [0, 0, 255]
        frame[mask.squeeze()] = frame[mask.squeeze()] * 0.5 + np.array(color) * 0.5

    out.write(frame.astype(np.uint8))

out.release()
```
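If you need raw mattes for compositing rather than a colored overlay, each boolean mask can be written as an 8-bit black/white PNG (a minimal sketch using PIL, which the earlier examples already import):

```python
import numpy as np
from PIL import Image

def save_mask_png(mask, path):
    """Write a boolean mask as a black/white 8-bit PNG matte."""
    Image.fromarray(mask.squeeze().astype(np.uint8) * 255).save(path)
```

Compositing tools can then use these PNGs directly as luma mattes.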

## Performance

| Task               | Resolution | GPU      | Speed |
| ------------------ | ---------- | -------- | ----- |
| Image segmentation | 1024x1024  | RTX 3090 | 50ms  |
| Image segmentation | 1024x1024  | RTX 4090 | 30ms  |
| Video (per frame)  | 720p       | RTX 4090 | 45ms  |
| Video (per frame)  | 1080p      | A100     | 35ms  |

## Model Variants (SAM2.1)

SAM2.1 introduces new `sam2.1_hiera_*` checkpoints with improved video tracking accuracy:

| Model                     | Parameters | VRAM     | Speed      | Quality  | Checkpoint                   |
| ------------------------- | ---------- | -------- | ---------- | -------- | ---------------------------- |
| sam2.1\_hiera\_tiny       | 38M        | 4GB      | Fastest    | Good     | sam2.1\_hiera\_tiny.pt       |
| sam2.1\_hiera\_small      | 46M        | 5GB      | Fast       | Better   | sam2.1\_hiera\_small.pt      |
| sam2.1\_hiera\_base\_plus | 80M        | 8GB      | Medium     | Great    | sam2.1\_hiera\_base\_plus.pt |
| **sam2.1\_hiera\_large**  | **224M**   | **12GB** | **Slower** | **Best** | **sam2.1\_hiera\_large.pt**  |

> **Note:** SAM2.1 models consistently outperform their SAM2 counterparts on video benchmarks, especially for fast-moving objects and long occlusions.

## Common Problems & Solutions

### Out of Memory

**Problem:** CUDA out of memory on long videos

**Solutions:**

```python
import torch

# Process in chunks
chunk_size = 100  # frames per chunk

for start_frame in range(0, total_frames, chunk_size):
    end_frame = min(start_frame + chunk_size, total_frames)
    # Process chunk...
    torch.cuda.empty_cache()  # Clear memory between chunks
```
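The chunk boundaries in the loop above can also be computed up front, which makes them easy to log or sanity-check before a long run (the helper is our own sketch):

```python
def chunk_ranges(total_frames, chunk_size):
    """Yield (start, end) frame ranges covering [0, total_frames)."""
    for start in range(0, total_frames, chunk_size):
        yield start, min(start + chunk_size, total_frames)
```

For a 250-frame clip with `chunk_size=100`, this yields `(0, 100)`, `(100, 200)`, `(200, 250)`.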

### Tracking Lost

**Problem:** Object tracking fails mid-video

**Solutions:**

* Add correction points when tracking drifts
* Use box prompts for better initial segmentation
* Choose clearer initial frames

```python
# Add correction point
predictor.add_new_points_or_box(
    inference_state=inference_state,
    frame_idx=lost_frame,
    obj_id=obj_id,
    points=np.array([[new_x, new_y]], dtype=np.float32),
    labels=np.array([1], dtype=np.int32)
)
```

### Slow Processing

**Problem:** Video processing is too slow

**Solutions:**

* Use smaller model variant (tiny/small)
* Reduce video resolution
* Enable half-precision (fp16)
* Process on A100 GPU

```python
# Use smaller SAM2.1 model for speed
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml",
    "./checkpoints/sam2.1_hiera_tiny.pt",
    device="cuda"
)
```

### Poor Mask Quality

**Problem:** Segmentation edges are rough

**Solutions:**

* Use larger model (large instead of tiny)
* Add more point prompts
* Combine point and box prompts

## Troubleshooting

### Segmentation inaccurate

* Click more precisely on target object
* Add multiple positive/negative points
* Use box prompt for large objects

### Video memory error

* Process fewer frames at once
* Reduce video resolution
* Use streaming mode for long videos

### Tracking lost

* Add more prompts when object changes
* Use memory bank feature
* Check object isn't occluded

### Slow processing

* SAM2 is compute-heavy by design; switch to the tiny/small checkpoint
* Use A100 for long videos
* Consider frame skipping

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers

## Next Steps

* [GroundingDINO](https://docs.clore.ai/guides/vision-models/groundingdino) - Auto-detect objects to segment
* [Florence-2](https://docs.clore.ai/guides/vision-models/florence2) - Vision-language understanding
* [Depth Anything](https://docs.clore.ai/guides/image-processing/depth-anything) - Depth estimation
