# Stable Video Diffusion

{% hint style="info" %}
**Newer alternatives available!** Consider [**FramePack**](https://docs.clore.ai/guides/video-generation/framepack) (only 6GB VRAM!), [**Wan2.1**](https://docs.clore.ai/guides/video-generation/wan-video) (higher quality), or [**LTX-2**](https://docs.clore.ai/guides/video-generation/ltx-video-2) (video with native audio).
{% endhint %}

Generate videos from images using Stability AI's SVD model.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## What is Stable Video Diffusion?

SVD (Stable Video Diffusion) generates short video clips from a single image:

* 14 or 25 frame outputs
* 576x1024 resolution
* Smooth motion generation
* Open source weights

## Resources

* **HuggingFace:** [stabilityai/stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)
* **GitHub:** [Stability-AI/generative-models](https://github.com/Stability-AI/generative-models)
* **Paper:** [SVD Paper](https://arxiv.org/abs/2311.15127)

## Hardware Requirements

| Model              | VRAM | Recommended GPU |
| ------------------ | ---- | --------------- |
| SVD (14 frames)    | 16GB | RTX 4090        |
| SVD-XT (25 frames) | 24GB | RTX 4090 / A100 |
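
As a quick sanity check against the table above, a tiny helper (illustrative only; `pick_variant` and `REQUIREMENTS_GB` are made-up names) can map available VRAM to the largest variant that fits:

```python
# VRAM requirements from the table above (GB)
REQUIREMENTS_GB = {"svd": 16, "svd-xt": 24}

def pick_variant(vram_gb):
    """Return the largest SVD variant that fits in vram_gb, or None."""
    fits = [name for name, need in REQUIREMENTS_GB.items() if vram_gb >= need]
    return max(fits, key=REQUIREMENTS_GB.get) if fits else None

# pick_variant(24) -> "svd-xt", pick_variant(16) -> "svd", pick_variant(12) -> None
```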

## Quick Deploy

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install diffusers transformers accelerate gradio "imageio[ffmpeg]" && \
python -c "
import gradio as gr
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video
import torch

pipe = StableVideoDiffusionPipeline.from_pretrained(
    'stabilityai/stable-video-diffusion-img2vid-xt',
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')

def generate(image, seed, fps):
    image = image.resize((1024, 576))  # SVD expects 1024x576 input
    generator = torch.manual_seed(int(seed))  # gr.Number returns a float
    frames = pipe(image, num_frames=25, generator=generator).frames[0]
    export_to_video(frames, 'output.mp4', fps=int(fps))
    return 'output.mp4'

gr.Interface(
    fn=generate,
    inputs=[gr.Image(type='pil'), gr.Number(value=42, label='Seed'), gr.Slider(6, 30, value=7, label='FPS')],
    outputs=gr.Video(),
    title='Stable Video Diffusion'
).launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

## Installation

```bash
pip install diffusers transformers accelerate torch

# For video export
pip install imageio[ffmpeg]
```

## Basic Usage

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load pipeline
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.to("cuda")

# Load and resize image
image = load_image("input.jpg")
image = image.resize((1024, 576))

# Generate video
generator = torch.manual_seed(42)
frames = pipe(image, num_frames=25, generator=generator).frames[0]

# Save video
export_to_video(frames, "output.mp4", fps=7)
```
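
The plain `resize((1024, 576))` call above stretches images that aren't already 16:9. A small Pillow helper (a sketch; `fit_to_svd` is a hypothetical name) scales to cover the target size and center-crops instead, which avoids distortion:

```python
from PIL import Image

def fit_to_svd(image, size=(1024, 576)):
    """Scale the image to cover `size`, then center-crop to it."""
    tw, th = size
    scale = max(tw / image.width, th / image.height)
    resized = image.resize((round(image.width * scale), round(image.height * scale)))
    left = (resized.width - tw) // 2
    top = (resized.height - th) // 2
    return resized.crop((left, top, left + tw, top + th))

# Usage: image = fit_to_svd(load_image("input.jpg")) before calling pipe(...)
```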

## SVD vs SVD-XT

| Feature  | SVD     | SVD-XT    |
| -------- | ------- | --------- |
| Frames   | 14      | 25        |
| Duration | \~2 sec | \~3.5 sec |
| VRAM     | 16GB    | 24GB      |
| Quality  | Good    | Better    |
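
The durations in the table follow directly from frame count divided by playback fps (7 by default in the examples):

```python
def clip_duration_s(num_frames, fps=7):
    """Output clip length in seconds: frames divided by playback fps."""
    return num_frames / fps

# clip_duration_s(14) -> 2.0 s (SVD), clip_duration_s(25) -> ~3.57 s (SVD-XT)
```

Raising `fps` in `export_to_video` makes playback smoother but shortens the clip; it does not add frames.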

## Memory Optimization

```python
# Offload model components to CPU when idle
# (call this instead of pipe.to("cuda"))
pipe.enable_model_cpu_offload()

# Or slice the attention computation to lower peak VRAM
pipe.enable_attention_slicing()

# For very low VRAM (slowest option)
pipe.enable_sequential_cpu_offload()
```

## Batch Processing

```python
import torch
from pathlib import Path
from diffusers.utils import load_image, export_to_video

input_dir = Path("./images")
output_dir = Path("./videos")
output_dir.mkdir(exist_ok=True)

# Reuses the `pipe` loaded in Basic Usage
for img_path in sorted(input_dir.glob("*.jpg")):
    image = load_image(str(img_path)).resize((1024, 576))
    generator = torch.manual_seed(42)  # fixed seed for reproducible runs
    frames = pipe(image, num_frames=25, generator=generator).frames[0]
    export_to_video(frames, str(output_dir / f"{img_path.stem}.mp4"), fps=7)
    print(f"Generated: {img_path.stem}.mp4")
```

## ComfyUI Integration

SVD works great in ComfyUI:

1. Install ComfyUI
2. Download SVD model to `models/checkpoints/`
3. Use SVD nodes for img2vid workflow

## Troubleshooting

### Out of memory

* Use `enable_model_cpu_offload()`
* Reduce `num_frames` to 14
* Use fp16 variant

### Video too short

* Use SVD-XT (25 frames) instead of SVD (14 frames)
* Interpolate with RIFE to add frames for a smoother result

### Poor motion quality

* Use high-quality input images
* Ensure image is 1024x576 (or 576x1024)
* Try different seeds

### CUDA errors

* Update PyTorch and diffusers
* Check CUDA version compatibility

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
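
The session columns are simple hourly-rate arithmetic; a one-line helper (illustrative; `session_cost` is a made-up name) reproduces them:

```python
def session_cost(hourly_rate, hours):
    """Estimated rental cost in USD: hourly rate times hours, rounded to cents."""
    return round(hourly_rate * hours, 2)

# RTX 4090 for 4 hours: session_cost(0.10, 4) -> 0.4, matching the ~$0.40 above
```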

## Next Steps

* AnimateDiff - Animate SD images
* [RIFE Interpolation](https://docs.clore.ai/guides/video-processing/rife-interpolation) - Increase FPS
* [Hunyuan Video](https://docs.clore.ai/guides/video-generation/hunyuan-video) - Text-to-video
