# AI Video Generation

Generate videos with Stable Video Diffusion, AnimateDiff, and other models.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

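## Renting on CLORE.AI

1. Visit the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set the ports (TCP for SSH, HTTP for web interfaces)
   * Add environment variables if needed
   * Enter the startup command
5. Choose a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment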

### Accessing Your Server

* Find the connection details in **My Orders**
* Web interface: use the URL of the HTTP port
* SSH: `ssh -p <port> root@<proxy-address>`

## Available Models

| Model       | Type           | VRAM | Duration |
| ----------- | -------------- | ---- | -------- |
| SVD         | Image-to-video | 16GB | 4 s      |
| SVD-XT      | Image-to-video | 20GB | 4 s      |
| AnimateDiff | Text-to-video  | 12GB | 2-4 s    |
| CogVideoX   | Text-to-video  | 24GB | 6 s      |
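
To shortlist models for a given GPU, the VRAM column can be turned into a simple lookup. The sketch below copies the figures from the table; the `models_that_fit` helper is a hypothetical name used only for illustration, and real memory use depends on resolution, frame count, and offloading.

```python
# VRAM requirements in GB, copied from the table above
MODEL_VRAM_GB = {
    "SVD": 16,
    "SVD-XT": 20,
    "AnimateDiff": 12,
    "CogVideoX": 24,
}


def models_that_fit(gpu_vram_gb):
    """Return the models whose listed VRAM requirement fits on the GPU."""
    return [name for name, need in MODEL_VRAM_GB.items() if need <= gpu_vram_gb]


print(models_that_fit(16))
print(models_that_fit(24))
```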

## Stable Video Diffusion (SVD)

### Quick Deploy

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install diffusers transformers accelerate gradio "imageio[ffmpeg]" && \
python svd_server.py
```

### Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Look for the `http_pub` URL (for example, `abc123.clorecloud.net`)

In the examples below, use `https://YOUR_HTTP_PUB_URL` instead of `localhost`.
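
As a small illustration of the substitution, the sketch below builds the service URL from an `http_pub` value. The `service_url` helper is hypothetical, and it assumes the service is reachable over HTTPS on the default port, as the example hostname suggests.

```python
from urllib.parse import urlsplit


def service_url(http_pub, path="/"):
    """Build an https URL from the http_pub value shown in My Orders."""
    # Accept either a bare hostname or a value pasted with a scheme
    host = urlsplit(http_pub).netloc if "://" in http_pub else http_pub
    host = host.strip("/")
    return f"https://{host}/{path.lstrip('/')}"


print(service_url("abc123.clorecloud.net"))
```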

### SVD Script

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from PIL import Image
import imageio

# Load the model
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
)
# CPU offload moves each sub-model to the GPU on demand, so pipe.to("cuda") is not needed
pipe.enable_model_cpu_offload()

# Load the input image and resize it to 1024x576
image = Image.open("input.png").convert("RGB").resize((1024, 576))

# Generate the video frames
frames = pipe(
    image,
    decode_chunk_size=8,
    num_frames=25,
    motion_bucket_id=127,
    noise_aug_strength=0.02
).frames[0]

# Save as GIF
imageio.mimsave("output.gif", frames, fps=6)

# Save as MP4 (needs an ffmpeg backend for imageio)
imageio.mimsave("output.mp4", frames, fps=6)
```
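
Clip length follows directly from the frame count and the playback rate passed to `imageio.mimsave`. The sketch below works through that arithmetic for the frame counts and `fps` values used by the scripts in this guide; `clip_duration` is a hypothetical helper, not part of any library.

```python
def clip_duration(num_frames, fps):
    """Length of the saved clip in seconds."""
    return num_frames / fps


# (model, num_frames, fps) as used in this guide's example scripts
examples = [("SVD-XT", 25, 6), ("AnimateDiff", 16, 8), ("CogVideoX", 49, 8)]

for name, num_frames, fps in examples:
    print(f"{name}: {clip_duration(num_frames, fps):.2f} s")
```

Raising `fps` without generating more frames shortens the clip rather than smoothing it.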

### SVD with a Gradio Interface

```python
import gradio as gr
import torch
from diffusers import StableVideoDiffusionPipeline
from PIL import Image
import imageio
import tempfile

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

def generate_video(image, motion_bucket, fps, num_frames):
    image = image.resize((1024, 576))

    frames = pipe(
        image,
        decode_chunk_size=4,
        num_frames=num_frames,
        motion_bucket_id=motion_bucket,
    ).frames[0]

    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
        imageio.mimsave(f.name, frames, fps=fps)
        return f.name

demo = gr.Interface(
    fn=generate_video,
    inputs=[
        gr.Image(type="pil", label="Input Image"),
        gr.Slider(1, 255, value=127, step=1, label="Motion Amount"),
        gr.Slider(1, 30, value=6, step=1, label="FPS"),
        gr.Slider(14, 25, value=25, step=1, label="Frames")
    ],
    outputs=gr.Video(label="Generated Video"),
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## AnimateDiff

### Installation

```bash
pip install diffusers transformers accelerate imageio
```

### Generating Video from Text

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
import imageio

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load the pipeline
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    motion_adapter=adapter,
    torch_dtype=torch.float16,
)
pipe.scheduler = DDIMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
# CPU offload handles device placement, so pipe.to("cuda") is not needed
pipe.enable_model_cpu_offload()

# Generate
output = pipe(
    prompt="A cat walking through a garden, beautiful flowers, sunny day",
    negative_prompt="bad quality, blurry",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
)

# Save
frames = output.frames[0]
imageio.mimsave("animatediff.gif", frames, fps=8)
```

### AnimateDiff with a Custom Model

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Use a custom checkpoint (for example, Realistic Vision)
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=adapter,
    torch_dtype=torch.float16,
)
```

## AnimateDiff in ComfyUI

### Installing the Nodes

```bash
cd /workspace/ComfyUI/custom_nodes
git clone https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved.git
git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git
```

### Downloading the Motion Model

```bash
cd /workspace/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models
wget https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt
```

## CogVideoX

### Text-to-Video

```python
import torch
from diffusers import CogVideoXPipeline
import imageio

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16
)
# CPU offload handles device placement, so pipe.to("cuda") is not needed
pipe.enable_model_cpu_offload()

prompt = "A drone flying over a beautiful mountain landscape at sunset"

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
).frames[0]

imageio.mimsave("cogvideo.mp4", video, fps=8)
```

## Video Upscaling

### Real-ESRGAN for Video

```python
import cv2
import torch
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(
    scale=4,
    model_path='RealESRGAN_x4plus.pth',
    model=model,
    tile=400,
    tile_pad=10,
    pre_pad=0,
    half=True
)

# Process the video frame by frame
cap = cv2.VideoCapture("input.mp4")

# ... upscale each frame
```
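
The `tile=400` setting trades speed for memory: each frame is processed in pieces instead of in one pass. The sketch below estimates how many tiles that means for a 1024x576 frame and how large the 4x output is. It assumes frames are split into a grid of `tile` x `tile` squares, and both helper names are hypothetical.

```python
import math


def tile_count(width, height, tile=400):
    """Number of tiles needed to cover one frame (assumes a tile x tile grid)."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def upscaled_size(width, height, scale=4):
    """Output resolution after upscaling by the given factor."""
    return width * scale, height * scale


width, height, num_frames = 1024, 576, 25
tiles = tile_count(width, height)
print(f"{tiles} tiles per frame, {tiles * num_frames} tiles for {num_frames} frames")
print("output size:", upscaled_size(width, height))
```

A smaller `tile` lowers peak VRAM but increases the number of passes per frame.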

## Frame Interpolation (Smoother Video)

### FILM Frame Interpolation

```python

# Install first (run in a shell): pip install tensorflow tensorflow_hub

import tensorflow as tf
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/film/1")

def interpolate(frame1, frame2, num_interpolations=3):
    # Return the frames interpolated between frame1 and frame2
    ...
```
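
Interpolation changes the frame count, so the playback rate has to change with it to keep the clip the same length. The sketch below works through that arithmetic for `num_interpolations` new frames between each pair of neighbours. The helper names are hypothetical and the calculation does not depend on the interpolation model used.

```python
def interpolated_frame_count(num_frames, num_interpolations=3):
    """Total frames after inserting num_interpolations frames between each pair."""
    if num_frames < 2:
        return num_frames
    return num_frames + (num_frames - 1) * num_interpolations


def fps_for_same_duration(fps, num_frames, num_interpolations=3):
    """Playback rate that keeps the clip length unchanged after interpolation."""
    total = interpolated_frame_count(num_frames, num_interpolations)
    return fps * total / num_frames


print(interpolated_frame_count(25, 3))
print(round(fps_for_same_duration(6, 25, 3), 2))
```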

### RIFE (Real-Time)

```python
# Install first (run in a shell): pip install rife-ncnn-vulkan-python
from rife_ncnn_vulkan_python import Rife

rife = Rife(gpuid=0)

# Interpolate frames
```

## Batch Video Generation

```python
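import imageio

# Assumes `pipe` is a text-to-video pipeline that is already loaded,
# such as the AnimateDiff pipeline from the earlier section
prompts = [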
    "A rocket launching into space",
    "Ocean waves crashing on rocks",
    "A butterfly flying through flowers",
]

for i, prompt in enumerate(prompts):
    print(f"Generating {i+1}/{len(prompts)}")
    output = pipe(prompt, num_frames=16)
    imageio.mimsave(f"video_{i:03d}.mp4", output.frames[0], fps=8)
```

## Memory Tips

### For Limited VRAM

```python

# Enable CPU offload
pipe.enable_model_cpu_offload()

# Enable VAE slicing
pipe.enable_vae_slicing()

# Enable attention slicing
pipe.enable_attention_slicing()

# Reduce the number of frames
num_frames = 14  # instead of 25
```
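
Not every pipeline class necessarily exposes all three methods, so it can be safer to check before calling them. The sketch below is one way to do that. `apply_memory_savers` is a hypothetical helper, and `DemoPipeline` is a stand-in used only so the example runs without a GPU.

```python
MEMORY_SAVERS = (
    "enable_model_cpu_offload",
    "enable_vae_slicing",
    "enable_attention_slicing",
)


def apply_memory_savers(pipe):
    """Call each memory-saving method the pipeline provides; return the names applied."""
    applied = []
    for name in MEMORY_SAVERS:
        method = getattr(pipe, name, None)
        if callable(method):
            method()
            applied.append(name)
    return applied


class DemoPipeline:
    """Stand-in for a real pipeline that lacks VAE slicing (illustration only)."""

    def enable_model_cpu_offload(self):
        pass

    def enable_attention_slicing(self):
        pass


print(apply_memory_savers(DemoPipeline()))
```

With a real pipeline, call `apply_memory_savers(pipe)` once after loading it.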

### Chunked Decoding

```python
frames = pipe(
    image,
    decode_chunk_size=2,  # decode 2 frames at a time
    num_frames=25,
).frames[0]
```

## Converting Output

### GIF to MP4

```bash
ffmpeg -i input.gif -movflags faststart -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" output.mp4
```

### Frame Sequence to Video

```bash
ffmpeg -framerate 8 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4
```

### Adding Audio

```bash
ffmpeg -i video.mp4 -i audio.mp3 -c:v copy -c:a aac -shortest output_with_audio.mp4
```
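
When these conversions run at the end of a generation script, it can be convenient to build the same commands in Python. The sketch below only assembles the argument lists shown above and prints one of them. The function names are hypothetical, and running the commands requires `ffmpeg` on the PATH.

```python
import shlex


def frames_to_video_cmd(pattern, output, fps=8):
    """ffmpeg arguments for turning a numbered frame sequence into an MP4."""
    return [
        "ffmpeg", "-framerate", str(fps), "-i", pattern,
        "-c:v", "libx264", "-pix_fmt", "yuv420p", output,
    ]


def add_audio_cmd(video, audio, output):
    """ffmpeg arguments for muxing an audio track into an existing video."""
    return [
        "ffmpeg", "-i", video, "-i", audio,
        "-c:v", "copy", "-c:a", "aac", "-shortest", output,
    ]


cmd = frames_to_video_cmd("frame_%04d.png", "output.mp4")
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.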

## Performance

| Model       | GPU       | Frames | Time   |
| ----------- | --------- | ------ | ------ |
| SVD-XT      | [unclear] | 25     | \~120s |
| SVD-XT      | [unclear] | 25     | \~80s  |
| SVD-XT      | [unclear] | 25     | \~50s  |
| AnimateDiff | [unclear] | 16     | \~30s  |
| CogVideoX   | [unclear] | 49     | \~180s |

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| [unclear] | \~$0.06     | \~$1.50    | \~$0.25        |
| [unclear] | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use the **Spot** market for flexible workloads (often 30-50% cheaper)
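
A session's cost is the hourly rate multiplied by the rental time, and the same rate gives a rough cost per clip. The sketch below uses the \~$0.25 hourly figure from the table and a 180-second generation time purely as illustrative inputs. Pairing those two numbers is an assumption, and both helper names are hypothetical.

```python
def session_cost(hourly_rate_usd, hours):
    """Cost of renting a server for the given number of hours."""
    return hourly_rate_usd * hours


def cost_per_clip(hourly_rate_usd, seconds_per_clip):
    """Approximate cost of one generation, ignoring model load and idle time."""
    return hourly_rate_usd * seconds_per_clip / 3600


print(f"4-hour session: ${session_cost(0.25, 4):.2f}")
print(f"per clip: ${cost_per_clip(0.25, 180):.4f}")
```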

## Troubleshooting

### Out-of-Memory (OOM) Errors

* Reduce `num_frames`
* Enable CPU offload
* Use a smaller `decode_chunk_size`
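
These steps can also be tried automatically, from the preferred settings down to the most conservative ones. The sketch below is one possible approach, not a library feature. `generate_with_fallback` and `SETTINGS_LADDER` are hypothetical names, and it assumes out-of-memory failures surface as a `RuntimeError` whose message contains "out of memory", as PyTorch's CUDA OOM errors do.

```python
# Settings to try in order, from preferred to most memory-friendly
SETTINGS_LADDER = [
    {"num_frames": 25, "decode_chunk_size": 8},
    {"num_frames": 25, "decode_chunk_size": 2},
    {"num_frames": 14, "decode_chunk_size": 2},
]


def generate_with_fallback(generate, settings_ladder):
    """Call generate(**settings) for each entry until one does not run out of memory."""
    last_error = None
    for settings in settings_ladder:
        try:
            return generate(**settings), settings
        except RuntimeError as err:
            # Re-raise anything that is not an out-of-memory failure
            if "out of memory" not in str(err).lower():
                raise
            last_error = err
    raise RuntimeError("every settings combination ran out of memory") from last_error


# Example with a loaded SVD pipeline `pipe` and an input `image`:
# frames, used = generate_with_fallback(
#     lambda **kw: pipe(image, **kw).frames[0], SETTINGS_LADDER
# )
```

Between attempts it may also help to call `torch.cuda.empty_cache()` so cached memory is released.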

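### Video Flickering

* Increase `num_inference_steps`
* Try a different `motion_bucket_id`
* Use frame interpolation

### Poor Quality

* Use a higher-resolution input image (SVD)
* Write better prompts (AnimateDiff)
* Increase `guidance_scale`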