# AI Video Generation

Generate videos with Stable Video Diffusion, AnimateDiff, and other models.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit the [CLORE.AI marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set ports (TCP for SSH, HTTP for web interfaces)
   * Add environment variables if needed
   * Enter the startup command
5. Choose a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment

### Accessing Your Server

* Find connection details under **My Orders**
* Web interface: use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## Available Models

| Model       | Type           | VRAM | Duration |
| ----------- | -------------- | ---- | -------- |
| SVD         | Image-to-video | 16GB | 4 s      |
| SVD-XT      | Image-to-video | 20GB | 4 s      |
| AnimateDiff | Text-to-video  | 12GB | 2-4 s    |
| CogVideoX   | Text-to-video  | 24GB | 6 s      |
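
The VRAM column determines which of these models a given card can run. As a quick illustration (a hypothetical helper, using the VRAM figures from the table above), you can filter candidates programmatically before choosing a GPU on the marketplace:

```python
# Minimum VRAM in GB per model, taken from the table above
MODEL_VRAM = {
    "SVD": 16,
    "SVD-XT": 20,
    "AnimateDiff": 12,
    "CogVideoX": 24,
}

def models_for(vram_gb):
    """Return the models that fit within the given amount of VRAM."""
    return sorted(name for name, need in MODEL_VRAM.items() if need <= vram_gb)

print(models_for(16))  # ['AnimateDiff', 'SVD']
print(models_for(24))  # all four models
```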

## Stable Video Diffusion (SVD)

### Quick Deploy

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
# svd_server.py is your own server script (e.g., the Gradio app shown below)
pip install diffusers transformers accelerate gradio imageio && \
python svd_server.py
```

## Accessing Your Service

After deployment, find your `http_pub` URL under **My Orders**:

1. Go to the **My Orders** page
2. Click on your order
3. Look for the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.

### SVD Script

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from PIL import Image
import imageio

# Load the model
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
)
# Offloading moves submodules to the GPU on demand; no explicit .to("cuda") needed
pipe.enable_model_cpu_offload()

# Load the image and resize it to SVD's native resolution
image = Image.open("input.png").resize((1024, 576))

# Generate the video
frames = pipe(
    image,
    decode_chunk_size=8,
    num_frames=25,
    motion_bucket_id=127,
    noise_aug_strength=0.02,
).frames[0]

# Save as GIF
imageio.mimsave("output.gif", frames, fps=6)

# Save as MP4 (requires the imageio-ffmpeg package)
imageio.mimsave("output.mp4", frames, fps=6)
```
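
Clip length is simply `num_frames / fps`: the 25 frames above played back at `fps=6` give roughly the 4-second clip listed in the model table. A sketch of the arithmetic:

```python
def clip_seconds(num_frames, fps):
    """Playback duration of a generated clip in seconds."""
    return num_frames / fps

print(round(clip_seconds(25, 6), 1))  # 4.2
print(round(clip_seconds(14, 6), 1))  # 2.3 -- shorter clip when saving VRAM
```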

### SVD with a Gradio Interface

```python
import torch
import gradio as gr
from diffusers import StableVideoDiffusionPipeline
from PIL import Image
import imageio
import tempfile

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

def generate_video(image, motion_bucket, fps, num_frames):
    image = image.resize((1024, 576))

    frames = pipe(
        image,
        decode_chunk_size=4,
        num_frames=num_frames,
        motion_bucket_id=motion_bucket,
    ).frames[0]

    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
        imageio.mimsave(f.name, frames, fps=fps)
        return f.name

demo = gr.Interface(
    fn=generate_video,
    inputs=[
        gr.Image(type="pil", label="Input Image"),
        gr.Slider(1, 255, value=127, step=1, label="Motion Amount"),
        gr.Slider(1, 30, value=6, step=1, label="FPS"),
        gr.Slider(14, 25, value=25, step=1, label="Frames"),
    ],
    outputs=gr.Video(label="Generated Video"),
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## AnimateDiff

### Installation

```bash
pip install diffusers transformers accelerate imageio
```

### Text-to-Video

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
import imageio

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load the pipeline
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    motion_adapter=adapter,
    torch_dtype=torch.float16,
)
pipe.scheduler = DDIMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
# Offloading moves submodules to the GPU on demand; no explicit .to("cuda") needed
pipe.enable_model_cpu_offload()

# Generate
output = pipe(
    prompt="A cat walking through a garden, beautiful flowers, sunny day",
    negative_prompt="bad quality, blurry",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
)

# Save
frames = output.frames[0]
imageio.mimsave("animatediff.gif", frames, fps=8)
```

### AnimateDiff with a Custom Model

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerDiscreteScheduler

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Use a custom checkpoint (e.g., RealisticVision)
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=adapter,
    torch_dtype=torch.float16,
)
```

## AnimateDiff in ComfyUI

### Install the Nodes

```bash
cd /workspace/ComfyUI/custom_nodes
git clone https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved.git
git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git
```

### Download the Motion Model

```bash
cd /workspace/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models
wget https://huggingface.co/guoyww/animatediff/resolve/main/mm_sd_v15_v2.ckpt
```
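
Motion-model downloads are large and occasionally truncated, so it is worth verifying the file after `wget`. A minimal integrity check with Python's standard library (the comparison hash is whatever checksum the model's Hugging Face page publishes; none is hard-coded here):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Usage: compare against the published checksum, e.g.
#   sha256_of("mm_sd_v15_v2.ckpt") == "<published sha256>"
```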

## CogVideoX

### Text-to-Video

```python
import torch
from diffusers import CogVideoXPipeline
import imageio

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16,
)
# Offloading moves submodules to the GPU on demand; no explicit .to("cuda") needed
pipe.enable_model_cpu_offload()

prompt = "A drone flying over a beautiful mountain landscape at sunset"

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
).frames[0]

imageio.mimsave("cogvideo.mp4", video, fps=8)
```

## Video Upscaling

### Real-ESRGAN for Video

```python
import cv2
import torch
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(
    scale=4,
    model_path='RealESRGAN_x4plus.pth',
    model=model,
    tile=400,
    tile_pad=10,
    pre_pad=0,
    half=True
)

# Process the video frame by frame
cap = cv2.VideoCapture("input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
writer = None

while True:
    ret, frame = cap.read()
    if not ret:
        break
    upscaled, _ = upsampler.enhance(frame, outscale=4)
    if writer is None:
        h, w = upscaled.shape[:2]
        writer = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    writer.write(upscaled)

cap.release()
writer.release()
```
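
With `scale=4` and `tile=400`, each frame is split into 400-pixel tiles, upscaled tile by tile, and reassembled at 4x resolution. The bookkeeping, as a rough sketch that ignores tile padding:

```python
import math

def upscale_plan(width, height, scale=4, tile=400):
    """Output resolution and approximate tile count for a tiled upscale."""
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return (width * scale, height * scale), tiles

out_size, n_tiles = upscale_plan(1280, 720)
print(out_size)  # (5120, 2880)
print(n_tiles)   # 8
```

Smaller `tile` values lower peak VRAM per frame at the cost of more tile passes.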

## Frame Interpolation (Smoother Video)

### FILM Frame Interpolation

```python
# Install: pip install tensorflow tensorflow_hub
import tensorflow as tf
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/film/1")

def interpolate(frame1, frame2, num_interpolations=3):
    # Returns interpolated frames between frame1 and frame2
    ...
```
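
Inserting `num_interpolations` frames between every adjacent pair multiplies the frame count: n frames become n + (n-1)·k. A quick sketch of the effect on a 16-frame AnimateDiff clip:

```python
def interpolated_count(n_frames, k):
    """Total frames after inserting k in-between frames per adjacent pair."""
    return n_frames + (n_frames - 1) * k

def smooth_fps(original_fps, k):
    """FPS that preserves the original frame timing after interpolation."""
    return original_fps * (k + 1)

print(interpolated_count(16, 3))  # 61 frames
print(smooth_fps(8, 3))           # play back at 32 fps
```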

### RIFE (Real-Time)

```python
# Install: pip install rife-ncnn-vulkan-python
from rife_ncnn_vulkan import Rife

rife = Rife(gpu_id=0)

# Interpolate frames
```

## Batch Video Generation

```python
# Uses the AnimateDiff pipeline (`pipe`) and imageio from the sections above
prompts = [
    "A rocket launching into space",
    "Ocean waves crashing on rocks",
    "A butterfly flying through flowers",
]

for i, prompt in enumerate(prompts):
    print(f"Generating {i+1}/{len(prompts)}")
    output = pipe(prompt, num_frames=16)
    imageio.mimsave(f"video_{i:03d}.mp4", output.frames[0], fps=8)
```

## Memory Tips

### For Limited VRAM

```python
# Enable CPU offload
pipe.enable_model_cpu_offload()

# Enable VAE slicing
pipe.enable_vae_slicing()

# Enable attention slicing
pipe.enable_attention_slicing()

# Reduce the frame count
num_frames = 14  # instead of 25
```

### Chunked Decoding

```python
frames = pipe(
    image,
    decode_chunk_size=2,  # decode 2 frames at a time
    num_frames=25,
).frames[0]
```
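
Smaller `decode_chunk_size` trades speed for peak VRAM: the VAE decodes the latent frames in ceil(num_frames / chunk) passes, holding only one chunk in memory at a time. Illustrative arithmetic:

```python
import math

def decode_passes(num_frames, decode_chunk_size):
    """Number of VAE decode passes for a chunked decode."""
    return math.ceil(num_frames / decode_chunk_size)

print(decode_passes(25, 2))  # 13 passes, minimal peak memory
print(decode_passes(25, 8))  # 4 passes, faster but more VRAM
```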

## Converting Output

### GIF to MP4

```bash
ffmpeg -i input.gif -movflags faststart -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" output.mp4
```

### Frame Sequence to Video

```bash
ffmpeg -framerate 8 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4
```

### Add Audio

```bash
ffmpeg -i video.mp4 -i audio.mp3 -c:v copy -c:a aac -shortest output_with_audio.mp4
```
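
When driving these conversions from Python (for example after the batch loop above), building the argument list explicitly avoids shell-quoting problems. A sketch of the frame-sequence conversion, assuming `ffmpeg` is on `PATH`:

```python
import subprocess

def frames_to_video_cmd(pattern, output, framerate=8):
    """ffmpeg argument list for the frame-sequence conversion shown above."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(framerate),
        "-i", pattern,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        output,
    ]

cmd = frames_to_video_cmd("frame_%04d.png", "output.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
```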

## Performance Benchmarks

Approximate generation times on an A100:

| Model             | Frames | Time   |
| ----------------- | ------ | ------ |
| SVD-XT (1024x576) | 25     | \~120s |
| SVD-XT (512x512)  | 25     | \~80s  |
| AnimateDiff       | 16     | \~30s  |
| CogVideoX         | 49     | \~180s |

## Cost Estimation

Typical rates on the CLORE.AI marketplace (as of 2024):

| GPU       | Hourly rate | Daily rate | 4-hour session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

* Prices vary by provider and demand; check the [CLORE.AI marketplace](https://clore.ai/marketplace) for current rates.
* Pay with **CLORE** to save on fees.
* Use **Spot** pricing for flexible workloads (often 30-50% cheaper).
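
A rental's cost is just rate × hours. A toy estimator using the approximate hourly rates above (illustrative only; the rate-to-GPU pairing follows the table, and real prices fluctuate):

```python
HOURLY_RATES = {  # approximate marketplace rates (USD/hr), from the table above
    "RTX 3060": 0.03,
    "A100 40GB": 0.17,
    "A100 80GB": 0.25,
}

def rental_cost(gpu, hours):
    """Estimated USD cost for renting a GPU for the given number of hours."""
    return round(HOURLY_RATES[gpu] * hours, 2)

print(rental_cost("A100 40GB", 4))  # 0.68 -- a 4-hour SVD session
```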

## Troubleshooting

### Out-of-Memory (OOM) Errors

* Reduce `num_frames`
* Enable CPU offload
* Use a smaller `decode_chunk_size`

### Video Flickering

* Increase `num_inference_steps`
* Try different `motion_bucket_id` values
* Use frame interpolation

### Poor Quality

* Use a higher-resolution input image (SVD)
* Write better prompts (AnimateDiff)
* Increase `guidance_scale`

