# Wan2.1 Video

Generate high-quality videos on CLORE.AI GPUs with Alibaba's Wan2.1 text-to-video and image-to-video models.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

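## Why Wan2.1?

* **High quality** - state-of-the-art video generation
* **Multiple modes** - text-to-video and image-to-video
* **Multiple sizes** - from 1.3B to 14B parameters
* **Long videos** - up to 81 frames
* **Open weights** - Apache 2.0 license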

## Model Variants

| Model           | Parameters | VRAM | Resolution | Frames |
| --------------- | ---------- | ---- | ---------- | ------ |
| Wan2.1-T2V-1.3B | 1.3B       | 8GB  | 480p       | 81     |
| Wan2.1-T2V-14B  | 14B        | 24GB | 720p       | 81     |
| Wan2.1-I2V-14B  | 14B        | 24GB | 720p       | 81     |

## Quick Deploy on CLORE.AI

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install diffusers transformers accelerate gradio && \
python -c "
import gradio as gr
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained('alibaba-pai/Wan2.1-T2V-1.3B', torch_dtype=torch.float16)
# CPU offload moves each submodule to the GPU only while it runs, so no pipe.to('cuda') is needed
pipe.enable_model_cpu_offload()

def generate(prompt, steps, frames, seed):
    # Gradio passes floats; the pipeline and manual_seed expect ints
    generator = torch.Generator('cuda').manual_seed(int(seed)) if seed > 0 else None
    output = pipe(prompt, num_frames=int(frames), num_inference_steps=int(steps), generator=generator)
    export_to_video(output.frames[0], 'output.mp4', fps=16)
    return 'output.mp4'

gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label='Prompt'),
        gr.Slider(20, 100, value=50, step=1, label='Steps'),
        gr.Slider(17, 81, value=49, step=8, label='Frames'),
        gr.Number(value=-1, label='Seed')
    ],
    outputs=gr.Video(),
    title='Wan2.1 - Text-to-Video'
).launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL under **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.
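
If you call the deployed app from scripts, a small helper can turn the `http_pub` value into the base URL that replaces `localhost`. This is a minimal sketch: `public_base_url` is a hypothetical name, and it only normalizes the string; it does not check that the service is reachable.

```python
from urllib.parse import urlsplit


def public_base_url(http_pub: str) -> str:
    """Turn an http_pub value such as 'abc123.clorecloud.net' into an HTTPS base URL."""
    value = http_pub.strip().rstrip("/")
    # Accept values copied with or without a scheme
    if "://" not in value:
        value = "https://" + value
    host = urlsplit(value).netloc
    if not host:
        raise ValueError(f"no host in {http_pub!r}")
    # Force HTTPS and drop any path so the result works as a base URL
    return f"https://{host}"


base = public_base_url("abc123.clorecloud.net")
print(base)  # use in place of localhost in the examples
```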

## Hardware Requirements

| Model    | Minimum GPU   | Recommended   | Best          |
| -------- | ------------- | ------------- | ------------- |
| 1.3B T2V | RTX 3070 8GB  | RTX 3090 24GB | RTX 4090 24GB |
| 14B T2V  | RTX 4090 24GB | A100 40GB     | A100 80GB     |
| 14B I2V  | RTX 4090 24GB | A100 40GB     | A100 80GB     |
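
If you script deployments, the "Minimum GPU" column can be encoded as a lookup. A minimal sketch with a hypothetical helper `models_that_fit`; the only data it uses is the minimum VRAM from this guide's tables (8GB for 1.3B, 24GB for 14B), so treat its answer as a starting point rather than a guarantee.

```python
# Minimum VRAM (GB) per model, from the "Minimum GPU" column above
MIN_VRAM_GB = {
    "1.3B T2V": 8,
    "14B T2V": 24,
    "14B I2V": 24,
}


def models_that_fit(vram_gb: float, task: str = "T2V") -> list[str]:
    """Return the models for `task` whose minimum VRAM fits, largest first."""
    fits = [name for name, need in MIN_VRAM_GB.items()
            if name.endswith(task) and vram_gb >= need]
    # Sort by parameter count, parsed from the "1.3B" / "14B" prefix
    return sorted(fits, key=lambda n: float(n.split("B")[0]), reverse=True)


print(models_that_fit(24))         # 24GB card, text-to-video
print(models_that_fit(10, "I2V"))  # 10GB card, image-to-video
```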

## Installation

```bash
pip install diffusers transformers accelerate torch
```
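
Wan support was added to `diffusers` relatively recently, so it can save time to confirm what actually got installed before loading a pipeline. A stdlib-only check; this guide does not state minimum versions, so the script only reports what it finds.

```python
from importlib import metadata

REQUIRED = ["torch", "diffusers", "transformers", "accelerate"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:<14} {version or 'NOT INSTALLED'}")
```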

## Text-to-Video

### Basic Usage (1.3B)

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "alibaba-pai/Wan2.1-T2V-1.3B",
    torch_dtype=torch.float16
)
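# Model CPU offload moves each submodule to the GPU only while it runs,
# so a separate pipe.to("cuda") call is unnecessary
pipe.enable_model_cpu_offload()

prompt = "A cat playing with a ball in a sunny garden"

output = pipe(
    prompt=prompt,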
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=7.0
)

export_to_video(output.frames[0], "cat_video.mp4", fps=16)
```
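
The frame counts used throughout this guide (17, 49, 81) all have the form 4k + 1, which matches a VAE that compresses time by a factor of 4. As far as we know, diffusers' `WanPipeline` snaps other values to `(n // 4) * 4 + 1` with a warning; the sketch below mirrors that rule so you can work out the real frame count and clip length up front. Treat the factor of 4 and the rounding rule as assumptions to check against your installed version.

```python
TEMPORAL_FACTOR = 4  # assumed temporal compression factor of the Wan2.1 VAE


def effective_frames(requested: int) -> int:
    """Snap a requested frame count to the 4k + 1 form, as (n // 4) * 4 + 1."""
    if requested % TEMPORAL_FACTOR != 1:
        requested = requested // TEMPORAL_FACTOR * TEMPORAL_FACTOR + 1
    return max(requested, 1)


def clip_seconds(frames: int, fps: int = 16) -> float:
    """Playback length of a clip saved with export_to_video(..., fps=fps)."""
    return frames / fps


for n in (16, 49, 50, 81):
    f = effective_frames(n)
    print(f"requested {n:>2} -> {f:>2} frames, {clip_seconds(f):.2f} s at 16 fps")
```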

### High Quality (14B)

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "alibaba-pai/Wan2.1-T2V-14B",
    torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # handles GPU placement, so no pipe.to("cuda")
pipe.enable_vae_tiling()

prompt = "Cinematic shot: a dragon flying over mountains at sunset, 4K, highly detailed"

output = pipe(
    prompt=prompt,
    negative_prompt="blurry, low quality, distorted",
    num_frames=81,
    height=720,
    width=1280,
    num_inference_steps=50,
    guidance_scale=7.0
)

export_to_video(output.frames[0], "dragon.mp4", fps=24)
```

## Image-to-Video

### Animate an Image

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import load_image, export_to_video

pipe = WanImageToVideoPipeline.from_pretrained(
    "alibaba-pai/Wan2.1-I2V-14B",
    torch_dtype=torch.float16
)
# CPU offload handles device placement; no pipe.to("cuda") needed
pipe.enable_model_cpu_offload()

# Load the input image
image = load_image("input.jpg")

prompt = "The person in the image starts walking forward"

output = pipe(
    prompt=prompt,
    image=image,
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=7.0
)

export_to_video(output.frames[0], "animated.mp4", fps=16)
```

## Image-to-Video with Wan2.1-I2V-14B

{% hint style="info" %}
Wan2.1-I2V-14B animates a still image, using a text prompt to guide the motion. It requires **24GB of VRAM** (RTX 4090 or A100 40GB recommended).
{% endhint %}

### Model Details

| Property       | Value                             |
| -------------- | --------------------------------- |
| Model ID       | `Wan-AI/Wan2.1-I2V-14B-480P`      |
| Parameters     | 14 billion                        |
| VRAM required  | **24GB**                          |
| Max resolution | 480p (832×480) or 720p (1280×720) |
| Max frames     | 81                                |
| License        | Apache 2.0                        |

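### Hardware Requirements

| GPU       | VRAM | Status          |
| --------- | ---- | --------------- |
| RTX 4090  | 24GB | ✅ Recommended  |
| RTX 3090  | 24GB | ✅ Supported    |
| A100 40GB | 40GB | ✅ Best         |
| A100 80GB | 80GB | ✅ Best quality |
| RTX 3080  | 10GB | ❌ Insufficient |

### Quick CLI Script

Save the full script below as `generate_i2v.py`, then run: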

```bash
python generate_i2v.py --model Wan-AI/Wan2.1-I2V-14B-480P --image input.jpg --prompt "camera slowly zooms out"
```

### generate\_i2v.py: Full Script

```python
#!/usr/bin/env python3
"""
Wan2.1 image-to-video CLI script.
Usage: python generate_i2v.py --model Wan-AI/Wan2.1-I2V-14B-480P \
           --image input.jpg --prompt "camera slowly zooms out"
"""

import argparse
import os
import sys
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import load_image, export_to_video
from PIL import Image


def parse_args():
    parser = argparse.ArgumentParser(description="Wan2.1 image-to-video generator")
    parser.add_argument(
        "--model",
        type=str,
        default="Wan-AI/Wan2.1-I2V-14B-480P",
        help="Hugging Face model ID (default: Wan-AI/Wan2.1-I2V-14B-480P)",
    )
    parser.add_argument(
        "--image",
        type=str,
        required=True,
        help="Path to the input image (JPEG or PNG)",
    )
    parser.add_argument(
        "--prompt",
        type=str,
        required=True,
        help='Text prompt describing the desired motion (e.g. "camera slowly zooms out")',
    )
    parser.add_argument(
        "--negative-prompt",
        type=str,
        default="blurry, low quality, distorted, jittery motion, artifacts",
        help="Negative prompt to steer away from unwanted artifacts",
    )
    parser.add_argument(
        "--frames",
        type=int,
        default=49,
        help="Number of video frames to generate (default: 49, max: 81)",
    )
    parser.add_argument(
        "--steps",
        type=int,
        default=50,
        help="Number of diffusion steps (default: 50)",
    )
    parser.add_argument(
        "--guidance",
        type=float,
        default=7.0,
        help="Classifier-free guidance scale (default: 7.0)",
    )
    parser.add_argument(
        "--seed",
        type=int,
        default=-1,
        help="Random seed for reproducibility (-1 = random)",
    )
    parser.add_argument(
        "--fps",
        type=int,
        default=16,
        help="Output video frame rate (default: 16)",
    )
    parser.add_argument(
        "--output",
        type=str,
        default="output_i2v.mp4",
        help="Output video file path (default: output_i2v.mp4)",
    )
    parser.add_argument(
        "--height",
        type=int,
        default=480,
        help="Output video height in pixels, a multiple of 16 (default: 480)",
    )
    parser.add_argument(
        "--width",
        type=int,
        default=832,
        help="Output video width in pixels, a multiple of 16 (default: 832)",
    )
    parser.add_argument(
        "--cpu-offload",
        # store_true with default=True could never be switched off;
        # BooleanOptionalAction (Python 3.9+) also adds --no-cpu-offload
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Model CPU offload to save VRAM (disable with --no-cpu-offload)",
    )
    parser.add_argument(
        "--vae-tiling",
        action="store_true",
        default=False,
        help="Enable VAE tiling for high-resolution output",
    )
    return parser.parse_args()


def load_and_resize_image(image_path: str, width: int, height: int) -> Image.Image:
    """Load an image from disk and resize it to the target size."""
    if not os.path.exists(image_path):
        print(f"[ERROR] Image not found: {image_path}", file=sys.stderr)
        sys.exit(1)

    img = Image.open(image_path).convert("RGB")
    original_size = img.size
    # A plain resize stretches images whose aspect ratio differs from width:height
    img = img.resize((width, height), Image.LANCZOS)
    print(f"[INFO] Loaded image: {image_path} ({original_size[0]}x{original_size[1]}) → resized to {width}x{height}")
    return img


def load_pipeline(model_id: str, cpu_offload: bool, vae_tiling: bool):
    """Load the Wan I2V pipeline with memory optimizations."""
    print(f"[INFO] Loading model: {model_id}")
    print(f"[INFO] CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        print(f"[INFO] GPU: {torch.cuda.get_device_name(0)} ({vram_gb:.1f} GB VRAM)")
        if vram_gb < 23:
            print("[WARN] Less than 24GB VRAM detected: keep CPU offload enabled or use the 1.3B model")

    pipe = WanImageToVideoPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
    )

    if cpu_offload:
        print("[INFO] Enabling model CPU offload")
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")

    if vae_tiling:
        print("[INFO] Enabling VAE tiling for high-resolution generation")
        pipe.enable_vae_tiling()

    return pipe


def generate_video(pipe, args) -> None:
    """Run the I2V pipeline and save the output video."""
    image = load_and_resize_image(args.image, args.width, args.height)

    generator = None
    if args.seed >= 0:
        generator = torch.Generator("cuda").manual_seed(args.seed)
        print(f"[INFO] Using seed: {args.seed}")
    else:
        print("[INFO] Using a random seed")

    print(f"[INFO] Generating {args.frames} frames at {args.width}x{args.height}")
    print(f"[INFO] Steps: {args.steps} | Guidance: {args.guidance} | FPS: {args.fps}")
    print(f"[INFO] Prompt: {args.prompt}")

    output = pipe(
        prompt=args.prompt,
        negative_prompt=args.negative_prompt,
        image=image,
        num_frames=args.frames,
        height=args.height,
        width=args.width,
        num_inference_steps=args.steps,
        guidance_scale=args.guidance,
        generator=generator,
    )

    export_to_video(output.frames[0], args.output, fps=args.fps)
    print(f"[INFO] Video saved to: {os.path.abspath(args.output)}")
    duration = args.frames / args.fps
    print(f"[INFO] Duration: {duration:.1f}s at {args.fps} fps ({args.frames} frames)")


def main():
    args = parse_args()

    if not torch.cuda.is_available():
        print("[ERROR] No CUDA GPU found. Wan2.1-I2V-14B requires a CUDA-capable GPU.", file=sys.stderr)
        sys.exit(1)

    pipe = load_pipeline(args.model, args.cpu_offload, args.vae_tiling)
    generate_video(pipe, args)
    print("[DONE] Image-to-video generation complete!")


if __name__ == "__main__":
    main()
```
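
To animate a whole folder with the script above, one option is a small driver that builds one command per image. A sketch; `build_commands`, `run_all`, and the `inputs/` and `i2v_out/` directory names are hypothetical. Each invocation reloads the 14B model, which is slow but keeps every run isolated; for many images, loading the pipeline once in Python (as in the next section) avoids that overhead.

```python
import subprocess
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_commands(image_dir, prompt, out_dir="i2v_out", script="generate_i2v.py"):
    """One generate_i2v.py command per JPEG/PNG in image_dir, sorted by filename."""
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    return [
        ["python", script, "--image", str(p), "--prompt", prompt,
         "--output", str(Path(out_dir) / f"{p.stem}.mp4")]
        for p in images
    ]


def run_all(commands, out_dir="i2v_out"):
    """Run the commands one after another, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


# Usage: run_all(build_commands("inputs", "camera slowly zooms out"))
```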

### Advanced I2V Pipeline (Python API)

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import load_image, export_to_video
from PIL import Image

# ── Load the pipeline ──────────────────────────────────────────────────────
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()   # keeps VRAM under 24GB
pipe.enable_vae_tiling()          # optional: helps at 720p

# ── Load and prepare the input image ───────────────────────────────────────
image = load_image("input.jpg").resize((832, 480))  # (width, height), multiples of 16

# ── Generate ───────────────────────────────────────────────────────────────
prompt = "The camera slowly pulls back to reveal the full landscape"
negative_prompt = "blurry, low quality, distorted, flickering, artifacts"

generator = torch.Generator("cuda").manual_seed(42)

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_frames=49,          # about 3 seconds at 16 fps
    height=480,
    width=832,
    num_inference_steps=50,
    guidance_scale=7.5,
    generator=generator,
)

export_to_video(output.frames[0], "i2v_output.mp4", fps=16)
print("已保存：i2v_output.mp4")
```
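
Resizing straight to 832×480 stretches any image with a different aspect ratio. An alternative, similar in spirit to the diffusers Wan image-to-video example, keeps the aspect ratio, targets a fixed pixel area, and rounds both sides down to multiples of 16. `fit_to_area` is a hypothetical helper, and the multiple-of-16 rule is an assumption about the pipeline's input check.

```python
import math


def fit_to_area(img_width, img_height, max_area=480 * 832, multiple=16):
    """Return (width, height) with about max_area pixels, the input's aspect
    ratio, and both sides rounded down to a multiple of `multiple`."""
    aspect = img_height / img_width
    height = round(math.sqrt(max_area * aspect)) // multiple * multiple
    width = round(math.sqrt(max_area / aspect)) // multiple * multiple
    return width, height


print(fit_to_area(1920, 1080))  # 16:9 landscape photo
print(fit_to_area(1080, 1920))  # 9:16 portrait photo
```

With PIL, `w, h = fit_to_area(*image.size)` gives the target; resize the image to `(w, h)` and pass `width=w, height=h` to the pipeline.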

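### I2V Prompt Tips

| Goal                | Example prompt                                      |
| ------------------- | --------------------------------------------------- |
| Camera motion       | `"camera slowly pulls back from the subject"`       |
| Parallax effect     | `"subtle parallax motion, shifting depth of field"` |
| Character animation | `"the person turns their head and smiles"`          |
| Nature animation    | `"leaves rustling in the breeze, shifting light"`   |
| Abstract motion     | `"colors swirl and blend, fluid motion"`            |

### VRAM Tips for I2V (24GB GPUs)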

```python
# Required on 24GB GPUs
pipe.enable_model_cpu_offload()

# Optional: reduces peak VRAM by roughly 10%
pipe.enable_vae_tiling()
pipe.enable_vae_slicing()

# Free memory between runs
import gc
gc.collect()
torch.cuda.empty_cache()
```

## Prompt Examples

### Nature & Landscapes

```python
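prompts = [
    "Time-lapse of clouds moving over mountain peaks, dramatic lighting",
    "Ocean waves crashing against rocks, slow motion, cinematic",
    "Northern lights dancing across the night sky, vivid colors",
    "Autumn forest with falling leaves, peaceful atmosphere"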
]
```

### Animals & Characters

```python
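prompts = [
    "A golden retriever running through a field of flowers",
    "A butterfly emerging from its chrysalis, macro shot",
    "A samurai drawing his sword, dramatic lighting",
    "A robot walking down a futuristic city street"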
]
```

### Abstract & Artistic

```python
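prompts = [
    "Colorful paint swirling in water, abstract art",
    "Geometric shapes shifting and morphing, neon colors",
    "Ink drops spreading in milk, macro photography"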
]
```
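
When prompt lists like these feed a batch run, naming output files after the prompt makes results easier to find than `video_000.mp4`. A small hypothetical helper; the 40-character limit and the `video` fallback are arbitrary choices.

```python
import re
import unicodedata


def prompt_to_filename(prompt: str, index: int, max_len: int = 40) -> str:
    """Build a filesystem-safe name like '000_ink-drops-spreading-in-milk.mp4'."""
    # Decompose accented characters, then drop anything non-ASCII
    text = unicodedata.normalize("NFKD", prompt).encode("ascii", "ignore").decode()
    # Collapse every run of non-alphanumeric characters into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    slug = slug[:max_len].rstrip("-") or "video"
    return f"{index:03d}_{slug}.mp4"


print(prompt_to_filename("Ink drops spreading in milk, macro photography", 2))
```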

## Advanced Settings

### Quality vs. Speed

```python
# Fast preview
output = pipe(
    prompt=prompt,
    num_frames=17,
    num_inference_steps=25,
    guidance_scale=5.0
)

# Balanced
output = pipe(
    prompt=prompt,
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=7.0
)

# Highest quality
output = pipe(
    prompt=prompt,
    num_frames=81,
    num_inference_steps=100,
    guidance_scale=7.5
)
```
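
The three settings above can live in one dictionary so scripts switch between them by name. A sketch; the preset names and `preset_kwargs` are hypothetical, and the values are the ones from the snippet.

```python
# The three settings above as named presets
PRESETS = {
    "preview": {"num_frames": 17, "num_inference_steps": 25, "guidance_scale": 5.0},
    "balanced": {"num_frames": 49, "num_inference_steps": 50, "guidance_scale": 7.0},
    "max": {"num_frames": 81, "num_inference_steps": 100, "guidance_scale": 7.5},
}


def preset_kwargs(name, **overrides):
    """Return a copy of a preset with per-call overrides applied."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    kwargs = dict(PRESETS[name])  # copy so overrides never modify PRESETS
    kwargs.update(overrides)
    return kwargs


# Usage with a loaded pipeline:
# output = pipe(prompt=prompt, **preset_kwargs("balanced", num_inference_steps=40))
```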

### Resolution Options

```python
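# 480p (1.3B model); height and width must be multiples of 16
output = pipe(prompt, height=480, width=832, num_frames=49)

# 720p (14B model)
output = pipe(prompt, height=720, width=1280, num_frames=49)

# ~1080p (14B model, high VRAM, beyond the native 720p); 1080 is not a multiple of 16, so use 1088
output = pipe(prompt, height=1088, width=1920, num_frames=33)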
```
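
One way to see why 720p is much heavier than 480p is to count the tokens the transformer attends over. This sketch assumes the Wan2.1 VAE compresses 4x in time and 8x in each spatial dimension and that the transformer splits latents into 1×2×2 patches; those factors are assumptions here, so check them against your checkpoint's config.

```python
def latent_tokens(height: int, width: int, frames: int,
                  t_down: int = 4, s_down: int = 8, patch: int = 2) -> int:
    """Approximate transformer sequence length for one Wan2.1 video.

    Assumes 4x temporal / 8x spatial VAE compression and 1x2x2 patches.
    """
    step = s_down * patch  # pixels per token along each spatial axis
    if height % step or width % step:
        raise ValueError(f"height and width should be multiples of {step}")
    latent_frames = (frames - 1) // t_down + 1
    return latent_frames * (height // step) * (width // step)


for h, w in [(480, 832), (720, 1280)]:
    print(f"{h}x{w}, 81 frames: {latent_tokens(h, w, 81):,} tokens")
```

Under these assumptions, 720p at 81 frames is about 2.3 times as many tokens as 480p, and because self-attention cost grows roughly with the square of sequence length, the attention work grows by more than 5x.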

## Batch Generation

```python
import os
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained("alibaba-pai/Wan2.1-T2V-1.3B", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()  # handles GPU placement; no pipe.to("cuda") needed

prompts = [
    "A rocket launching into the sky",
    "Fish swimming through a coral reef",
    "Rain falling on a city street at night"
]

output_dir = "./videos"
os.makedirs(output_dir, exist_ok=True)

for i, prompt in enumerate(prompts):
    print(f"Generating {i+1}/{len(prompts)}: {prompt[:40]}...")

    output = pipe(
        prompt=prompt,
        num_frames=49,
        num_inference_steps=50
    )

    export_to_video(output.frames[0], f"{output_dir}/video_{i:03d}.mp4", fps=16)
    torch.cuda.empty_cache()
```
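
Long batch runs on rented GPUs can be interrupted. One way to make the loop above resumable is to skip prompts whose output file already exists. A sketch; `pending_jobs` is a hypothetical helper, it assumes the `video_{i:03d}.mp4` naming used above, and it treats an empty file as unfinished.

```python
from pathlib import Path


def pending_jobs(prompts, output_dir="./videos"):
    """Yield (index, prompt, path) for prompts whose video is not on disk yet."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, prompt in enumerate(prompts):
        path = out / f"video_{i:03d}.mp4"
        # A non-empty file counts as done; an empty one is regenerated
        if path.exists() and path.stat().st_size > 0:
            continue
        yield i, prompt, path


# Usage in place of the enumerate() loop above:
# for i, prompt, path in pending_jobs(prompts, output_dir):
#     output = pipe(prompt=prompt, num_frames=49, num_inference_steps=50)
#     export_to_video(output.frames[0], str(path), fps=16)
```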

## Gradio Interface

```python
import gradio as gr
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video
import tempfile

pipe = WanPipeline.from_pretrained("alibaba-pai/Wan2.1-T2V-1.3B", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()  # handles GPU placement; no pipe.to("cuda") needed

def generate_video(prompt, negative_prompt, frames, steps, guidance, seed):
    # Gradio sliders and numbers return floats; cast where ints are expected
    generator = torch.Generator("cuda").manual_seed(int(seed)) if seed > 0 else None

    output = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_frames=int(frames),
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        generator=generator,
    )

    # delete=False keeps the file after the handle closes so Gradio can serve it
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
        export_to_video(output.frames[0], f.name, fps=16)
        return f.name

demo = gr.Interface(
    fn=generate_video,
    inputs=[
        gr.Textbox(label="Prompt", lines=2),
        gr.Textbox(label="Negative Prompt", value="blurry, low quality"),
        gr.Slider(17, 81, value=49, step=8, label="Frames"),
        gr.Slider(20, 100, value=50, step=5, label="Steps"),
        gr.Slider(3, 12, value=7, step=0.5, label="Guidance"),
        gr.Number(value=-1, label="Seed (-1 = random)")
    ],
    outputs=gr.Video(label="Generated video"),
    title="Wan2.1 - Text-to-Video Generation",
    description="Generate videos from text prompts. Running on CLORE.AI."
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```
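
With a seed of -1 the interface above picks a random seed internally, so a good result cannot be reproduced later. A variant worth considering draws the random seed explicitly so it can be shown to the user. This is a sketch; `resolve_seed` and the `MAX_SEED` bound are hypothetical choices, and its result would be passed to `torch.Generator("cuda").manual_seed(...)`.

```python
import random

MAX_SEED = 2**32 - 1  # arbitrary upper bound for randomly drawn seeds


def resolve_seed(seed) -> int:
    """Return the seed actually used: the given one if positive, else a random one."""
    seed = int(seed)  # gr.Number passes a float
    return seed if seed > 0 else random.randint(1, MAX_SEED)


# Inside generate_video, the generator line could become:
#     used_seed = resolve_seed(seed)
#     generator = torch.Generator("cuda").manual_seed(used_seed)
# and used_seed could be returned to an extra gr.Number output.
```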

## Memory Optimization

```python
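# Enable all optimizations
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()
pipe.enable_vae_slicing()

# For very low VRAM: use instead of enable_model_cpu_offload() (much slower)
pipe.enable_sequential_cpu_offload()

# Clear the cache between generations
torch.cuda.empty_cache()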
```
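
Which of these calls make sense depends on the GPU. A hypothetical helper that returns the method names to apply; the thresholds reuse the minimum VRAM figures from the hardware table (8GB for 1.3B, 24GB for 14B), and the "twice the minimum" cutoff for skipping offload entirely is a rough assumption, not a measured limit.

```python
def offload_plan(vram_gb: float, params_billion: float) -> list[str]:
    """Names of pipeline methods to call for a GPU size and model size."""
    minimum = 8 if params_billion < 5 else 24  # minimum VRAM from the hardware table
    if vram_gb < minimum:
        # Below the minimum: slowest option, but the lowest peak VRAM
        return ["enable_sequential_cpu_offload", "enable_vae_tiling", "enable_vae_slicing"]
    if vram_gb < 2 * minimum:
        return ["enable_model_cpu_offload", "enable_vae_tiling", "enable_vae_slicing"]
    return []  # ample headroom: call pipe.to("cuda") and skip offloading


# Usage: for name in offload_plan(24, 14): getattr(pipe, name)()
```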

## Performance

| Model | Resolution | Frames | GPU       | Time      |
| ----- | ---------- | ------ | --------- | --------- |
| 1.3B  | 480p       | 49     | RTX 4090  | \~2 min   |
| 1.3B  | 480p       | 49     | A100 40GB | \~1.5 min |
| 14B   | 720p       | 49     | A100 40GB | \~5 min   |
| 14B   | 720p       | 81     | A100 80GB | \~8 min   |

## Cost Estimate

Typical CLORE.AI marketplace prices:

| GPU           | Hourly rate | \~49-frame videos/hour     |
| ------------- | ----------- | -------------------------- |
| RTX 3090 24GB | \~$0.06     | \~20 (1.3B)                |
| RTX 4090 24GB | \~$0.10     | \~30 (1.3B)                |
| A100 40GB     | \~$0.17     | \~40 (1.3B) / \~12 (14B)   |
| A100 80GB     | \~$0.25     | \~8 (14B, high resolution) |

*Prices vary. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
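
Dividing the hourly rate by throughput gives a rough per-video cost, which helps when weighing a cheaper GPU against a faster one. This simply reuses the approximate figures from the table; actual prices and speeds vary.

```python
def cost_per_video(hourly_usd: float, videos_per_hour: float) -> float:
    """Approximate cost of one video from an hourly rate and throughput."""
    if videos_per_hour <= 0:
        raise ValueError("videos_per_hour must be positive")
    return hourly_usd / videos_per_hour


# Approximate figures from the table above (1.3B model, ~49-frame videos)
for gpu, rate, per_hour in [("RTX 3090", 0.06, 20), ("RTX 4090", 0.10, 30), ("A100 40GB", 0.17, 40)]:
    print(f"{gpu:<10} ${cost_per_video(rate, per_hour):.4f} per video")
```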

## Troubleshooting

### Out of Memory

```python
# Use the smaller model
pipe = WanPipeline.from_pretrained("alibaba-pai/Wan2.1-T2V-1.3B", torch_dtype=torch.float16)

# Enable all optimizations
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

# Reduce the number of frames
output = pipe(prompt, num_frames=17)

# Lower the resolution
output = pipe(prompt, height=480, width=832)
```
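
The fixes above can also be applied automatically: try a configuration, and on an out-of-memory error step down to fewer frames, then a lower resolution. A sketch with a hypothetical `generate_with_fallback`; it catches `MemoryError` by default so it runs anywhere, and with PyTorch you would pass `oom_errors=(torch.cuda.OutOfMemoryError,)`.

```python
def generate_with_fallback(generate, attempts, oom_errors=(MemoryError,)):
    """Try each kwargs dict in `attempts` until one finishes without running out of memory.

    `generate` is any callable taking keyword arguments, for example
    lambda **kw: pipe(prompt=prompt, **kw). Returns (kwargs_used, result).
    """
    last_error = None
    for kwargs in attempts:
        try:
            return kwargs, generate(**kwargs)
        except oom_errors as err:
            last_error = err
            # With CUDA you would also call torch.cuda.empty_cache() here
    raise RuntimeError("every configuration ran out of memory") from last_error


# Fallback ladder built from the fixes above: fewer frames, then lower resolution
ATTEMPTS = [
    {"height": 720, "width": 1280, "num_frames": 49},
    {"height": 720, "width": 1280, "num_frames": 17},
    {"height": 480, "width": 832, "num_frames": 17},
]
```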

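### Poor Quality

* Increase the number of steps (75-100)
* Write more detailed prompts
* Use a negative prompt
* Try the 14B model for better quality

### Video Too Short

* Increase `num_frames` (max 81)
* Use RIFE to interpolate additional frames
* Chain multiple generations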
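
Chaining usually means feeding the last frame of one clip to the next image-to-video run as its input image, so consecutive clips share one frame. Under that assumption, the number of runs needed for a target length works out as below. A sketch; `clips_needed` is a hypothetical helper and it ignores frame interpolation.

```python
import math


def clips_needed(target_seconds: float, frames_per_clip: int = 81, fps: int = 16) -> int:
    """Number of chained generations needed to reach target_seconds of video.

    Each clip after the first starts from the previous clip's last frame,
    so it contributes frames_per_clip - 1 new frames.
    """
    total = math.ceil(target_seconds * fps)
    if total <= frames_per_clip:
        return 1
    return 1 + math.ceil((total - frames_per_clip) / (frames_per_clip - 1))


print(clips_needed(15))  # 15 s at 16 fps with 81-frame clips
```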

### Artifacts/Flickering

* Increase the guidance scale
* Use a fixed seed for consistency
* Apply video stabilization in post-processing

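## Wan2.1 vs. Other Models

| Feature     | Wan2.1     | HunyuanVideo | SVD  | CogVideoX |
| ----------- | ---------- | ------------ | ---- | --------- |
| Quality     | Excellent  | Excellent    | Good | Great     |
| Speed       | Fast       | Medium       | Fast | Slow      |
| Max frames  | 81         | 129          | 25   | 49        |
| Resolution  | 720p       | 720p         | 576p | 720p      |
| I2V support | Yes        | Yes          | Yes  | Yes       |
| License     | Apache 2.0 | Open         | Open | Open      |

**When to use Wan2.1:**

* You need open-source video generation
* You want fast generation
* You need an Apache 2.0 license
* You need a balance of quality and speed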

## Next Steps

* [Hunyuan Video](https://docs.clore.ai/guides/guides_v2-zh/shi-pin-sheng-cheng/hunyuan-video) - alternative T2V model
* [OpenSora](https://docs.clore.ai/guides/guides_v2-zh/shi-pin-sheng-cheng/opensora) - open-source Sora alternative
* [Stable Video Diffusion](https://docs.clore.ai/guides/guides_v2-zh/shi-pin-sheng-cheng/stable-video-diffusion) - image animation
* [RIFE Interpolation](https://docs.clore.ai/guides/guides_v2-zh/shi-pin-chu-li/rife-interpolation) - frame interpolation