# Hunyuan Video

Generate high-quality videos with HunyuanVideo, Tencent's open-source video generation model.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

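## Renting on CLORE.AI

1. Go to the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set the ports (TCP for SSH, HTTP for web interfaces)
   * Add environment variables if needed
   * Enter the startup command
5. Choose a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment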

### Accessing your server

* Find the connection details under **My Orders**
* Web interface: use the URL of the HTTP port
* SSH: `ssh -p <port> root@<proxy-address>`

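## What is HunyuanVideo?

Tencent's HunyuanVideo offers:

* High-quality text-to-video generation
* Clips of 5 seconds or longer
* 720p resolution
* Open-source weights that are available for commercial use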

## Resources

* **Model:** [tencent/HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo)
* **GitHub:** [Tencent/HunyuanVideo](https://github.com/Tencent/HunyuanVideo)
* **Paper:** [HunyuanVideo paper](https://arxiv.org/abs/2412.03603)

## Recommended hardware

| Component | Minimum       | Recommended | Best       |
| --------- | ------------- | ----------- | ---------- |
| GPU       | RTX 4090 24GB | A100 40GB   | A100 80GB  |
| VRAM      | 24GB          | 40GB        | 80GB       |
| CPU       | 8 cores       | 16 cores    | 32 cores   |
| RAM       | 32GB          | 64GB        | 128GB      |
| Storage   | 100GB NVMe    | 200GB NVMe  | 500GB NVMe |
| Network   | 500 Mbps      | 1 Gbps      | 1 Gbps     |
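
To compare a marketplace listing against the table above, a small helper can map a card's nominal VRAM to one of the three tiers. This is only a sketch: the name `hardware_tier` is hypothetical, and the cut-offs are simply read off the VRAM row rather than taken from any official sizing rule.

```python
def hardware_tier(vram_gb: float) -> str:
    """Map nominal VRAM (in GB) to a tier from the hardware table.

    Thresholds are assumptions copied from the table's VRAM row.
    """
    if vram_gb >= 80:
        return "best"
    if vram_gb >= 40:
        return "recommended"
    if vram_gb >= 24:
        return "minimum"
    return "below minimum"
```

Pass the nominal size from the listing (for example `24` for a 24GB card). The memory reported by the driver at runtime is usually slightly lower than the nominal figure, so comparing that value directly against these thresholds would need some tolerance.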

## Quick deploy on CLORE.AI

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
git clone https://github.com/Tencent/HunyuanVideo.git && \
cd HunyuanVideo && \
pip install -r requirements.txt && \
python sample_video.py --prompt "A cat walking in a garden"
```

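## Accessing your service

After deployment, find your `http_pub` URL under **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Find the `http_pub` URL (for example, `abc123.clorecloud.net`)

In the examples below, use `https://YOUR_HTTP_PUB_URL` instead of `localhost`.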
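
As an illustration of that substitution, the sketch below turns an `http_pub` value into the base URL a client script would call. The helper name `service_url` is hypothetical, and it assumes the service is reachable over HTTPS on the `http_pub` host as described above.

```python
def service_url(http_pub: str, path: str = "") -> str:
    """Build an HTTPS URL from the http_pub value shown in My Orders."""
    host = http_pub.strip()
    # Tolerate values pasted with a scheme or a trailing slash.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    if path:
        return f"https://{host}/{path.lstrip('/')}"
    return f"https://{host}"
```

For example, `service_url("abc123.clorecloud.net")` gives the address to open in a browser in place of `http://localhost:7860`.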

## Installation

```bash
git clone https://github.com/Tencent/HunyuanVideo.git
cd HunyuanVideo
pip install -r requirements.txt

# Download the models
python download_models.py
```

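## What you can create

### Marketing content

* Product showcase videos
* Short clips for social media
* Promotional animations

### Creative projects

* Music video concepts
* Short film prototypes
* Art installations

### Education and training

* Explainer video drafts
* Training material concepts
* Concept visualizations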

## Basic usage

```python
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

# Diffusers-format weights; the tencent/HunyuanVideo repo holds the native checkpoints.
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype=torch.float16
)
# CPU offload moves each sub-model to the GPU only while it runs,
# so no explicit pipe.to("cuda") is needed.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "A majestic eagle soaring over snowy mountains, cinematic lighting, 4K"

video_frames = pipe(
    prompt=prompt,
    num_frames=45,
    num_inference_steps=50,
    guidance_scale=7.0
).frames[0]

export_to_video(video_frames, "eagle.mp4", fps=15)
```

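## Advanced generation

```python
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

# Diffusers-format weights; the tencent/HunyuanVideo repo holds the native checkpoints.
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype=torch.float16
)
# CPU offload handles device placement, so pipe.to("cuda") is not needed.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video_frames = pipe(
    prompt="Time-lapse of a flower blooming, macro photography, richly detailed petals",
    negative_prompt="blurry, low quality, distorted, ugly",
    num_frames=45,
    height=544,
    width=960,
    num_inference_steps=50,
    guidance_scale=7.0,
    # Use a fixed seed for consistent results (42 is an arbitrary example value)
    generator=torch.Generator("cuda").manual_seed(42)
).frames[0]

export_to_video(video_frames, "flower_bloom.mp4", fps=15)
```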

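## Prompt examples

### Nature and landscapes

```python
prompts = [
    "Aurora dancing over a frozen lake, time-lapse, ethereal",
    "Ocean waves crashing on a black volcanic beach, slow motion",
    "Thunderstorm over a wheat field, dramatic lighting, 4K",
    "Cherry blossoms falling in a Japanese garden, spring, peaceful"
]
```

### Sci-fi and fantasy

```python
prompts = [
    "Spaceship launching from a futuristic city, cinematic, highly detailed",
    "Dragon flying through clouds at sunset, epic, fantasy",
    "Robot walking through neon-lit streets, cyberpunk, rainy",
    "Magic portal opening in an ancient forest, mystical glow"
]
```

### Abstract and artistic

```python
prompts = [
    "Ink drops spreading in water, macro, colorful, abstract",
    "Geometric shapes morphing and transforming, motion graphics",
    "Light painting in the dark, long-exposure effect, vibrant"
]
```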
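
The prompts above all follow the same shape: a subject followed by comma-separated style modifiers. A minimal sketch of a helper that assembles prompts in that shape is shown below; `build_prompt` is a hypothetical name, and the pattern is an observation from these examples rather than a documented requirement of the model.

```python
def build_prompt(subject: str, *modifiers: str) -> str:
    """Join a subject and optional style modifiers into one comma-separated prompt."""
    parts = [subject.strip()]
    # Skip empty modifiers so optional fields can be passed as "".
    parts += [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


prompt = build_prompt("Thunderstorm over a wheat field", "dramatic lighting", "4K")
```

Keeping the subject and modifiers separate makes it easy to reuse one subject with several styles when comparing results.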

## Batch generation

```python
import os
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

# Diffusers-format weights; the tencent/HunyuanVideo repo holds the native checkpoints.
pipe = HunyuanVideoPipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.float16)
# CPU offload handles device placement, so pipe.to("cuda") is not needed.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompts = [
    "Colorful school of fish swimming through an underwater coral reef",
    "City traffic at night, time-lapse, light trails",
    "Butterfly emerging from its cocoon, nature documentary"
]

output_dir = "./videos"
os.makedirs(output_dir, exist_ok=True)

for i, prompt in enumerate(prompts):
    print(f"Generating {i+1}/{len(prompts)}: {prompt[:50]}...")

    video_frames = pipe(
        prompt=prompt,
        num_frames=45,
        num_inference_steps=50,
        guidance_scale=7.0
    ).frames[0]

    export_to_video(video_frames, f"{output_dir}/video_{i:03d}.mp4", fps=15)
```
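
Because each clip takes minutes to render, it can help to make a batch resumable after an interrupted session. The sketch below lists only the prompts whose output file is still missing. It assumes the `video_{i:03d}.mp4` naming used in the loop above; `pending_jobs` is a hypothetical helper, not part of any library.

```python
import os


def pending_jobs(prompts, output_dir):
    """Return (index, prompt, path) for every clip that has not been written yet."""
    jobs = []
    for i, prompt in enumerate(prompts):
        path = os.path.join(output_dir, f"video_{i:03d}.mp4")
        if not os.path.exists(path):
            jobs.append((i, prompt, path))
    return jobs
```

Iterating over `pending_jobs(prompts, output_dir)` instead of `enumerate(prompts)` skips clips that were already exported in an earlier run.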

## Gradio interface

```python
import gradio as gr
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video
import tempfile

# Diffusers-format weights; the tencent/HunyuanVideo repo holds the native checkpoints.
pipe = HunyuanVideoPipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.float16)
# CPU offload handles device placement, so pipe.to("cuda") is not needed.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

def generate(prompt, negative_prompt, num_frames, steps, guidance, seed):
    # A seed of 0 or below means "random"; cast because Gradio may pass floats.
    generator = torch.Generator("cuda").manual_seed(int(seed)) if seed > 0 else None

    video_frames = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_frames=int(num_frames),
        num_inference_steps=int(steps),
        guidance_scale=guidance,
        generator=generator
    ).frames[0]

    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
        export_to_video(video_frames, f.name, fps=15)
        return f.name

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label="Prompt", lines=3),
        gr.Textbox(label="Negative prompt", value="blurry, low quality"),
        gr.Slider(16, 60, value=45, step=1, label="Frames"),
        gr.Slider(20, 100, value=50, step=5, label="Steps"),
        gr.Slider(3, 12, value=7, step=0.5, label="Guidance scale"),
        gr.Number(value=-1, label="Seed")
    ],
    outputs=gr.Video(label="Generated video"),
    title="HunyuanVideo - Text-to-Video on CLORE.AI"
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## Performance

| Resolution | Frames | GPU       | Time          |
| ---------- | ------ | --------- | ------------- |
| 544x960    | 45     | RTX 4090  | \~5 minutes   |
| 544x960    | 45     | A100 40GB | \~3 minutes   |
| 544x960    | 45     | A100 80GB | \~2 minutes   |
| 720x1280   | 45     | A100 80GB | \~4 minutes   |

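## Common problems and solutions

### Out of memory

**Problem:** CUDA out of memory on a 24GB GPU

**Solution:**

```python
# Enable the memory optimizations.
# Pick one offload mode; enabling both is redundant.
pipe.enable_model_cpu_offload()
# pipe.enable_sequential_cpu_offload()  # more aggressive: lower VRAM use, slower
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

# Reduce the frame count and resolution
video = pipe(prompt, num_frames=24, height=480, width=720).frames[0]
```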
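
The "reduce frames and resolution" advice can also be automated by retrying with progressively lighter settings. The following is a sketch under stated assumptions: `generate_with_fallback` and the `FALLBACKS` ladder are hypothetical, `generate` is any callable you supply (for example a thin wrapper around the pipeline call), and out-of-memory failures are detected by matching the text of a `RuntimeError`.

```python
# Settings to try in order, from heaviest to lightest (example values only).
FALLBACKS = (
    {"num_frames": 45, "height": 544, "width": 960},
    {"num_frames": 24, "height": 544, "width": 960},
    {"num_frames": 24, "height": 480, "width": 720},
)


def generate_with_fallback(generate, prompt, settings=FALLBACKS):
    """Call generate(prompt, **kwargs) with each settings dict until one fits in memory.

    Returns (result, kwargs_used). Errors other than out-of-memory are re-raised.
    """
    last_error = None
    for kwargs in settings:
        try:
            return generate(prompt, **kwargs), dict(kwargs)
        except RuntimeError as err:
            # CUDA out-of-memory errors surface as RuntimeError subclasses.
            if "out of memory" not in str(err).lower():
                raise
            last_error = err
    raise RuntimeError("all fallback settings ran out of memory") from last_error
```

With the pipeline from the earlier sections, `generate` could be `lambda p, **kw: pipe(p, **kw).frames[0]`. Calling `torch.cuda.empty_cache()` between attempts may help release memory left over from the failed run.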

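### Slow generation

**Problem:** Generation takes too long

**Solutions:**

* Reduce `num_inference_steps` (30-40 still gives decent results)
* Reduce `num_frames` (24 frames = 1.6 seconds at 15 fps)
* Use an A100 GPU for faster processing
* Make sure you have NVMe storage so the model loads quickly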
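
The frame-count arithmetic in the list above is simply frames divided by frames per second. A small sketch of that conversion in both directions follows; the helper names are hypothetical, and the default of 15 fps matches the `fps=15` used in the export calls in this guide.

```python
def clip_seconds(num_frames: int, fps: int = 15) -> float:
    """Playback length in seconds of a clip with num_frames frames."""
    return num_frames / fps


def frames_for(seconds: float, fps: int = 15) -> int:
    """Number of frames needed for a clip of roughly the given length."""
    return round(seconds * fps)
```

For instance, `clip_seconds(24)` corresponds to the 1.6-second clip mentioned above, and `frames_for(3)` gives the 45 frames used in the earlier examples.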

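### Poor video quality

**Problem:** Blurry output or incoherent motion

**Solutions:**

* Increase `num_inference_steps` to 75-100
* Adjust `guidance_scale` (6-8 works best)
* Write more detailed prompts
* Add negative prompts to steer away from known problems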

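### Video artifacts

**Problem:** Flickering or temporal inconsistency

**Solutions:**

* Use a consistent random seed for reproducibility
* Avoid prompts that describe fast motion
* Stabilize the video in post-processing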
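
One way to keep seeds consistent across runs is to derive the seed from the prompt itself, so the same prompt always maps to the same seed without tracking it by hand. The sketch below is an illustration only: `seed_for` is a hypothetical helper, and hashing the prompt is a design choice made here, not something the model or Diffusers requires.

```python
import hashlib


def seed_for(prompt: str, base_seed: int = 0) -> int:
    """Derive a stable 32-bit seed from a prompt and an optional base seed."""
    digest = hashlib.sha256(f"{base_seed}:{prompt}".encode("utf-8")).digest()
    # Use the first 4 bytes so the value fits in an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")
```

The result can be passed to `torch.Generator("cuda").manual_seed(seed_for(prompt))`. Changing `base_seed` yields a different but equally repeatable variation of the same prompt.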

## Troubleshooting

{% hint style="danger" %}
**Out of memory**
{% endhint %}

* HunyuanVideo needs at least 24GB of VRAM
* Use an A100 40GB/80GB for best results
* Reduce the video length or resolution

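### Video generation fails

* Check that all model files downloaded correctly
* Make sure there is enough disk space (100GB+)
* Verify that your CUDA and PyTorch versions are compatible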
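
The disk-space item in this checklist is easy to script before starting a long download. The sketch below uses only the standard library; the helper names are hypothetical, and the 100 GB default is taken from the list above.

```python
import shutil


def free_disk_gb(path: str = ".") -> float:
    """Free space, in GiB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_disk(path: str = ".", required_gb: float = 100) -> bool:
    """True when the filesystem holding path has at least required_gb free."""
    return free_disk_gb(path) >= required_gb
```

For the CUDA item, `torch.cuda.is_available()` and `torch.version.cuda` report whether PyTorch can see the GPU and which CUDA version it was built against.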

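### Poor video quality

* Increase the number of inference steps
* Use more descriptive prompts
* Check that the input resolution matches what you expect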

### Slow generation

* Video generation is compute-intensive
* Use an A100/H100 for faster results
* Consider generating shorter clips first

## Cost estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly rate | Daily rate | 4-hour session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use the **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE**
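
To turn an hourly rate into a per-clip figure, multiply the rate by the generation time in hours. The sketch below shows that arithmetic; the function names are hypothetical, and the example numbers (about $0.17 per hour and about 3 minutes per clip) are illustrative values taken from this page, not a quote.

```python
def cost_per_clip(hourly_rate_usd: float, minutes_per_clip: float) -> float:
    """Approximate rental cost of generating one clip."""
    return hourly_rate_usd * minutes_per_clip / 60


def batch_cost(hourly_rate_usd: float, minutes_per_clip: float, n_clips: int) -> float:
    """Approximate rental cost of generating n_clips back to back."""
    return cost_per_clip(hourly_rate_usd, minutes_per_clip) * n_clips


estimate = batch_cost(0.17, 3, 20)  # 20 clips at ~3 minutes each
```

This covers generation time only. Model download and loading also consume rented time, so real sessions cost somewhat more than the estimate.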

## Next steps

* CogVideoX - an alternative text-to-video model
* Wan2.1 Video - another text-to-video option
* AnimateDiff - image animation

