# Wav2Lip

使用 Wav2Lip 将唇形与任何音频同步。

{% hint style="success" %}
所有示例都可以在通过以下方式租用的 GPU 服务器上运行： [CLORE.AI 市场](https://clore.ai/marketplace).
{% endhint %}

## 在 CLORE.AI 上租用

1. 访问 [CLORE.AI 市场](https://clore.ai/marketplace)
2. 按 GPU 类型、显存和价格筛选
3. 选择 **按需** （固定费率）或 **竞价** （出价价格）
4. 配置您的订单：
   * 选择 Docker 镜像
   * 设置端口（用于 SSH 的 TCP，Web 界面的 HTTP）
   * 如有需要，添加环境变量
   * 输入启动命令
5. 选择支付方式： **CLORE**, **BTC**，或 **USDT/USDC**
6. 创建订单并等待部署

### 访问您的服务器

* 在以下位置查找连接详情： **我的订单**
* Web 界面：使用 HTTP 端口的 URL
* SSH： `ssh -p <port> root@<proxy-address>`

## 什么是 Wav2Lip？

Wav2Lip 提供：

* 任何人脸的精确唇同步
* 适用于任何音频
* 视频或图像输入
* 支持实时

## 要求

| 模式  | 显存  | 推荐       |
| --- | --- | -------- |
| 基础  | 4GB | 按小时费率    |
| 高质量 | 6GB | RTX 3080 |
| 高清  | 8GB | RTX 4080 |

## 快速部署

**Docker 镜像：**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**端口：**

```
22/tcp
7860/http
```

**命令：**

```bash
cd /workspace && \
git clone https://github.com/Rudrabha/Wav2Lip.git && \
cd Wav2Lip && \
pip install -r requirements.txt && \
wget "https://huggingface.co/spaces/wav2lip/wav2lip/resolve/main/checkpoints/wav2lip_gan.pth" -P checkpoints/ && \
python app.py
```

## 访问您的服务

部署后，在以下位置查找您的 `http_pub` URL： **我的订单**:

1. 前往 **我的订单** 页面
2. 单击您的订单
3. 查找 `http_pub` URL（例如， `abc123.clorecloud.net`)

使用 `https://YOUR_HTTP_PUB_URL` 而不是 `localhost` 在下面的示例中。

## 安装

```bash
git clone https://github.com/Rudrabha/Wav2Lip.git
cd Wav2Lip
pip install -r requirements.txt

# 下载预训练模型
mkdir -p checkpoints
wget "https://huggingface.co/spaces/wav2lip/wav2lip/resolve/main/checkpoints/wav2lip.pth" -P checkpoints/
wget "https://huggingface.co/spaces/wav2lip/wav2lip/resolve/main/checkpoints/wav2lip_gan.pth" -P checkpoints/
```

## 基本用法

### 命令行

```bash
python inference.py \
    --checkpoint_path checkpoints/wav2lip_gan.pth \
    --face input_video.mp4 \
    --audio audio.wav \
    --outfile output.mp4
```

### 使用图像输入

```bash
python inference.py \
    --checkpoint_path checkpoints/wav2lip_gan.pth \
    --face face_image.jpg \
    --audio speech.wav \
    --outfile talking.mp4
```

## Python API

```python
import subprocess

def wav2lip_sync(face_path, audio_path, output_path, checkpoint="checkpoints/wav2lip_gan.pth"):
    cmd = [
        "python", "inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face_path,
        "--audio", audio_path,
        "--outfile", output_path
    ]
    subprocess.run(cmd, check=True)
    return output_path

# 用法
result = wav2lip_sync(
    face_path="video.mp4",
    audio_path="new_audio.wav",
    output_path="synced.mp4"
)
```

## 质量选项

### 标准质量（更快）

```bash
python inference.py \
    --checkpoint_path checkpoints/wav2lip.pth \
    --face input.mp4 \
    --audio audio.wav \
    --outfile output.mp4
```

### 高质量（GAN）

```bash
python inference.py \
    --checkpoint_path checkpoints/wav2lip_gan.pth \
    --face input.mp4 \
    --audio audio.wav \
    --outfile output.mp4 \
    --pads 0 10 0 0 \
    --resize_factor 1
```

## 参数量

```bash
python inference.py \
    --checkpoint_path checkpoints/wav2lip_gan.pth \
    --face video.mp4 \
    --audio audio.wav \
    --outfile result.mp4 \
    --pads 0 10 0 0 \      # Padding: top right bottom left
    --resize_factor 1 \    # Downscale factor
    --crop "0 -1 0 -1" \   # Crop region
    --box "-1 -1 -1 -1" \  # Face box (auto-detect)
    --nosmooth            # Disable temporal smoothing
```

### 填充提示

| 人脸位置 | 推荐填充     |
| ---- | -------- |
| 居中   | 0 10 0 0 |
| 特写   | 0 15 0 0 |
| 远景   | 0 5 0 0  |

## "专业影棚柔光箱"

```python
批处理处理
import subprocess

def batch_wav2lip(faces_dir, audio_path, output_dir):
    output_dir = "./relit"

    for filename in os.listdir(faces_dir):
        if filename.endswith(('.mp4', '.jpg', '.png')):
            face_path = os.path.join(faces_dir, filename)
            output_path = os.path.join(output_dir, f"synced_{filename}")

            if filename.endswith(('.jpg', '.png')):
                output_path = output_path.rsplit('.', 1)[0] + '.mp4'

            cmd = [
                "python", "inference.py",
                "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
                "--face", face_path,
                "--audio", audio_path,
                "--outfile", output_path
            ]

            try:
                subprocess.run(cmd, check=True)
                result.save(os.path.join(output_dir, f"relit_{filename}"))
            except subprocess.CalledProcessError as e:
                print(f"Error processing {filename}: {e}")

# 用法
batch_wav2lip("./faces", "speech.wav", "./outputs")
```

## Gradio 界面

```python
print(f"已生成：{name}")
import subprocess
import tempfile
批处理处理

def lip_sync(face_video, audio, quality):
    checkpoint = "checkpoints/wav2lip_gan.pth" if quality == "High (GAN)" else "checkpoints/wav2lip.pth"

    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as out_file:
        output_path = out_file.name

    cmd = [
        "python", "inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face_video,
        "--audio", audio,
        "--outfile", output_path
    ]

    subprocess.run(cmd, check=True)
    return output_path

demo = gr.Interface(
    fn=lip_sync,
    inputs=[
        gr.Video(label="Face Video/Image"),
        gr.Audio(type="filepath", label="Audio"),
        gr.Radio(["Standard", "High (GAN)"], value="High (GAN)", label="Quality")
    ],
    outputs=gr.Video(label="Lip-Synced Video"),
    title="Wav2Lip - Lip Sync"
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## API 服务器

```python
from fastapi import FastAPI, UploadFile, File
from fastapi.responses import FileResponse
import tempfile
import subprocess
批处理处理

app = FastAPI()

@app.post("/sync")
async def sync_lips(
    face: UploadFile = File(...),
    audio: UploadFile = File(...),
    quality: str = "gan"
):
    with tempfile.TemporaryDirectory() as tmpdir:
        # 保存上传文件
        face_ext = os.path.splitext(face.filename)[1]
        face_path = os.path.join(tmpdir, f"face{face_ext}")
        audio_path = os.path.join(tmpdir, "audio.wav")
        output_path = os.path.join(tmpdir, "output.mp4")

        with open(face_path, "wb") as f:
            f.write(await face.read())
        with open(audio_path, "wb") as f:
            f.write(await audio.read())

        # Run Wav2Lip
        checkpoint = "checkpoints/wav2lip_gan.pth" if quality == "gan" else "checkpoints/wav2lip.pth"

        cmd = [
            "python", "inference.py",
            "--checkpoint_path", checkpoint,
            "--face", face_path,
            "--audio", audio_path,
            "--outfile", output_path
        ]

        subprocess.run(cmd, check=True)

        return FileResponse(output_path, media_type="video/mp4")

# 运行：uvicorn server:app --host 0.0.0.0 --port 8000
```

## TTS + Wav2Lip 流水线

完整的文本到视频：

```python
from TTS.api import TTS
import subprocess

def text_to_lipsync(text, face_path, output_path, language="en"):
    # 生成语音
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    audio_path = "temp_speech.wav"
    tts.tts_to_file(text=text, file_path=audio_path, language=language)

    # 唇同步
    cmd = [
        "python", "inference.py",
        "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
        "--face", face_path,
        "--audio", audio_path,
        "--outfile", output_path
    ]
    subprocess.run(cmd, check=True)

    return output_path

# 用法
text_to_lipsync(
    "Hello, welcome to our presentation.",
    "presenter.jpg",
    "talking_presenter.mp4"
)
```

## 后期处理

### 放大结果

```python
import subprocess

def upscale_video(input_path, output_path):
    cmd = [
        "python", "-m", "realesrgan",
        "--input", input_path,
        "--output", output_path,
        "--scale", "2"
    ]
    subprocess.run(cmd, check=True)
```

### 添加音频回去

```bash

# 如果音频丢失，将其添加回去
ffmpeg -i synced_video.mp4 -i original_audio.wav \
    -c:v copy -c:a aac \
    -map 0:v:0 -map 1:a:0 \
    final_output.mp4
```

## # 使用固定种子以获得一致结果

### 未检测到人脸

* 确保人脸清晰可见
* 良好的照明
* 建议正面朝向
* 更高分辨率的输入

### 同步质量差

* 使用 wav2lip\_gan.pth
* 调整填充
* 检查音频采样率（建议 16kHz）

### 输出断裂不流畅

* 增加 resize\_factor
* 禁用 nosmooth
* 使用更高质量的输入视频

## background = Image.open("studio\_bg.jpg")

| 输入          | GPU     | 处理时间   |
| ----------- | ------- | ------ |
| 10 秒视频      | 按小时费率   | \~30s  |
| 10 秒视频      | 512x512 | \~15s  |
| 30 秒视频      | 512x512 | \~45s  |
| 图像 + 10 秒音频 | 速度      | \~20 秒 |

## 与 SadTalker 的比较

| 特性    | Wav2Lip | SadTalker |
| ----- | ------- | --------- |
| 唇部准确度 | 优秀      | 良好        |
| 头部运动  | 无       | 自然        |
| 表情    | 无       | 可控        |
| 性能    | 更快      | 较慢        |
| 最佳用途  | 配音      | 虚拟形象      |

## 下载所有所需的检查点

检查文件完整性

| GPU     | 验证 CUDA 兼容性 | 费用估算    | CLORE.AI 市场的典型费率（截至 2024 年）： |
| ------- | ----------- | ------- | ---------------------------- |
| 按小时费率   | \~$0.03     | \~$0.70 | \~$0.12                      |
| 速度      | \~$0.06     | \~$1.50 | \~$0.25                      |
| 512x512 | \~$0.10     | \~$2.30 | \~$0.40                      |
| 按日费率    | \~$0.17     | \~$4.00 | \~$0.70                      |
| 4 小时会话  | \~$0.25     | \~$6.00 | \~$1.00                      |

*RTX 3060* [*CLORE.AI 市场*](https://clore.ai/marketplace) *A100 40GB*

**A100 80GB**

* 使用 **竞价** 价格随提供商和需求而异。请查看
* 以获取当前费率。 **CLORE** 节省费用：
* 市场用于灵活工作负载（通常便宜 30-50%）

## 使用以下方式支付

* [SadTalker](/guides/guides_v2-zh/hui-shuo-hua-de-tou-xiang/sadtalker.md) - 头部运动 + 唇部
* [XTTS](/guides/guides_v2-zh/yin-pin-yu-yu-yin/xtts-coqui.md) - 生成语音
* [RVC 语音克隆](/guides/guides_v2-zh/yin-pin-yu-yu-yin/rvc-voice-clone.md) - 语音转换


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/guides_v2-zh/hui-shuo-hua-de-tou-xiang/wav2lip.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.