# Dia TTS (Nari Labs)

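Dia, from Nari Labs, is a text-to-speech model focused on **realistic multi-speaker dialogue**. Unlike conventional TTS systems that handle one speaker at a time, Dia generates natural conversations between multiple speakers, including emotion, laughter, hesitation, and other non-verbal cues. It has 1.6 billion parameters and runs on GPUs with 8 GB or more of VRAM.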

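## Key Features

* **Multi-speaker dialogue**: generates a conversation between two or more speakers in a single pass
* **Non-verbal cues**: laughter `(laughs)`, hesitation `(sighs)`, and pauses are rendered automatically from the script
* **Expressive speech**: natural intonation without explicit emotion tags
* **1.6B parameters**: fits an RTX 3070/3080 (8-10 GB VRAM)
* **Apache 2.0 license**: commercial use is permitted
* **HuggingFace integration**: works with the Transformers library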

## Requirements

| Component | Minimum         | Recommended      |
| --------- | --------------- | ---------------- |
| GPU       | RTX 3070 (8 GB) | RTX 3080 (10 GB) |
| VRAM      | 8 GB            | 10 GB+           |
| RAM       | 16 GB           | 32 GB            |
| Disk      | 10 GB           | 15 GB            |
| Python    | 3.9+            | 3.11             |

**Recommended Clore.ai GPU**: RTX 3080 10 GB (about $0.2–0.5/day)
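
As a rough sanity check on the VRAM figures, the memory taken by the weights alone follows from the parameter count. The sketch below is back-of-the-envelope arithmetic only: it uses the 1.6 billion parameter figure from this guide, and it ignores activations, the audio codec, and framework overhead, so real usage will be higher. The helper name `weight_memory_gb` is illustrative and not part of Dia.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


DIA_PARAMS = 1.6e9  # parameter count stated in this guide

fp32_gb = weight_memory_gb(DIA_PARAMS, 4)  # 32-bit floats: 4 bytes each
fp16_gb = weight_memory_gb(DIA_PARAMS, 2)  # 16-bit floats: 2 bytes each

print(f"float32 weights: {fp32_gb:.1f} GB")  # float32 weights: 6.4 GB
print(f"float16 weights: {fp16_gb:.1f} GB")  # float16 weights: 3.2 GB
```

Under these assumptions, full-precision weights leave little headroom on an 8 GB card, which is one reason half precision is a common remedy for out-of-memory errors.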

## Installation

```bash
# Option 1: install with pip
pip install dia-tts

# Option 2: install from source
git clone https://github.com/nari-labs/dia.git
cd dia
pip install -e .
```

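## Quick Start

### Basic Multi-Speaker Dialogue

```python
import soundfile as sf
from dia import Dia

# Load the model
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Script for a multi-speaker dialogue
# [S1] = speaker 1, [S2] = speaker 2
text = """[S1] Hey, have you tried the new GPU rental platform?
[S2] You mean Clore? Yeah, I rented an RTX 4090 yesterday.
[S1] How was it?
[S2] (laughs) Honestly? Way cheaper than I expected. About two bucks a day.
[S1] No way. That's... that's wild."""

audio = model.generate(text)

# Save to a file
sf.write("dialog.wav", audio, samplerate=24000)
```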
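
When scripts come from structured data rather than hand-written strings, the `[S1]`/`[S2]` tagging can be generated. The following is a minimal sketch, assuming one tagged turn per line as in the example in this section; `format_dialogue` is a hypothetical helper, not part of the Dia package.

```python
from typing import Iterable, Tuple


def format_dialogue(turns: Iterable[Tuple[int, str]]) -> str:
    """Build a tagged script from (speaker_number, utterance) pairs."""
    lines = []
    for speaker, utterance in turns:
        if speaker < 1:
            raise ValueError(f"speaker numbers start at 1, got {speaker}")
        # Collapse internal whitespace so each turn stays on one line.
        text = " ".join(utterance.split())
        if not text:
            raise ValueError("empty utterance")
        lines.append(f"[S{speaker}] {text}")
    return "\n".join(lines)


script = format_dialogue([
    (1, "Have you tried the new GPU rental platform?"),
    (2, "(laughs) I rented a card yesterday."),
])
print(script)
```

The resulting string can be passed to `model.generate(...)` in the same way as a hand-written script.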

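### Emotion and Non-Verbal Cues

```python
# Reuses `model` and `sf` from the basic example above.
# Dia handles natural speech patterns automatically.
text = """[S1] I just got the results...
[S2] And? Don't leave me hanging!
[S1] (sighs) We passed. We actually passed every test.
[S2] (laughs) I told you! I said we could do it!
[S1] I can't believe it... (laughs) Okay, okay, let's celebrate."""

audio = model.generate(text, temperature=0.8)
sf.write("emotional_dialog.wav", audio, samplerate=24000)
```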

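### Single Speaker

```python
# Reuses `model` and `sf` from the basic example above.
# A single speaker works as well.
text = "[S1] Welcome to the Clore AI documentation. In this guide, we will show how to set up your first GPU rental and deploy a machine learning model."

audio = model.generate(text)
sf.write("narration.wav", audio, samplerate=24000)
```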
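
If `soundfile` is not available on the instance, the Python standard library can write the WAV file instead. This is a sketch under two assumptions that are not confirmed by this guide: that the generated audio is a one-dimensional sequence of floats in the range -1.0 to 1.0, and that 16-bit mono PCM is an acceptable output format. The function name `write_wav_pcm16` is illustrative.

```python
import struct
import wave
from typing import Sequence


def write_wav_pcm16(path: str, samples: Sequence[float], sample_rate: int = 24000) -> int:
    """Write mono float samples as 16-bit PCM WAV; return the frame count."""
    frames = bytearray()
    for sample in samples:
        # Clip to the assumed [-1.0, 1.0] range before scaling to int16.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return len(frames) // 2
```

With a NumPy array, a call might look like `write_wav_pcm16("narration.wav", audio.tolist())`. The default of 24000 matches the sample rate used in the examples in this guide.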

## Gradio Web Interface

```python
# Option 1: launch the interactive demo from a shell:
#   python -m dia.app --port 7860 --share

# Option 2: build the interface manually
import gradio as gr
from dia import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

def generate_speech(text):
    audio = model.generate(text)
    # Gradio's Audio output accepts a (sample_rate, samples) tuple
    return (24000, audio)

demo = gr.Interface(
    fn=generate_speech,
    inputs=gr.Textbox(label="Dialogue (use [S1], [S2] tags)", lines=10),
    outputs=gr.Audio(label="Generated Speech"),
    title="Dia TTS: Multi-Speaker Dialogue"
)
demo.launch(server_port=7860)
```

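## Use Cases

* **Podcast generation**: create conversational podcasts from a script
* **Audiobook dialogue**: generate character dialogue with distinct voices
* **Game dialogue**: generate NPC lines with natural speech patterns
* **Training data**: produce varied speech datasets for ASR training
* **Chatbot voices**: multi-turn conversations with emotional responses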

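## Tips for Clore.ai Users

* **The RTX 3080 is a good fit**: 10 GB of VRAM runs Dia comfortably at about $0.2–0.5/day
* **Batch generation**: process multiple dialogues in a loop to make the most of the rental time
* **Save the model to persistent storage**: if your Clore instance has a persistent disk, cache the model there to avoid downloading it again
* **Temperature 0.7–0.9**: lower values are more consistent, higher values are more expressive and varied
* **English only**: Dia currently targets English; for multilingual needs, see the Qwen3-TTS guide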
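
The batch-generation tip can be sketched as a small loop that skips outputs that already exist, so an interrupted rental can resume without redoing finished work. The helper below is hypothetical: it takes the generation and saving steps as callables, so nothing here assumes more about Dia than the `model.generate(text)` call shown in this guide.

```python
from pathlib import Path
from typing import Callable, Dict, List


def batch_generate(
    scripts: Dict[str, str],
    generate: Callable[[str], object],
    save: Callable[[Path, object], None],
    out_dir: str = "output",
) -> List[Path]:
    """Generate one audio file per named script; return the files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in scripts.items():
        target = out / f"{name}.wav"
        if target.exists():
            continue  # already generated in an earlier run
        save(target, generate(text))
        written.append(target)
    return written


# Assumed wiring with the objects from this guide (not executed here):
# batch_generate(
#     {"intro": "[S1] Hello.", "outro": "[S1] Goodbye."},
#     generate=model.generate,
#     save=lambda path, audio: sf.write(str(path), audio, samplerate=24000),
# )
```

Pointing `out_dir` at a persistent disk, where the instance has one, keeps finished files across restarts.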

## Troubleshooting

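| Problem                      | Solution                                                                             |
| ---------------------------- | ------------------------------------------------------------------------------------ |
| CUDA out of memory           | Use `model.to("cuda", torch_dtype=torch.float16)` to run in half precision            |
| Speakers sound alike         | Give each speaker more text and context; try a higher temperature                     |
| Non-verbal cues are ignored  | Check the formatting: cues such as `(laughs)` and `(sighs)` must be in parentheses    |
| Low audio quality            | Increase the `num_steps` parameter (if available); make sure the sample rate is 24 kHz |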
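
Some of the formatting problems above can be caught before any GPU time is spent. The checker below is a minimal sketch, assuming one `[S<n>]`-tagged turn per line and ASCII parentheses around cues, as in the examples in this guide; `lint_script` is a hypothetical helper, and it does not check whether a given cue is one the model actually supports.

```python
import re
from typing import List

SPEAKER_TAG = re.compile(r"^\[S\d+\]\s*\S")   # e.g. "[S1] text"
FULLWIDTH_PARENS = re.compile(r"[（）]")       # often introduced by CJK input methods


def lint_script(script: str) -> List[str]:
    """Return a list of human-readable formatting problems (empty if none)."""
    problems = []
    for number, line in enumerate(script.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are ignored
        if not SPEAKER_TAG.match(stripped):
            problems.append(f"line {number}: missing [S<n>] speaker tag")
        if FULLWIDTH_PARENS.search(stripped):
            problems.append(f"line {number}: full-width parentheses; use ASCII ( )")
        if stripped.count("(") != stripped.count(")"):
            problems.append(f"line {number}: unbalanced parentheses")
    return problems


for problem in lint_script("[S1] Hi there.\n[S2] (laughs Hello."):
    print(problem)  # line 2: unbalanced parentheses
```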

## Further Reading

* [Nari Labs GitHub](https://github.com/nari-labs/dia)
* [HuggingFace model](https://huggingface.co/nari-labs/Dia-1.6B)
* [Comparison: Dia vs ElevenLabs](https://nari-labs.github.io/dia/) (official demo page)

