# MeloTTS

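MeloTTS is a high-quality multilingual text-to-speech library developed by **MyShell AI**. It provides fast, natural-sounding speech synthesis across multiple languages and English accents, and it is suitable for both research and production deployment. MeloTTS is optimized for speed: it generates speech considerably faster than real time even on a CPU, while keeping audio quality high enough for commercial use.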

MeloTTS currently supports:

* **English** (American, British, Indian, Australian, Default)
* **Chinese (Simplified, plus mixed Chinese-English)**
* **Japanese**
* **Korean**
* **Spanish**
* **French**

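Key highlights:

* ⚡ **Fast inference**: faster than real time on CPU, very fast on GPU
* 🌍 **Multilingual**: 6 languages, with accent variants for English
* 🐳 **Docker-friendly**: runs in an NVIDIA CUDA base container or from the repository's Dockerfile (there is no official prebuilt image on Docker Hub)
* 🔌 **REST API**: HTTP API for integration into any application
* 📱 **Production-grade**: used in MyShell's consumer products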

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

***

## Server Requirements

| Parameter | Minimum                | Recommended             |
| --------- | ---------------------- | ----------------------- |
| GPU       | NVIDIA GTX 1080 (8 GB) | NVIDIA RTX 3090 (24 GB) |
| VRAM      | 4 GB                   | 8–16 GB                 |
| RAM       | 8 GB                   | 16 GB                   |
| CPU       | 4 cores                | 8 cores                 |
| Disk      | 10 GB                  | 20 GB                   |
| OS        | Ubuntu 20.04+          | Ubuntu 22.04            |
| CUDA      | 11.7+ (optional)       | 12.1+                   |
| Python    | 3.8+                   | 3.10                    |
| Ports     | 22, 8888               | 22, 8888                |

{% hint style="info" %}
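MeloTTS is unusually efficient: it runs well on a CPU for single requests and benefits substantially from a GPU for batch processing. Even an entry-level GPU more than doubles throughput.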
{% endhint %}

***

## Quick Deployment on CLORE.AI

{% hint style="warning" %}
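**Note:** MeloTTS has no official prebuilt image on Docker Hub (`myshell-ai/melotts` does not exist). The recommended approach is to start from an NVIDIA CUDA base image and install MeloTTS with pip from the official GitHub repository.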
{% endhint %}

### 1. Find a Suitable Server

Go to the [CLORE.AI Marketplace](https://clore.ai/marketplace) and filter by:

* **VRAM**: ≥ 4 GB (or CPU-only for low traffic)
* **GPU**: any NVIDIA GPU (GTX 1080 or newer, RTX series, A100)
* **Disk**: ≥ 10 GB

### 2. Configure Your Deployment

**Docker image:**

```
nvidia/cuda:12.1.0-devel-ubuntu22.04
```

**Port mapping:**

```
22   → SSH access
8888 → MeloTTS API server
```

**Environment variables:**

```
NVIDIA_VISIBLE_DEVICES=all
```

**Startup command** (run after logging in to the server over SSH):

```bash
apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y python3-pip ffmpeg espeak-ng git && \
git clone https://github.com/myshell-ai/MeloTTS.git && \
cd MeloTTS && pip install -e . && \
python3 -m unidic download && \
python3 -c "import nltk; nltk.download('averaged_perceptron_tagger_eng')" && \
python3 -m melo.api_server --host 0.0.0.0 --port 8888
```

### 3. Access the API

```
http://<your-clore-server-ip>:8888
```

Test command:

```bash
curl -X POST http://<server-ip>:8888/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from Clore.ai!", "language": "EN", "speaker_id": "EN-Default"}'
```
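If you would rather run the same smoke test from Python without extra dependencies, the standard library can build the equivalent request. This is a minimal sketch that only reuses the `/synthesize` endpoint and JSON fields from the curl command above; `build_synthesize_request` is a hypothetical helper name, and the request is built but not sent.

```python
import json
import urllib.request


def build_synthesize_request(base_url: str, text: str, language: str = "EN",
                             speaker_id: str = "EN-Default") -> urllib.request.Request:
    """Build (without sending) the same POST request as the curl test."""
    body = json.dumps(
        {"text": text, "language": language, "speaker_id": speaker_id}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/synthesize",  # avoid a double slash in the URL
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen(req, timeout=30)` and write `resp.read()` to a `.wav` file.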

***

## 逐步设置

### Step 1: SSH into Your Server

```bash
ssh root@<your-clore-server-ip> -p <ssh-port>
```

### Step 2: Build and Run the Container

Because MeloTTS has no prebuilt Docker Hub image, use the NVIDIA CUDA base image and install MeloTTS from source:

```bash
# Start a CUDA container and install MeloTTS inside it on first boot
docker run -d \
  --name melotts \
  --gpus all \
  -p 8888:8888 \
  -v /workspace/melotts/outputs:/app/outputs \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e DEBIAN_FRONTEND=noninteractive \
  nvidia/cuda:12.1.0-devel-ubuntu22.04 \
  bash -c "apt-get update && apt-get install -y python3-pip ffmpeg espeak-ng git && \
    git clone https://github.com/myshell-ai/MeloTTS.git /app/MeloTTS && \
    cd /app/MeloTTS && pip install -e . && \
    python3 -m unidic download && \
    python3 -c \"import nltk; nltk.download('averaged_perceptron_tagger_eng')\" && \
    python3 -m melo.api_server --host 0.0.0.0 --port 8888"
```

Alternatively, build a custom Docker image from source:

```bash
git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
docker build -t melotts:local .
docker run -d \
  --name melotts \
  --gpus all \
  -p 8888:8888 \
  melotts:local
```

### Step 3: Confirm the Service Is Running

```bash
# Follow the container logs
docker logs -f melotts

# Wait for startup to finish, then test
curl http://localhost:8888/health
```
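The first start installs packages and downloads model files, so `/health` may not answer for several minutes. Below is a minimal polling sketch using only the standard library; it assumes the `/health` endpoint from the curl command above returns HTTP 200 once the server is ready, and `wait_for_health` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, timeout_s: float = 300.0, interval_s: float = 5.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet: connection refused, timeout, or an HTTP error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_health("http://localhost:8888/health", timeout_s=600)` blocks until the server responds or ten minutes pass.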

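### Step 4: Optional Jupyter Notebook Interface

This container uses port 8890 so it does not collide with the MeloTTS server on 8888; expose that port in your Clore.ai deployment as well.

```bash
docker run -d \
  --name melotts-jupyter \
  --gpus all \
  -p 8890:8890 \
  -e DEBIAN_FRONTEND=noninteractive \
  nvidia/cuda:12.1.0-devel-ubuntu22.04 \
  bash -c "apt-get update && apt-get install -y python3-pip ffmpeg espeak-ng && \
    pip install jupyter melo-tts && \
    jupyter notebook --ip=0.0.0.0 --port=8890 --no-browser --allow-root"
```

Access at: `http://<server-ip>:8890`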

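### Step 5: Install with pip (Without Docker)

```bash
# Install system dependencies
apt-get update && apt-get install -y python3-pip ffmpeg espeak-ng

# Install MeloTTS
pip install melo-tts

# Download the required dictionary and NLTK data
python3 -m unidic download
python3 -c "import nltk; nltk.download('averaged_perceptron_tagger_eng')"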
```
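A quick way to confirm that everything installed above is actually visible is to look for the system binaries on `PATH` and the Python modules on the import path. This sketch assumes the importable module names are `melo`, `nltk`, and `unidic`; adjust the tuples if your install differs.

```python
import importlib.util
import shutil

REQUIRED_BINARIES = ("ffmpeg", "espeak-ng")
REQUIRED_MODULES = ("melo", "nltk", "unidic")  # assumed import names


def check_environment(binaries=REQUIRED_BINARIES, modules=REQUIRED_MODULES) -> dict:
    """Return {"bin:<name>" or "py:<name>": found?} without importing anything."""
    report = {}
    for name in binaries:
        report[f"bin:{name}"] = shutil.which(name) is not None
    for name in modules:
        report[f"py:{name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {item}")
```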

***

## Usage Examples

### Example 1: Basic English TTS (Python)

```python
from melo.api import TTS

# Initialize English TTS
speed = 1.0      # speaking rate (0.5 = slow, 2.0 = fast)
device = 'cuda'  # use 'cpu' if no GPU is available

tts = TTS(language='EN', device=device)

# Look up the available speakers (speaker name -> integer ID)
speaker_ids = tts.hps.data.spk2id
print("Available speakers:", list(speaker_ids.keys()))
# The EN model's speakers are typically: EN-US, EN-BR, EN_INDIA, EN-AU, EN-Default

# Generate speech
output_path = "output_english.wav"

tts.tts_to_file(
    text="Welcome to Clore.ai, your GPU cloud marketplace for AI workloads. Rent powerful GPUs in minutes.",
    speaker_id=speaker_ids['EN-Default'],
    output_path=output_path,
    speed=speed
)

print(f"Saved to: {output_path}")
```
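To confirm that the file was written and see how much audio it contains, you can read it back with Python's built-in `wave` module. This assumes the output is a standard PCM WAV file (the default subtype `soundfile` uses for `.wav`); `wav_duration_seconds` is a hypothetical helper.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())
```

`wav_duration_seconds("output_english.wav")` returns the clip length in seconds.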

***

### Example 2: Multilingual TTS

```python
from melo.api import TTS

device = 'cuda'

# (language code, speaker ID, text) tuples
language_texts = [
    ('EN', 'EN-US', "GPU computing has transformed artificial intelligence research and development."),
    ('EN', 'EN-BR', "The United Kingdom leads Europe in AI investment and innovation."),
    ('ZH', 'ZH', "Clore.ai是一个去中心化的GPU云计算市场，为AI开发者提供算力服务。"),
    ('JP', 'JP', "人工知能の発展には大規模な計算資源が必要です。"),
    ('KR', 'KR', "Clore.ai는 AI 연구자를 위한 GPU 클라우드 마켓플레이스입니다."),
    ('ES', 'ES', "La inteligencia artificial está transformando todas las industrias del mundo."),
    ('FR', 'FR', "L'intelligence artificielle révolutionne la façon dont nous travaillons et vivons."),
]

for lang, speaker, text in language_texts:
    try:
        tts = TTS(language=lang, device=device)
        speaker_id = tts.hps.data.spk2id[speaker]

        output_file = f"output_{lang}_{speaker}.wav"
        tts.tts_to_file(text=text, speaker_id=speaker_id, output_path=output_file)
        print(f"✓ Generated [{lang}]: {output_file}")
    except Exception as e:
        print(f"✗ Error [{lang}]: {e}")
```
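The loop above constructs a new `TTS` object for every entry, so the English model is loaded twice. A small cache keyed by language code avoids reloading. This is a sketch with a hypothetical `ModelCache` class; the model factory is passed in so the cache itself does not depend on MeloTTS.

```python
from typing import Callable, Dict, Generic, TypeVar

M = TypeVar("M")


class ModelCache(Generic[M]):
    """Load each language model once and reuse it for later requests."""

    def __init__(self, factory: Callable[[str], M]) -> None:
        self._factory = factory          # e.g. builds a TTS for a language code
        self._models: Dict[str, M] = {}

    def get(self, language: str) -> M:
        if language not in self._models:
            self._models[language] = self._factory(language)
        return self._models[language]

    def __len__(self) -> int:
        return len(self._models)
```

In the loop, create `cache = ModelCache(lambda lang: TTS(language=lang, device=device))` once and replace `TTS(language=lang, device=device)` with `cache.get(lang)`. Keep in mind that every cached model stays resident in GPU memory.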

***

### Example 3: REST API Usage

```python
import json

import requests

API_BASE = "http://<your-clore-server-ip>:8888"

# List the available voices
response = requests.get(f"{API_BASE}/voices", timeout=10)
print("Available voices:", json.dumps(response.json(), indent=2))

# Synthesize speech and return the raw audio bytes
def synthesize(text, language="EN", speaker="EN-Default", speed=1.0):
    payload = {
        "text": text,
        "language": language,
        "speaker_id": speaker,
        "speed": speed,
        "format": "wav"
    }

    response = requests.post(
        f"{API_BASE}/synthesize",
        json=payload,
        timeout=30
    )

    if response.status_code == 200:
        return response.content
    raise RuntimeError(f"API error: {response.status_code} - {response.text}")

# Generate samples
samples = [
    ("Hello, this is MeloTTS running on Clore.ai GPU servers.", "EN", "EN-US"),
    ("This is the British English accent variant.", "EN", "EN-BR"),
    ("Let me demonstrate the Indian English accent.", "EN", "EN_INDIA"),
]

for text, lang, speaker in samples:
    audio_bytes = synthesize(text, lang, speaker)
    filename = f"api_output_{speaker.replace('-', '_')}.wav"
    with open(filename, "wb") as f:
        f.write(audio_bytes)
    print(f"Saved: {filename}")
```
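Requests to a remote server can fail transiently, for example while the container is still starting. One common approach is to retry with exponential backoff. The sketch below is a generic wrapper; `call_with_retries` is a hypothetical name, and the injectable `sleep` argument exists only so the delays can be tested.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on failure wait base_delay, 2*base_delay, 4*base_delay, ... and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")
```

`audio_bytes = call_with_retries(lambda: synthesize(text, lang, speaker))` retries up to three times, waiting 1 s and then 2 s between attempts.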

***

### Example 4: High-Speed Batch Processing

```python
import time
from pathlib import Path

from melo.api import TTS

device = 'cuda'
tts = TTS(language='EN', device=device)
speaker_id = tts.hps.data.spk2id['EN-US']

# A large batch of texts
texts = [
    f"This is sentence number {i}. It demonstrates fast batch processing with MeloTTS on Clore.ai GPU infrastructure."
    for i in range(1, 51)  # 50 sentences
]

output_dir = Path("batch_output")
output_dir.mkdir(exist_ok=True)

start_time = time.perf_counter()

# Process the batch sequentially, reusing the single loaded model
for i, text in enumerate(texts):
    output_path = str(output_dir / f"batch_{i+1:03d}.wav")
    tts.tts_to_file(
        text=text,
        speaker_id=speaker_id,
        output_path=output_path,
        speed=1.0,
        quiet=True  # suppress console output for each call
    )
    if (i + 1) % 10 == 0:
        elapsed = time.perf_counter() - start_time
        print(f"Progress: {i+1}/{len(texts)} | Time: {elapsed:.1f}s | Rate: {(i+1)/elapsed:.1f} sentences/s")

total_time = time.perf_counter() - start_time
print(f"\nBatch complete: {len(texts)} sentences in {total_time:.1f}s")
print(f"Average: {total_time/len(texts)*1000:.0f} ms per sentence")
```
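For long jobs it helps if an interrupted run can be restarted without regenerating finished files. A minimal sketch follows: it skips outputs that already exist and writes each new file under a temporary name before renaming it, so a crash never leaves a truncated `batch_NNN.wav`. `run_resumable_batch` is hypothetical, and the temporary name keeps the `.wav` extension on the assumption that the writer infers the audio format from it.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def run_resumable_batch(texts: Iterable[str], output_dir: Path,
                        synthesize: Callable[[str, str], object]) -> List[Path]:
    """Write batch_NNN.wav per text, skipping files finished in an earlier run.

    `synthesize(text, output_path)` must write a WAV file at output_path.
    Returns the paths generated during this call.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    generated = []
    for i, text in enumerate(texts, start=1):
        path = output_dir / f"batch_{i:03d}.wav"
        if path.exists() and path.stat().st_size > 0:
            continue  # already done previously
        tmp = path.with_name(path.stem + ".partial.wav")  # keep .wav for format detection
        synthesize(text, str(tmp))
        tmp.replace(path)  # rename only after the file is completely written
        generated.append(path)
    return generated
```

With the model from the example: `run_resumable_batch(texts, Path("batch_output"), lambda t, p: tts.tts_to_file(text=t, speaker_id=speaker_id, output_path=p, quiet=True))`.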

***

### Example 5: Mixed Chinese-English TTS

```python
from melo.api import TTS

device = 'cuda'
tts = TTS(language='ZH', device=device)
speaker_id = tts.hps.data.spk2id['ZH']

# Mixed-language text (Chinese + English)
mixed_texts = [
    "我们使用Clore.ai的GPU服务器来运行machine learning workloads。",
    "今天的AI conference讨论了large language models和speech synthesis技术。",
    "我的startup需要GPU资源来训练我们的deep learning模型。",
    "Clore.ai提供了非常competitive的价格，比AWS和GCP便宜很多。",
]

for i, text in enumerate(mixed_texts):
    output_file = f"mixed_zh_en_{i+1}.wav"
    tts.tts_to_file(
        text=text,
        speaker_id=speaker_id,
        output_path=output_file,
        speed=0.9  # slightly slower for clarity
    )
    print(f"Generated: {output_file}")
    print(f"  Text: {text[:60]}...")
```
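When inputs arrive in several languages, you have to pick a model before synthesis. A rough heuristic is to route by script: kana suggests Japanese, Hangul suggests Korean, and Han characters suggest Chinese (whose model also handles embedded English, as in the example). This sketch is an assumption of ours, not part of MeloTTS, and it cannot tell Latin-script languages such as English, Spanish, and French apart, so those fall back to `default`.

```python
import re

_KANA = re.compile(r"[\u3040-\u30ff]")                  # Hiragana + Katakana
_HANGUL = re.compile(r"[\uac00-\ud7af\u1100-\u11ff]")   # Hangul syllables + Jamo
_HAN = re.compile(r"[\u4e00-\u9fff]")                   # common CJK ideographs


def guess_melo_language(text: str, default: str = "EN") -> str:
    """Heuristically choose a MeloTTS language code from the scripts in `text`."""
    if _KANA.search(text):
        return "JP"  # kana only occurs in Japanese, even alongside kanji
    if _HANGUL.search(text):
        return "KR"
    if _HAN.search(text):
        return "ZH"  # also covers mixed Chinese-English input
    return default
```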

***

## Configuration

### Docker Compose Setup

Because MeloTTS has no official Docker Hub image, use the NVIDIA CUDA base image and install MeloTTS from source at startup:

```yaml
version: '3.8'

services:
  melotts:
    image: nvidia/cuda:12.1.0-devel-ubuntu22.04
    container_name: melotts
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - PYTHONDONTWRITEBYTECODE=1
      - DEBIAN_FRONTEND=noninteractive
    ports:
      - "8888:8888"
    volumes:
      - ./outputs:/app/outputs
      - ./cache:/root/.cache
    command: >
      bash -c "apt-get update && apt-get install -y python3-pip ffmpeg espeak-ng git curl &&
      [ -d /app/MeloTTS ] || git clone https://github.com/myshell-ai/MeloTTS.git /app/MeloTTS &&
      cd /app/MeloTTS && pip install -e . &&
      python3 -m unidic download &&
      python3 -c 'import nltk; nltk.download(\"averaged_perceptron_tagger_eng\")' &&
      python3 -m melo.api_server --host 0.0.0.0 --port 8888"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8888/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

### API Configuration Options

| Parameter   | Default     | Description                                    |
| ----------- | ----------- | ---------------------------------------------- |
| `--host`    | `127.0.0.1` | Bind address (use `0.0.0.0` for public access) |
| `--port`    | `8888`      | API server port                                |
| `--workers` | `1`         | Number of worker processes                     |
| `--device`  | `auto`      | `cuda`, `cpu`, or `auto`                       |

### Supported Languages and Speakers

| Language | Code | Speaker IDs                                         |
| -------- | ---- | --------------------------------------------------- |
| English  | `EN` | `EN-Default`, `EN-US`, `EN-BR`, `EN_INDIA`, `EN-AU` |
| Chinese  | `ZH` | `ZH`                                                |
| Japanese | `JP` | `JP`                                                |
| Korean   | `KR` | `KR`                                                |
| Spanish  | `ES` | `ES`                                                |
| French   | `FR` | `FR`                                                |
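Speaker names can differ between MeloTTS model releases, so it is safer to validate them against the loaded model's `tts.hps.data.spk2id` than to hard-code them. Below is a small sketch; the helper name `resolve_speaker` is hypothetical, and it relies only on `keys()` and `[]` lookup, which a plain `dict` provides and which `spk2id` is assumed to support.

```python
from typing import Any, Optional


def resolve_speaker(spk2id: Any, requested: str, fallback: Optional[str] = None) -> int:
    """Return the integer ID for `requested`, else for `fallback`, else raise KeyError.

    `spk2id` may be a dict or any object exposing keys() and [] lookup
    (assumed to include MeloTTS's tts.hps.data.spk2id).
    """
    names = list(spk2id.keys())
    if requested in names:
        return spk2id[requested]
    if fallback is not None and fallback in names:
        return spk2id[fallback]
    raise KeyError(f"Unknown speaker {requested!r}; available: {', '.join(sorted(names))}")
```

For example, `resolve_speaker(tts.hps.data.spk2id, "EN-US", fallback="EN-Default")`.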

***

## Performance Tuning

### 1. GPU vs. CPU Benchmarks

MeloTTS performance (RTF = real-time factor, lower is better):

| Device        | RTF     | Notes                          |
| ------------- | ------- | ------------------------------ |
| CPU (8 cores) | \~0.3x  | Fast; suitable for light loads |
| RTX 3080      | \~0.05x | 20× faster than real time      |
| RTX 4090      | \~0.02x | 50× faster than real time      |
| A100          | \~0.01x | 100× faster than real time     |
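RTF is wall-clock synthesis time divided by the duration of the audio produced, so an RTF of 0.05 means 1 / 0.05 = 20× faster than real time. A sketch of the arithmetic, with hypothetical helper names:

```python
def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = synthesis wall-clock time / generated audio duration (lower is faster)."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds <= 0:
        raise ValueError("generated audio is empty")
    return synthesis_seconds / audio_seconds


def speedup_over_realtime(rtf: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    return 1.0 / rtf
```

To measure it yourself, time a call with `time.perf_counter()` and take the sample rate from `tts.hps.data.sampling_rate`. In current MeloTTS sources, calling `tts_to_file` without `output_path` returns the waveform array, whose length gives `num_samples`; verify this against your installed version.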

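### 2. Optimize Throughput

```python
# Disable gradient tracking during inference. Assumes `tts`, `text`, `speaker_id`,
# and `output_path` are defined as in the examples above; if the library already
# runs inference under no_grad internally, this wrapper is a harmless safeguard.
import torch

with torch.no_grad():
    tts.tts_to_file(text, speaker_id, output_path)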
```

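### 3. Warm Up the Model

```python
# Run one warm-up inference so CUDA kernels are initialized before real requests.
# Omitting output_path makes tts_to_file return the audio array instead of writing
# a file (soundfile cannot infer an audio format from a path like /dev/null).
_ = tts.tts_to_file(text="warmup", speaker_id=speaker_id, quiet=True)
print("Model warmed up and ready for fast inference")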
```

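### 4. Balance Clarity and Speaking Rate

The `speed` argument controls how fast the generated voice speaks (above 1.0 is faster, below 1.0 is slower).

```python
# Faster speech (slightly less clear)
tts.tts_to_file(text, speaker_id, output_path, speed=1.2)

# Slower speech (clearer pronunciation)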
tts.tts_to_file(text, speaker_id, output_path, speed=0.8)
```

### 5. Memory Efficiency

```python
# Release cached GPU memory between large batches
import gc
import torch

gc.collect()
torch.cuda.empty_cache()
```
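In a long run, the natural place for this cleanup is between fixed-size chunks of work. Below is a generic sketch with hypothetical helper names; in practice `cleanup` would run the `gc.collect()` and `torch.cuda.empty_cache()` calls shown above. Note that `empty_cache` only returns cached blocks to the driver; it cannot free tensors that are still referenced.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_chunks(items: Sequence[T], size: int,
                      handle: Callable[[T], object],
                      cleanup: Callable[[], object]) -> None:
    """Call handle(item) for every item and cleanup() after each chunk."""
    for chunk in chunked(items, size):
        for item in chunk:
            handle(item)
        cleanup()
```

For example, `process_in_chunks(texts, 100, synth_one, lambda: (gc.collect(), torch.cuda.empty_cache()))`, where `synth_one` is your own per-text function.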

***

## Troubleshooting

### Problem: `espeak-ng` not found

```bash
apt-get install -y espeak-ng
python3 -c "import phonemizer; print('phonemizer OK')"
```

### Problem: Missing NLTK data

```bash
python3 -c "
import nltk
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('punkt')
"
```

### Problem: Port 8888 conflicts with Jupyter

MeloTTS uses port 8888 by default, which is also Jupyter Notebook's default port. Solutions:

```bash
# Option 1: run MeloTTS on a different port
python3 -m melo.api_server --host 0.0.0.0 --port 8889

# Option 2: run Jupyter on a different port
jupyter notebook --port 8890
```
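Before choosing a port, you can check whether something already holds it. A minimal sketch using the standard `socket` module (`port_is_free` is a hypothetical helper); it tries to bind the port and releases it immediately, so the answer can change if another process grabs the port afterwards.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # address already in use (or binding not permitted)
        return True
```

`[p for p in (8888, 8889, 8890) if port_is_free(p)]` lists which candidate ports are currently free.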

### Problem: Chinese text is not handled correctly

```bash
# Install Chinese language support
pip install jieba
apt-get install -y python3-opencc

# Test
python3 -c "from melo.api import TTS; t = TTS('ZH'); print('ZH OK')"
```

### Problem: Docker image pull fails

```bash
# Build from source instead
git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
pip install -e .
python3 -c "import nltk; nltk.download('averaged_perceptron_tagger_eng')"
```

### Problem: Slow inference on GPU

```bash
# Verify that the GPU is actually being used
python3 -c "
import torch
from melo.api import TTS
tts = TTS('EN', device='cuda')
print(f'Device: {next(tts.model.parameters()).device}')
print(f'CUDA available: {torch.cuda.is_available()}')
"
```

***

## GPU Recommendations for Clore.ai

MeloTTS is lightweight: it runs well on a CPU for low-traffic use and gets faster as GPU compute increases. You do not need expensive hardware.

| GPU       | VRAM  | Clore.ai price | RTF (real-time factor)     | Capacity        |
| --------- | ----- | -------------- | -------------------------- | --------------- |
| CPU only  | N/A   | ≈$0.02/hr      | \~0.3×                     | \~3 req/min     |
| RTX 3090  | 24 GB | \~$0.12/hr     | \~0.02× (50× real time)    | \~100 req/min   |
| RTX 4090  | 24 GB | \~$0.70/hr     | \~0.01× (100× real time)   | \~200 req/min   |
| A100 40GB | 40 GB | \~$1.20/hr     | \~0.005× (200× real time)  | \~400 req/min   |

{% hint style="info" %}
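**Best value for TTS workloads:** at \~$0.12/hr, an RTX 3090 delivers TTS at 50× real time, which is more than enough for a production API serving hundreds of users. CPU-only instances (\~$0.02/hr) are suitable for development and low-traffic deployments.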
{% endhint %}

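**Production recommendation:** For a multilingual TTS API serving 10–50 concurrent users, the RTX 3090 is the best choice. Scale horizontally (multiple instances) rather than upgrading to an expensive A100, because MeloTTS does not gain proportionally from high-end GPUs.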
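The scaling advice can be checked with simple arithmetic on the approximate prices and capacities in the table above: cost per 1,000 requests is the hourly price divided by the hourly capacity. With those figures an RTX 3090 works out to about $0.02 per 1,000 requests, versus about $0.05 for an A100 40GB. Real Clore.ai prices vary with the market, so plug in current numbers; the helper names are hypothetical.

```python
import math


def cost_per_1000_requests(price_per_hour: float, requests_per_minute: float) -> float:
    """Hourly price divided by hourly request capacity, scaled to 1,000 requests."""
    requests_per_hour = requests_per_minute * 60
    return price_per_hour / requests_per_hour * 1000


def instances_needed(peak_requests_per_minute: float, capacity_per_instance: float) -> int:
    """Smallest whole number of identical instances that covers the peak load."""
    return max(1, math.ceil(peak_requests_per_minute / capacity_per_instance))
```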

***

## Documentation

* **GitHub**: <https://github.com/myshell-ai/MeloTTS>
* **Docker**: no official Docker Hub image; build from the [GitHub source](https://github.com/myshell-ai/MeloTTS) on the `nvidia/cuda:12.1.0-devel-ubuntu22.04` base image
* **Paper**: <https://arxiv.org/abs/2406.06753>
* **Hugging Face**: <https://huggingface.co/myshell-ai/MeloTTS-English>
* **MyShell AI**: <https://myshell.ai>
* **CLORE.AI Marketplace**: <https://clore.ai/marketplace>

