# Bark TTS

Generate realistic speech and audio with Bark AI.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

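## Server requirements

| Parameter    | Minimum            | Recommended          |
| ------------ | ------------------ | -------------------- |
| RAM          | 8GB                | 16GB+                |
| VRAM         | 4GB (small models) | 8GB+ (normal models) |
| Network      | 200Mbps            | 500Mbps+             |
| Startup time | 3-5 minutes        | -                    |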

{% hint style="warning" %}
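**Startup time:** On first launch the container downloads the Bark models, which takes 3-5 minutes depending on network speed. HTTP 502 responses during this period are expected.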
{% endhint %}

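## Renting on CLORE.AI

1. Go to the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set the ports (TCP for SSH, HTTP for the web UI)
   * Add environment variables if needed
   * Enter the startup command
5. Choose a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment

### Accessing your server

* Find the connection details under **My Orders**
* Web UI: use the URL of the HTTP port
* SSH: `ssh -p <port> root@<proxy-address>`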

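## What is Bark?

Bark, from Suno AI, can generate:

* Realistic speech in multiple languages
* A variety of speaker voices
* Non-verbal sounds (laughter, sighs)
* Music and sound effects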

## Requirements

| Quality | VRAM | Recommended GPU |
| ------- | ---- | --------------- |
| Small   | 4GB  | RTX 3060        |
| Normal  | 8GB  | RTX 3070        |
| High    | 12GB | RTX 3090        |

## Quick deploy

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install git+https://github.com/suno-ai/bark.git gradio scipy && \
python -c "
import gradio as gr
from bark import SAMPLE_RATE, generate_audio, preload_models
import scipy.io.wavfile as wav
import numpy as np
import tempfile

preload_models()

def generate(text, voice):
    audio = generate_audio(text, history_prompt=voice)
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        wav.write(f.name, SAMPLE_RATE, (audio * 32767).astype(np.int16))
        return f.name

voices = ['v2/en_speaker_0', 'v2/en_speaker_1', 'v2/en_speaker_2', 'v2/en_speaker_3',
          'v2/en_speaker_4', 'v2/en_speaker_5', 'v2/en_speaker_6', 'v2/en_speaker_7',
          'v2/en_speaker_8', 'v2/en_speaker_9']

demo = gr.Interface(fn=generate, inputs=[gr.Textbox(lines=5), gr.Dropdown(voices)],
                   outputs=gr.Audio(), title='Bark TTS')
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

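## Accessing your service

After deployment, find your `http_pub` URL under **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Look for the `http_pub` URL (for example, `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.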

### Verify that it is running

```bash
# Check that the Gradio UI is reachable
curl https://your-http-pub.clorecloud.net/
```

{% hint style="warning" %}
If you get HTTP 502, wait 3-5 minutes: the service is still downloading the models.
{% endhint %}
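The wait described above can be scripted. The following is a minimal sketch and not part of Bark or of any CLORE.AI tooling: `http_status` and `wait_until_ready` are hypothetical helper names, and the URL in the usage comment is the same placeholder used in the `curl` example. It polls until the endpoint answers with HTTP 200, treating 502 responses and connection errors as "still starting".

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 502 while the models are still downloading
    except (urllib.error.URLError, OSError):
        return None  # proxy or container not reachable yet


def wait_until_ready(url, attempts=30, delay=10.0, probe=http_status, sleep=time.sleep):
    """Poll url until probe returns 200. Return True on success, False on timeout."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage (placeholder URL, replace with your own http_pub address):
#   wait_until_ready("https://your-http-pub.clorecloud.net/")
```

The defaults (30 attempts, 10 seconds apart) give five minutes in total, matching the upper end of the startup time quoted above. The `probe` and `sleep` parameters exist only so the loop can be exercised without a network.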

## Installation

```bash
pip install git+https://github.com/suno-ai/bark.git
pip install scipy
```

## Basic usage

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
import scipy.io.wavfile as wav
import numpy as np

# Preload the models (downloaded on first run)
preload_models()

# Generate audio
text = "Hello, this is a test of Bark text to speech."
audio = generate_audio(text)

# Save as WAV (convert float samples to 16-bit integers)
wav.write("output.wav", SAMPLE_RATE, (audio * 32767).astype(np.int16))
```

## Voice selection

### Built-in voices

```python
from bark import generate_audio

# English speakers (0-9)
audio = generate_audio("Hello!", history_prompt="v2/en_speaker_0")
audio = generate_audio("Hello!", history_prompt="v2/en_speaker_3")
audio = generate_audio("Hello!", history_prompt="v2/en_speaker_9")

# Other languages
audio = generate_audio("Bonjour!", history_prompt="v2/fr_speaker_0")  # French
audio = generate_audio("Hallo!", history_prompt="v2/de_speaker_0")    # German
audio = generate_audio("Hola!", history_prompt="v2/es_speaker_0")     # Spanish
audio = generate_audio("Ciao!", history_prompt="v2/it_speaker_0")     # Italian
audio = generate_audio("Olá!", history_prompt="v2/pt_speaker_0")      # Portuguese
audio = generate_audio("Привет!", history_prompt="v2/ru_speaker_0")   # Russian
audio = generate_audio("こんにちは!", history_prompt="v2/ja_speaker_0") # Japanese
audio = generate_audio("你好!", history_prompt="v2/zh_speaker_0")      # Chinese
```

### Available languages

| Language   | Code | Speakers |
| ---------- | ---- | -------- |
| English    | en   | 0-9      |
| German     | de   | 0-9      |
| Spanish    | es   | 0-9      |
| French     | fr   | 0-9      |
| Hindi      | hi   | 0-9      |
| Italian    | it   | 0-9      |
| Japanese   | ja   | 0-9      |
| Korean     | ko   | 0-9      |
| Polish     | pl   | 0-9      |
| Portuguese | pt   | 0-9      |
| Russian    | ru   | 0-9      |
| Turkish    | tr   | 0-9      |
| Chinese    | zh   | 0-9      |
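Preset names follow the pattern `v2/<code>_speaker_<n>`, so they can be built and checked programmatically. The sketch below is illustrative only: `voice_preset` and `LANGUAGE_CODES` are hypothetical names, not part of the Bark API, and the codes are copied from the table above.

```python
# Language codes copied from the table above; each has speakers 0-9.
LANGUAGE_CODES = ("en", "de", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh")


def voice_preset(language, speaker):
    """Return a preset name such as 'v2/en_speaker_3' (hypothetical helper)."""
    if language not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language code: {language!r}")
    if not isinstance(speaker, int) or not 0 <= speaker <= 9:
        raise ValueError(f"speaker must be an integer from 0 to 9, got {speaker!r}")
    return f"v2/{language}_speaker_{speaker}"


# Usage: pass the result as history_prompt, e.g.
#   generate_audio("你好!", history_prompt=voice_preset("zh", 0))
print(voice_preset("en", 3))  # v2/en_speaker_3
```

Validating up front turns a typo in the language code or speaker index into an immediate `ValueError` instead of a failure inside the generation call.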

## Non-verbal sounds

Bark can generate non-verbal audio:

```python
from bark import generate_audio

# Laughter
audio = generate_audio("Hello! [laughs] That's so funny!")

# Sigh
audio = generate_audio("[sighs] I'm so tired today.")

# Gasp
audio = generate_audio("[gasps] Oh my god!")

# Clearing the throat
audio = generate_audio("[clears throat] Ahem, attention please.")

# Music notes (sung text)
audio = generate_audio("♪ La la la ♪")
```
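A misspelled tag may not produce the intended sound, so it can help to check prompts before spending GPU time on them. The following is a small sketch under stated assumptions: `KNOWN_TAGS` contains only the four bracketed tags used in the examples above (extend it with any others you rely on), and `unknown_tags` is a hypothetical helper, not a Bark function.

```python
import re

# Tags taken from the examples above; extend this set as needed.
KNOWN_TAGS = {"laughs", "sighs", "gasps", "clears throat"}

# Matches one bracketed tag such as "[laughs]" and captures its contents.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unknown_tags(text):
    """Return bracketed tags in text that are not in KNOWN_TAGS, in order of appearance."""
    found = (m.group(1).strip().lower() for m in TAG_PATTERN.finditer(text))
    return [tag for tag in found if tag not in KNOWN_TAGS]


print(unknown_tags("Hello! [laughs] That's so funny! [giggle]"))  # ['giggle']
```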

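## Long-form audio

For text that would run longer than about 13 seconds of audio in a single generation, split it into sentences and generate each one separately:

```python
from bark import generate_audio
from bark.generation import SAMPLE_RATE
import numpy as np

def generate_long_audio(text, voice="v2/en_speaker_6"):
    # Split into sentences, keeping the closing punctuation
    sentences = text.replace(".", ".|").replace("?", "?|").replace("!", "!|").split("|")
    sentences = [s.strip() for s in sentences if s.strip()]

    audio_segments = []
    for sentence in sentences:
        audio = generate_audio(sentence, history_prompt=voice)
        audio_segments.append(audio)
        # Add a short pause (0.25 s of silence) between sentences
        audio_segments.append(np.zeros(int(0.25 * SAMPLE_RATE)))

    return np.concatenate(audio_segments)

long_text = """
This is a longer piece of text that will be split into several segments.
Each segment is generated separately and then concatenated.
This makes it possible to generate audio of any length.
"""

audio = generate_long_audio(long_text)
```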
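The `replace`/`split` approach above only recognizes `.`, `?` and `!`, and it also splits inside numbers such as `3.14`. As a possible refinement, the sketch below uses a regular expression that additionally handles CJK sentence punctuation and then packs short sentences together. `split_sentences`, `group_sentences` and the `max_chars=200` budget are assumptions made for illustration, not Bark parameters; tune the budget so each chunk stays within a single generation.

```python
import re

# Split after Latin punctuation followed by whitespace, or directly after
# CJK punctuation (which is normally not followed by a space).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])(?![。！？])\s*")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def group_sentences(sentences, max_chars=200):
    """Greedily pack sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Usage: generate one clip per chunk instead of one per sentence, e.g.
#   for chunk in group_sentences(split_sentences(long_text)):
#       audio = generate_audio(chunk, history_prompt="v2/en_speaker_6")
```

Grouping trades fewer generation calls against longer inputs per call; if a chunk gets cut off in the output, lower `max_chars`.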

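## Voice cloning

Create a custom voice prompt:

```python
from bark import generate_audio, preload_models
from bark.api import save_as_prompt

preload_models()

# First, generate reference audio with a built-in voice.
# output_full=True also returns the token arrays that make up a voice prompt.
voice_prompt = "v2/en_speaker_6"
text = "This is how I sound when I speak normally."
full_generation, audio = generate_audio(text, history_prompt=voice_prompt, output_full=True)

# Save those tokens as a custom voice (simplified example)
save_as_prompt("custom_voice.npz", full_generation)

# Reuse the saved voice by passing the .npz path as history_prompt
audio = generate_audio("Now I am speaking with my saved voice.", history_prompt="custom_voice.npz")
```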

## Batch processing

```python
import os
from bark import generate_audio, SAMPLE_RATE
import scipy.io.wavfile as wav
import numpy as np

texts = [
    "Welcome to our podcast.",
    "Today we will discuss artificial intelligence.",
    "Let's start with an introduction.",
]

output_dir = "./audio_clips"
os.makedirs(output_dir, exist_ok=True)

voice = "v2/en_speaker_6"

for i, text in enumerate(texts):
    print(f"Generating {i+1}/{len(texts)}")
    audio = generate_audio(text, history_prompt=voice)
    wav.write(
        os.path.join(output_dir, f"clip_{i:03d}.wav"),
        SAMPLE_RATE,
        (audio * 32767).astype(np.int16)
    )
```
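On a rented server a long batch may be interrupted, for example when a spot order is outbid. One way to make the loop above resumable is to skip clips whose output file already exists. This is only a sketch: `pending_clips` is a hypothetical helper, and it assumes the same `clip_{i:03d}.wav` naming as the loop above.

```python
import os


def pending_clips(texts, output_dir):
    """Return (index, text, path) for every clip whose WAV file does not exist yet."""
    jobs = []
    for i, text in enumerate(texts):
        path = os.path.join(output_dir, f"clip_{i:03d}.wav")
        if not os.path.exists(path):
            jobs.append((i, text, path))
    return jobs


# Usage: iterate over the remaining jobs instead of enumerate(texts), e.g.
#   for i, text, path in pending_clips(texts, output_dir):
#       audio = generate_audio(text, history_prompt=voice)
#       wav.write(path, SAMPLE_RATE, (audio * 32767).astype(np.int16))
```

A file left half-written by a crash would also be skipped, so delete the most recent clip before resuming if the previous run stopped abruptly.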

## API server

```python
from fastapi import FastAPI
from fastapi.responses import FileResponse
from bark import generate_audio, preload_models, SAMPLE_RATE
import scipy.io.wavfile as wav
import numpy as np
import tempfile

app = FastAPI()
preload_models()  # load the models once at startup

# text and voice are read from the query string
@app.post("/generate")
async def generate_speech(text: str, voice: str = "v2/en_speaker_6"):
    audio = generate_audio(text, history_prompt=voice)

    # Write the result to a temporary WAV file and return it
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        wav.write(f.name, SAMPLE_RATE, (audio * 32767).astype(np.int16))
        return FileResponse(f.name, media_type="audio/wav")

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000
```

### Usage

```bash
curl -X POST "http://localhost:8000/generate?text=Hello%20world&voice=v2/en_speaker_6" \
    --output speech.wav
```
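The same request can be made from Python. The sketch below uses only the standard library and assumes the server above is reachable at the base URL you pass in; `build_generate_url` and `fetch_speech` are hypothetical helper names. Using `urlencode` avoids hand-escaping spaces and slashes in the query string.

```python
import urllib.parse
import urllib.request


def build_generate_url(base_url, text, voice="v2/en_speaker_6"):
    """Return the /generate URL with text and voice encoded as query parameters."""
    query = urllib.parse.urlencode({"text": text, "voice": voice})
    return f"{base_url.rstrip('/')}/generate?{query}"


def fetch_speech(base_url, text, out_path, voice="v2/en_speaker_6", timeout=300):
    """POST to the /generate endpoint and save the returned WAV bytes to out_path."""
    request = urllib.request.Request(build_generate_url(base_url, text, voice), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path


# Usage:
#   fetch_speech("http://localhost:8000", "Hello world", "speech.wav")
```

The generous default timeout is an assumption: generation is slow compared with a typical HTTP call, so a short timeout would abort valid requests.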

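## Memory optimization

### For limited VRAM

```python
import os

# Set these before importing bark; they are read when the module is imported.

# Use the smaller models
os.environ["SUNO_USE_SMALL_MODELS"] = "1"

# Offload to CPU
os.environ["SUNO_OFFLOAD_CPU"] = "1"

from bark import generate_audio
audio = generate_audio("Hello world")
```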
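Which flags to set depends on how much VRAM the rented GPU has. The sketch below picks them from a VRAM figure; `bark_env_for_vram` is a hypothetical helper, and the 8GB and 12GB thresholds are assumptions read off the requirements tables earlier on this page rather than limits enforced by Bark.

```python
def bark_env_for_vram(vram_gb):
    """Return the Bark environment flags suggested for a GPU with vram_gb of memory."""
    env = {}
    if vram_gb < 8:
        # Below the 8GB "normal" tier: fall back to the small models.
        env["SUNO_USE_SMALL_MODELS"] = "1"
    if vram_gb < 12:
        # Below the 12GB tier: offload idle models to CPU to reduce peak VRAM use.
        env["SUNO_OFFLOAD_CPU"] = "1"
    return env


# Usage (must run before bark is imported):
#   import os
#   os.environ.update(bark_env_for_vram(6))
#   from bark import generate_audio
print(bark_env_for_vram(6))
```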

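### Enable FP16

```python
import os

# SUNO_ENABLE_MPS toggles the Apple Silicon (MPS) backend. "0" leaves it disabled,
# so generation runs on CUDA when a GPU is available. Set it before importing bark.
os.environ["SUNO_ENABLE_MPS"] = "0"

from bark import generate_audio
audio = generate_audio("Hello!", history_prompt="v2/en_speaker_6")
```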

## Combining with other audio

```python
from pydub import AudioSegment
import numpy as np
from bark import generate_audio, SAMPLE_RATE
import scipy.io.wavfile as wav
import tempfile

def bark_to_pydub(audio_array):
    # Round-trip through a temporary WAV file to get a pydub AudioSegment
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        wav.write(f.name, SAMPLE_RATE, (audio_array * 32767).astype(np.int16))
        return AudioSegment.from_wav(f.name)

# Generate speech
speech = generate_audio("Welcome to the show!")
speech_audio = bark_to_pydub(speech)

# Load background music
music = AudioSegment.from_mp3("background.mp3")

# Mix them together
music = music - 20  # lower the music volume by 20 dB
combined = speech_audio.overlay(music)
combined.export("output.mp3", format="mp3")
```

## Performance

| Mode   | GPU      | Time (10 words) |
| ------ | -------- | --------------- |
| Normal | RTX 3090 | \~5s            |
| Normal | RTX 4090 | \~3s            |
| Small  | RTX 3060 | \~8s            |
| CPU    | -        | \~60s           |

## Comparison with other TTS

| Feature           | Bark | Coqui  | Piper |
| ----------------- | ---- | ------ | ----- |
| Quality           | Best | Great  | Good  |
| Speed             | Slow | Medium | Fast  |
| Languages         | 13+  | 20+    | 30+   |
| Non-verbal sounds | Yes  | No     | No    |
| VRAM              | 8GB+ | 4GB    | 1GB   |

## Troubleshooting

### Out of memory

```python
import os

# Use the small models and offload to CPU (set before importing bark)
os.environ["SUNO_USE_SMALL_MODELS"] = "1"
os.environ["SUNO_OFFLOAD_CPU"] = "1"
```

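### Slow generation

* Use a GPU rather than the CPU
* Keep the models loaded between generations
* Generate shorter segments

### Audio quality issues

* Try a different speaker
* Split long text into sentences
* Avoid special characters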
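Stripping special characters can be automated before text is sent to Bark. The sketch below is one possible cleaner, not a Bark feature: what counts as "special" here is an assumption (Unicode symbols such as emoji, control characters, and a few markup characters), while square brackets and `♪` are kept because Bark prompts use them. `clean_text` and `DROP_PUNCT` are hypothetical names.

```python
import re
import unicodedata

# Punctuation that usually comes from markup or handles rather than prose.
DROP_PUNCT = set("#@*_\\/")


def clean_text(text):
    """Drop symbols, control characters and markup punctuation, then collapse whitespace."""
    kept = []
    for ch in text:
        if ch == "♪" or ch.isspace():
            kept.append(ch)  # keep the song marker and any whitespace
            continue
        category = unicodedata.category(ch)  # e.g. "So" for emoji, "Cc" for controls
        if category[0] not in "SC" and ch not in DROP_PUNCT:
            kept.append(ch)
    return re.sub(r"\s+", " ", "".join(kept)).strip()


print(clean_text("Hello 😀 #world* [laughs]  ♪ la ♪"))  # Hello world [laughs] ♪ la ♪
```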

## Cost estimate

Typical rates on the CLORE.AI marketplace (as of 2024):

| GPU       | Hourly rate | Daily rate | 4-hour session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use the **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE**
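For session lengths that are not in the table, the arithmetic is simply hourly rate times hours, with the spot discount applied on top. The sketch below makes that explicit; `session_cost` and `spot_cost_range` are hypothetical helpers, the rates you pass in are whatever the marketplace currently shows, and the 30-50% range is the rough figure quoted above rather than a guaranteed discount.

```python
def session_cost(hourly_rate_usd, hours):
    """Return the estimated cost in USD of renting for the given number of hours."""
    if hourly_rate_usd < 0 or hours < 0:
        raise ValueError("rate and hours must be non-negative")
    return round(hourly_rate_usd * hours, 2)


def spot_cost_range(on_demand_cost, low_discount=0.30, high_discount=0.50):
    """Return (lowest, highest) spot estimates for a discount range of 30-50%."""
    return (
        round(on_demand_cost * (1 - high_discount), 2),
        round(on_demand_cost * (1 - low_discount), 2),
    )


# Example: a 24-hour rental at $0.10 per hour
cost = session_cost(0.10, 24)
print(cost)                   # 2.4
print(spot_cost_range(cost))  # (1.2, 1.68)
```

The table's own figures are rounded, so computed values will differ from it by a few cents.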

## Next steps

* [RVC Voice Clone](/guides/guides_v2-zh/yin-pin-yu-yu-yin/rvc-voice-clone.md)
* [Whisper Transcription](/guides/guides_v2-zh/yin-pin-yu-yu-yin/whisper-transcription.md)
* [AudioCraft Music](/guides/guides_v2-zh/yin-pin-yu-yu-yin/audiocraft-music.md)

