# Bark TTS

Generate realistic speech and audio with Bark AI.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Server Requirements

| Parameter    | Minimum     | Recommended   |
| ------------ | ----------- | ------------- |
| RAM          | 8GB         | 16GB+         |
| VRAM         | 4GB (small) | 8GB+ (normal) |
| Network      | 200Mbps     | 500Mbps+      |
| Startup Time | 3-5 minutes | -             |

{% hint style="warning" %}
**Startup Time:** First launch downloads Bark models (3-5 minutes depending on network speed). HTTP 502 during this time is normal.
{% endhint %}

## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create order and wait for deployment

### Access Your Server

* Find connection details in **My Orders**
* Web interfaces: Use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is Bark?

Bark by Suno AI can generate:

* Realistic speech in multiple languages
* Various speaker voices
* Non-verbal sounds (laughing, sighing)
* Music and sound effects

## Requirements

| Quality | VRAM | Recommended |
| ------- | ---- | ----------- |
| Small   | 4GB  | RTX 3060    |
| Normal  | 8GB  | RTX 3070    |
| High    | 12GB | RTX 3090    |

## Quick Deploy

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install git+https://github.com/suno-ai/bark.git gradio scipy && \
python -c "
import gradio as gr
from bark import SAMPLE_RATE, generate_audio, preload_models
import scipy.io.wavfile as wav
import numpy as np
import tempfile

preload_models()

def generate(text, voice):
    audio = generate_audio(text, history_prompt=voice)
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        wav.write(f.name, SAMPLE_RATE, (audio * 32767).astype(np.int16))
        return f.name

voices = ['v2/en_speaker_0', 'v2/en_speaker_1', 'v2/en_speaker_2', 'v2/en_speaker_3',
          'v2/en_speaker_4', 'v2/en_speaker_5', 'v2/en_speaker_6', 'v2/en_speaker_7',
          'v2/en_speaker_8', 'v2/en_speaker_9']

demo = gr.Interface(fn=generate, inputs=[gr.Textbox(lines=5), gr.Dropdown(voices)],
                   outputs=gr.Audio(), title='Bark TTS')
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

### Verify It's Working

```bash
# Check if Gradio UI is accessible
curl https://your-http-pub.clorecloud.net/
```

{% hint style="warning" %}
If you get HTTP 502, wait 3-5 minutes; the service is still downloading models.
{% endhint %}
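Rather than refreshing manually, you can poll the endpoint until the service answers. A minimal sketch using only the standard library; the URL is a placeholder for your `http_pub` address:

```python
import time
import urllib.request
import urllib.error

def wait_for_service(url, attempts=60, delay=10):
    """Poll `url` until it returns HTTP 200, or give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except urllib.error.HTTPError:
            # Server answered with 4xx/5xx (e.g. 502 while models download)
            pass
        except (urllib.error.URLError, TimeoutError):
            # Not reachable yet
            pass
        time.sleep(delay)
    return False

# wait_for_service("https://your-http-pub.clorecloud.net/")
```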

## Installation

```bash
pip install git+https://github.com/suno-ai/bark.git
pip install scipy
```

## Basic Usage

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
import scipy.io.wavfile as wav
import numpy as np

# Preload models (downloads on first run)
preload_models()

# Generate audio
text = "Hello, this is a test of Bark text to speech."
audio = generate_audio(text)

# Save as WAV
wav.write("output.wav", SAMPLE_RATE, (audio * 32767).astype(np.int16))
```

## Voice Selection

### Built-in Voices

```python
from bark import generate_audio

# English speakers (0-9)
audio = generate_audio("Hello!", history_prompt="v2/en_speaker_0")
audio = generate_audio("Hello!", history_prompt="v2/en_speaker_3")
audio = generate_audio("Hello!", history_prompt="v2/en_speaker_9")

# Other languages
audio = generate_audio("Bonjour!", history_prompt="v2/fr_speaker_0")  # French
audio = generate_audio("Hallo!", history_prompt="v2/de_speaker_0")    # German
audio = generate_audio("¡Hola!", history_prompt="v2/es_speaker_0")    # Spanish
audio = generate_audio("Ciao!", history_prompt="v2/it_speaker_0")     # Italian
audio = generate_audio("Olá!", history_prompt="v2/pt_speaker_0")      # Portuguese
audio = generate_audio("Привет!", history_prompt="v2/ru_speaker_0")   # Russian
audio = generate_audio("こんにちは!", history_prompt="v2/ja_speaker_0") # Japanese
audio = generate_audio("你好!", history_prompt="v2/zh_speaker_0")      # Chinese
```

### Available Languages

| Language   | Code | Speakers |
| ---------- | ---- | -------- |
| English    | en   | 0-9      |
| German     | de   | 0-9      |
| Spanish    | es   | 0-9      |
| French     | fr   | 0-9      |
| Hindi      | hi   | 0-9      |
| Italian    | it   | 0-9      |
| Japanese   | ja   | 0-9      |
| Korean     | ko   | 0-9      |
| Polish     | pl   | 0-9      |
| Portuguese | pt   | 0-9      |
| Russian    | ru   | 0-9      |
| Turkish    | tr   | 0-9      |
| Chinese    | zh   | 0-9      |
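All speaker IDs follow the single pattern `v2/<code>_speaker_<n>`, so the full list can be built programmatically from the language codes in the table above:

```python
# Language codes from the table above
LANG_CODES = ["en", "de", "es", "fr", "hi", "it", "ja",
              "ko", "pl", "pt", "ru", "tr", "zh"]

def all_voices():
    """Return every built-in v2 history_prompt name (13 languages x 10 speakers)."""
    return [f"v2/{code}_speaker_{n}" for code in LANG_CODES for n in range(10)]

voices = all_voices()
print(len(voices))  # 130
print(voices[0])    # v2/en_speaker_0
```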

## Non-Verbal Sounds

Bark can generate non-verbal audio:

```python
from bark import generate_audio

# Laughter
audio = generate_audio("Hello! [laughs] That's so funny!")

# Sighing
audio = generate_audio("[sighs] I'm so tired today.")

# Gasping
audio = generate_audio("[gasps] Oh my god!")

# Clearing throat
audio = generate_audio("[clears throat] Ahem, attention please.")

# Music notes
audio = generate_audio("♪ La la la ♪")
```

## Long-Form Audio

Bark generates at most about 13 seconds of audio per prompt. For longer output, split the text into sentences and concatenate the segments:

```python
from bark import generate_audio
from bark.generation import SAMPLE_RATE
import numpy as np

def generate_long_audio(text, voice="v2/en_speaker_6"):
    # Split into sentences
    sentences = text.replace(".", ".|").replace("?", "?|").replace("!", "!|").split("|")
    sentences = [s.strip() for s in sentences if s.strip()]

    audio_segments = []
    for sentence in sentences:
        audio = generate_audio(sentence, history_prompt=voice)
        audio_segments.append(audio)
        # Add small pause between sentences
        audio_segments.append(np.zeros(int(0.25 * SAMPLE_RATE)))

    return np.concatenate(audio_segments)

long_text = """
This is a longer piece of text that will be split into multiple segments.
Each segment will be generated separately. Then they will be concatenated.
This allows for generating audio of any length.
"""

audio = generate_long_audio(long_text)
```

## Voice Cloning

Bark does not support cloning an arbitrary voice out of the box: a `history_prompt` is an `.npz` file containing `semantic_prompt`, `coarse_prompt`, and `fine_prompt` token arrays, not raw audio, so saving a waveform with `np.savez` will not produce a usable voice. The lower-level API does expose the intermediate semantic tokens:

```python
from bark.generation import preload_models, generate_text_semantic
from bark.api import semantic_to_waveform

preload_models()

# Generate semantic tokens conditioned on a built-in voice,
# then render them to a waveform with the same voice prompt
voice_prompt = "v2/en_speaker_6"
text = "This is how I sound when I speak normally."
semantic_tokens = generate_text_semantic(text, history_prompt=voice_prompt)
audio = semantic_to_waveform(semantic_tokens, history_prompt=voice_prompt)
```

For cloning from your own recordings, use a dedicated tool such as RVC (see Next Steps).

## Batch Processing

```python
import os
from bark import generate_audio, SAMPLE_RATE
import scipy.io.wavfile as wav
import numpy as np

texts = [
    "Welcome to our podcast.",
    "Today we'll discuss artificial intelligence.",
    "Let's get started with the introduction.",
]

output_dir = "./audio_clips"
os.makedirs(output_dir, exist_ok=True)

voice = "v2/en_speaker_6"

for i, text in enumerate(texts):
    print(f"Generating {i+1}/{len(texts)}")
    audio = generate_audio(text, history_prompt=voice)
    wav.write(
        os.path.join(output_dir, f"clip_{i:03d}.wav"),
        SAMPLE_RATE,
        (audio * 32767).astype(np.int16)
    )
```

## API Server

```python
from fastapi import FastAPI
from fastapi.responses import FileResponse
from bark import generate_audio, preload_models, SAMPLE_RATE
import scipy.io.wavfile as wav
import numpy as np
import tempfile
import os

app = FastAPI()
preload_models()

@app.post("/generate")
async def generate_speech(text: str, voice: str = "v2/en_speaker_6"):
    audio = generate_audio(text, history_prompt=voice)

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        wav.write(f.name, SAMPLE_RATE, (audio * 32767).astype(np.int16))
        return FileResponse(f.name, media_type="audio/wav")

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
```

### Usage

```bash
curl -X POST "http://localhost:8000/generate?text=Hello%20world&voice=v2/en_speaker_6" \
    --output speech.wav
```
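The same request can be made from Python with only the standard library. A sketch; `http://localhost:8000` is a placeholder for wherever the API server runs:

```python
import urllib.parse
import urllib.request

def build_generate_url(base, text, voice="v2/en_speaker_6"):
    """Build the /generate URL with percent-encoded query parameters."""
    query = urllib.parse.urlencode({"text": text, "voice": voice})
    return f"{base.rstrip('/')}/generate?{query}"

url = build_generate_url("http://localhost:8000", "Hello world")

# Uncomment on a machine where the API server is running:
# req = urllib.request.Request(url, method="POST")
# with urllib.request.urlopen(req) as resp, open("speech.wav", "wb") as out:
#     out.write(resp.read())
```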

## Memory Optimization

### For Limited VRAM

```python
import os

# Use smaller models (set before importing bark)
os.environ["SUNO_USE_SMALL_MODELS"] = "1"

# Offload to CPU
os.environ["SUNO_OFFLOAD_CPU"] = "1"

from bark import generate_audio
audio = generate_audio("Hello world")
```

### Apple Silicon (MPS)

```python
import os

# Opt in to Apple's Metal backend (set before importing bark);
# leave this unset on CUDA systems
os.environ["SUNO_ENABLE_MPS"] = "True"

from bark import generate_audio
audio = generate_audio("Hello!", history_prompt="v2/en_speaker_6")
```

## Combining with Other Audio

```python
from pydub import AudioSegment
import numpy as np
from bark import generate_audio, SAMPLE_RATE
import scipy.io.wavfile as wav
import tempfile

def bark_to_pydub(audio_array):
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        wav.write(f.name, SAMPLE_RATE, (audio_array * 32767).astype(np.int16))
        return AudioSegment.from_wav(f.name)

# Generate speech
speech = generate_audio("Welcome to the show!")
speech_audio = bark_to_pydub(speech)

# Load background music
music = AudioSegment.from_mp3("background.mp3")

# Mix together
music = music - 20  # Lower music volume
combined = speech_audio.overlay(music)
combined.export("output.mp3", format="mp3")
```

## Performance

| Mode   | GPU      | Time (10 words) |
| ------ | -------- | --------------- |
| Normal | RTX 3090 | \~5s            |
| Normal | RTX 4090 | \~3s            |
| Small  | RTX 3060 | \~8s            |
| CPU    | -        | \~60s           |
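To measure throughput on your own server, a small timing wrapper is enough; `generate_audio` would be passed in as `fn` (the commented usage assumes models are already preloaded):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example with bark (SAMPLE_RATE is 24000):
# audio, seconds = timed(generate_audio, "Hello world", history_prompt="v2/en_speaker_6")
# print(f"{seconds:.1f}s to generate {len(audio) / 24000:.1f}s of audio")
```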

## Comparison with Other TTS

| Feature    | Bark | Coqui  | Piper |
| ---------- | ---- | ------ | ----- |
| Quality    | Best | Great  | Good  |
| Speed      | Slow | Medium | Fast  |
| Languages  | 13+  | 20+    | 30+   |
| Non-verbal | Yes  | No     | No    |
| VRAM       | 8GB+ | 4GB    | 1GB   |

## Troubleshooting

### Out of Memory

```python
import os

# Set these before importing bark
os.environ["SUNO_USE_SMALL_MODELS"] = "1"
os.environ["SUNO_OFFLOAD_CPU"] = "1"
```

### Slow Generation

* Use GPU (not CPU)
* Keep models loaded between generations
* Generate shorter segments

### Audio Quality Issues

* Try different speakers
* Break long text into sentences
* Avoid special characters
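As one way to apply the last tip, a small helper can strip characters outside a conservative whitelist before generation. The whitelist is an assumption; widen it for languages with other scripts, and note it deliberately keeps `[` `]` so cues like `[laughs]` survive:

```python
import re

def clean_for_tts(text):
    """Keep word characters, whitespace, basic punctuation, and [bracketed] cues."""
    text = re.sub(r"[^\w\s.,!?'\[\]-]", "", text)
    # Collapse runs of whitespace left behind by removed characters
    return re.sub(r"\s+", " ", text).strip()

print(clean_for_tts("Hello! [laughs] @#$%"))  # Hello! [laughs]
```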

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers
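A quick sketch for estimating a session's cost from the table above. The rates are illustrative, and the spot discount parameter is an assumption based on the 30-50% range mentioned:

```python
def session_cost(hourly_rate, hours, spot_discount=0.0):
    """Estimated rental cost in USD; spot_discount is a fraction (0.4 = 40% off)."""
    return round(hourly_rate * hours * (1 - spot_discount), 2)

print(session_cost(0.06, 4))                     # RTX 3090, 4h on-demand -> 0.24
print(session_cost(0.06, 4, spot_discount=0.4))  # same session on spot   -> 0.14
```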

## Next Steps

* [RVC Voice Cloning](https://docs.clore.ai/guides/audio-and-voice/rvc-voice-clone)
* [Whisper Transcription](https://docs.clore.ai/guides/audio-and-voice/whisper-transcription)
* [AudioCraft Music](https://docs.clore.ai/guides/audio-and-voice/audiocraft-music)
