# Dia TTS (Nari Labs)

Dia by Nari Labs is a text-to-speech model built for **realistic multi-speaker dialogue**. Traditional TTS voices one speaker at a time. Dia instead generates a whole conversation between two speakers in a single pass, complete with emotion, laughter, hesitation, and other non-verbal cues. At 1.6B parameters, it fits on a single consumer GPU with 8GB or more of VRAM.

## Key Features

* **Multi-speaker dialog**: Generate a conversation between two speakers (`[S1]` and `[S2]`) in one pass
* **Non-verbal cues**: Cues written in parentheses in the script, such as `(laughs)` and `(sighs)`, are rendered as sounds in the generated audio
* **Emotional speech**: Natural intonation without explicit emotion tags
* **1.6B parameters**: Fits on RTX 3070/3080 (8-10GB VRAM)
* **Apache 2.0 license**: Full commercial use
* **HuggingFace integration**: Works with Transformers library
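The speaker tags carry the whole structure of the dialogue, so it can pay to check a script before spending GPU time on it. The sketch below assumes the conventions that every example on this page follows: the script opens with `[S1]`, only `[S1]` and `[S2]` are used, and speakers alternate. `split_turns` and `validate_dialog` are hypothetical helpers, not part of the `dia` package.

```python
import re

# Matches a speaker tag such as "[S1]" or "[S2]"; any "[S<n>]" is captured so
# unsupported tags can be reported instead of silently ending up in the text.
TAG_RE = re.compile(r"\[(S\d+)\]")


def split_turns(script):
    """Split a Dia-style script into (speaker, text) turns."""
    # With one capture group, re.split yields [prefix, tag, text, tag, text, ...]
    parts = TAG_RE.split(script.strip())
    prefix, rest = parts[0], parts[1:]
    if prefix.strip():
        raise ValueError("the script must start with a speaker tag such as [S1]")
    return [(rest[i], rest[i + 1].strip()) for i in range(0, len(rest), 2)]


def validate_dialog(script):
    """Check the assumed conventions: start with [S1], use S1/S2, alternate."""
    turns = split_turns(script)
    if not turns or turns[0][0] != "S1":
        raise ValueError("the first turn should be [S1]")
    for speaker, text in turns:
        if speaker not in ("S1", "S2"):
            raise ValueError(f"unsupported speaker tag [{speaker}]")
        if not text:
            raise ValueError(f"empty turn for [{speaker}]")
    for (prev, _), (cur, _) in zip(turns, turns[1:]):
        if prev == cur:
            raise ValueError(f"two consecutive [{cur}] turns; alternate speakers")
    return turns
```

For example, `validate_dialog("[S1] Hi there. [S2] (laughs) Hello!")` returns `[("S1", "Hi there."), ("S2", "(laughs) Hello!")]`, while a script with two `[S1]` turns in a row raises `ValueError`.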

## Requirements

| Component | Minimum        | Recommended     |
| --------- | -------------- | --------------- |
| GPU       | RTX 3070 (8GB) | RTX 3080 (10GB) |
| VRAM      | 8GB            | 10GB+           |
| RAM       | 16GB           | 32GB            |
| Disk      | 10GB           | 15GB            |
| Python    | 3.9+           | 3.11            |

**Recommended Clore.ai GPU**: RTX 3080 10GB (\~$0.2–0.5/day)
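The VRAM figures above follow from simple arithmetic on the parameter count. The sketch below estimates the memory for the weights alone. It assumes exactly 1.6e9 parameters (taken from the model name) and ignores activations, the cache built up during autoregressive generation, and the audio codec, which is why the table recommends headroom beyond these numbers.

```python
PARAMS = 1.6e9  # assumption: parameter count taken from the "1.6B" in the model name

# Bytes needed to store one parameter in each precision
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(dtype, params=PARAMS):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(dtype):.1f} GB")
```

Half precision halves the weight footprint (about 3.2 GB versus 6.4 GB), which is the main reason it helps on 8GB cards.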

## Installation

```bash
# Option 1: install directly from GitHub
pip install git+https://github.com/nari-labs/dia.git

# Option 2: From source
git clone https://github.com/nari-labs/dia.git
cd dia
pip install -e .
```
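After installing, a quick import check can confirm the environment before you download any weights. This is a minimal sketch that assumes the module names used in this guide's examples (`dia`, `soundfile`, `torch`, `gradio`) and the Python 3.9+ minimum from the requirements table. `missing_modules` is a hypothetical helper.

```python
import importlib.util
import sys

# Modules imported by the examples in this guide (assumption: adjust to your use)
REQUIRED_MODULES = ("dia", "soundfile", "torch", "gradio")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found by the current interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 9):
        print("Python 3.9+ is required")
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates modules without importing them, so the check is fast even for heavy packages such as `torch`.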

## Quick Start

### Basic Multi-Speaker Dialog

```python
import soundfile as sf
from dia.model import Dia

# Load model (weights are downloaded from the Hugging Face Hub on first use)
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Generate multi-speaker conversation
# [S1] = Speaker 1, [S2] = Speaker 2
text = """[S1] Hey, have you tried the new GPU rental platform?
[S2] You mean Clore? Yeah, I rented an RTX 4090 yesterday.
[S1] How was it?
[S2] (laughs) Honestly? Way cheaper than I expected. Like two bucks a day.
[S1] No way. That's... that's actually insane."""

audio = model.generate(text)

# Save to file; Dia's audio codec outputs 44.1 kHz audio
sf.write("dialog.wav", audio, samplerate=44100)
```
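If `soundfile` is not available on an instance, the standard library can write the same file. This sketch assumes `audio` is a one-dimensional sequence of floats in [-1.0, 1.0] (for example a NumPy array) and that Dia's codec runs at 44.1 kHz. `write_wav_16bit` is a hypothetical helper that converts to 16-bit PCM.

```python
import struct
import wave

DIA_SAMPLE_RATE = 44_100  # assumption: Dia decodes with a 44.1 kHz audio codec


def write_wav_16bit(path, samples, sample_rate=DIA_SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM; return duration in s."""
    # Clip each sample so values slightly outside [-1, 1] cannot overflow int16
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))  # little-endian
    return len(ints) / sample_rate
```

Usage: `write_wav_16bit("dialog.wav", audio)`. The per-sample Python loop is slow for long clips, so `soundfile` remains the better choice when installed.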

### With Emotion and Non-Verbal Cues

```python
# Dia automatically handles natural speech patterns
text = """[S1] I just got the results back...
[S2] And? Don't keep me in suspense!
[S1] (sighs) We passed. We actually passed all the tests.
[S2] (laughs) I told you! I told you we'd make it!
[S1] I can't believe it... (laughs) okay, okay, let's celebrate."""

audio = model.generate(text, temperature=0.8)
sf.write("emotional_dialog.wav", audio, samplerate=44100)
```
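Cues only take effect if they are written in the expected form, so a quick scan of the script can catch typos such as `(LAUGHS)` or an invented cue before generation. In this sketch the set of known cues holds only the two used on this page. The model may understand more (see the Nari Labs README), so treat the set as an assumption to extend. `find_cues` and `unknown_cues` are hypothetical helpers.

```python
import re

# Assumption: only the cues used in this guide; extend from the Nari Labs README
KNOWN_CUES = {"laughs", "sighs"}

# Any parenthesized run without nested parentheses, e.g. "(laughs)" or "(LAUGHS)"
CUE_RE = re.compile(r"\(([^()]*)\)")


def find_cues(script):
    """Return every parenthesized cue in the script, in order of appearance."""
    return CUE_RE.findall(script)


def unknown_cues(script, known=KNOWN_CUES):
    """Return cues (sorted, deduplicated) that are not in the known set."""
    return sorted({cue for cue in find_cues(script) if cue not in known})
```

Ordinary parenthetical asides in the dialogue are flagged too, which is deliberate: the model may try to voice them as cues.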

### Single Speaker

```python
# Works for single speaker too
text = "[S1] Welcome to the Clore AI documentation. In this guide, we'll walk through setting up your first GPU rental and deploying a machine learning model."

audio = model.generate(text)
sf.write("narration.wav", audio, samplerate=44100)
```
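For long narration, one option is to split the text into sentence-aligned chunks, generate each chunk separately, and concatenate the audio. This sketch assumes that very long single inputs degrade pacing (check the generation guidelines in the Nari Labs README). The 250-character default is an arbitrary assumption, and `chunk_narration` is a hypothetical helper. Separate `generate()` calls may not reproduce an identical voice, so listen to the joins.

```python
import re


def chunk_narration(text, max_chars=250, tag="[S1]"):
    """Split narration into tagged chunks of whole sentences, each <= max_chars.

    A single sentence longer than max_chars becomes its own (oversized) chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return [f"{tag} {chunk}" for chunk in chunks]
```

Each returned string is a complete single-speaker script that can be passed to `model.generate`.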

## Gradio Web UI

```python
# Option 1: the Dia repo ships a Gradio demo. From the cloned repo, run:
#     python app.py

# Option 2: build a minimal UI yourself
import gradio as gr
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

def generate_speech(text):
    audio = model.generate(text)
    # Gradio expects (sample_rate, samples); Dia outputs 44.1 kHz audio
    return (44100, audio)

demo = gr.Interface(
    fn=generate_speech,
    inputs=gr.Textbox(label="Dialog (use [S1], [S2] tags)", lines=10),
    outputs=gr.Audio(label="Generated Speech"),
    title="Dia TTS: Multi-Speaker Dialog",
)
# Bind to all interfaces so the UI is reachable from outside the rented instance
demo.launch(server_name="0.0.0.0", server_port=7860)
```

## Use Cases

* **Podcast generation**: Create conversational podcasts from scripts
* **Audiobook dialogs**: Generate character conversations with distinct voices
* **Game dialogue**: NPC conversations with natural speech patterns
* **Training data**: Generate diverse speech datasets for ASR training
* **Chatbot voices**: Multi-turn dialog with emotional responses
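For podcast generation, scripts are often written as `NAME: line`. The sketch below maps the first two distinct names to `[S1]` and `[S2]` in order of appearance and rejects a third speaker, assuming the two-tag format used throughout this page. `to_dia_script` is a hypothetical helper.

```python
def to_dia_script(lines):
    """Convert 'NAME: text' lines into a Dia script with [S1]/[S2] tags."""
    tags = {}   # speaker name -> tag, assigned in order of first appearance
    turns = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        name, sep, text = raw.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"missing 'NAME:' prefix in line: {raw!r}")
        name = name.strip()
        if name not in tags:
            if len(tags) == 2:
                raise ValueError(f"third speaker {name!r}; only [S1] and [S2] are used")
            tags[name] = f"[S{len(tags) + 1}]"
        turns.append(f"{tags[name]} {text.strip()}")
    return "\n".join(turns)
```

Usage: `to_dia_script(open("episode.txt").read().splitlines())`, where `episode.txt` is a hypothetical script file.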

## Tips for Clore.ai Users

* **RTX 3080 is ideal**: 10GB VRAM handles Dia easily at \~$0.2–0.5/day
* **Batch generation**: Process multiple dialogs in one loop so the model is loaded once per rental session
* **Cache the model on persistent storage**: If your Clore instance has a persistent disk, keep the downloaded weights there to avoid re-downloading them
* **Temperature 0.7–0.9**: Lower values give more consistent delivery; higher values give more expressive, varied speech
* **English only**: Dia currently supports English only. For multilingual speech, see the Qwen3-TTS guide
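The batching and caching tips can be combined into a resumable loop. This sketch skips any output file that already exists, so a run interrupted when a rental ends can continue where it stopped. `batch_generate` is a hypothetical helper that takes the generate and write functions as parameters. The commented usage assumes the Dia weights are fetched through `huggingface_hub`, which honors the `HF_HOME` environment variable. The `/workspace/hf-cache` path is a placeholder for your persistent mount.

```python
from pathlib import Path


def batch_generate(scripts, out_dir, generate, write, suffix=".wav"):
    """Generate one file per named script, skipping outputs that already exist."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, script in scripts.items():
        target = out / f"{name}{suffix}"
        if target.exists():
            continue  # finished in an earlier (possibly interrupted) run
        audio = generate(script)
        write(str(target), audio)
        written.append(target.name)
    return written


# Example wiring (assumptions: persistent mount path, Dia's 44.1 kHz output):
#   import os; os.environ["HF_HOME"] = "/workspace/hf-cache"  # before importing dia
#   import soundfile as sf
#   from dia.model import Dia
#   model = Dia.from_pretrained("nari-labs/Dia-1.6B")
#   batch_generate(scripts, "out", model.generate,
#                  lambda path, audio: sf.write(path, audio, 44100))
```

Passing `generate` and `write` in as functions keeps the loop testable without a GPU and lets you swap in other writers.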

## Troubleshooting

| Issue                   | Solution                                                                                      |
| ----------------------- | --------------------------------------------------------------------------------------------- |
| CUDA out of memory      | Load in half precision: `Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")` |
| Speakers sound similar  | Add more text/context per speaker; try a higher temperature                                   |
| Non-verbal cues ignored | Ensure correct format: `(laughs)`, `(sighs)` in parentheses                                   |
| Audio quality low       | Save at 44.1 kHz, Dia's output rate; writing at another rate shifts pitch and speed           |
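A wrong sample rate is easy to spot from the file header. This sketch reads a PCM WAV with the standard `wave` module and flags any rate other than the expected one. It assumes a 44.1 kHz target and 16-bit PCM output (the `soundfile` default for WAV). `wav_info` and `check_rate` are hypothetical helpers.

```python
import wave


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


def check_rate(path, expected=44_100):
    """Raise ValueError if the file's sample rate differs from `expected`."""
    rate, _, _ = wav_info(path)
    if rate != expected:
        raise ValueError(f"{path}: {rate} Hz, expected {expected} Hz")
```

A file written at 24 kHz from 44.1 kHz samples, for example, reports a duration about 1.8 times too long and plays back slowed and lower in pitch.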

## Further Reading

* [Nari Labs GitHub](https://github.com/nari-labs/dia)
* [HuggingFace Model](https://huggingface.co/nari-labs/Dia-1.6B)
* [Comparison: Dia vs ElevenLabs](https://nari-labs.github.io/dia/) (official demo page)


