# Chatterbox Voice Cloning

Chatterbox is a family of advanced open-source text-to-speech models developed by [Resemble AI](https://resemble.ai). It performs zero-shot voice cloning from a short reference clip (\~10 seconds), supports paralinguistic tags such as `[laugh]` and `[cough]`, and offers a multilingual variant covering 23+ languages. Three model variants are available: Turbo (350M, low latency), Original (500M, creative control), and Multilingual (500M, 23+ languages).

**GitHub:** [resemble-ai/chatterbox](https://github.com/resemble-ai/chatterbox) **PyPI:** [chatterbox-tts](https://pypi.org/project/chatterbox-tts/) **License:** MIT

## Key Features

* **Zero-shot voice cloning**: clone any voice from about 10 seconds of reference audio
* **Paralinguistic tags** (Turbo): `[laugh]`, `[cough]`, `[chuckle]`, `[sigh]` for more natural-sounding speech
* **23+ languages** (Multilingual): Arabic, Chinese, French, German, Japanese, Korean, Russian, Spanish, and many more
* **CFG and exaggeration tuning** (Original): creative control over expressiveness
* **Three model variants**: Turbo (350M), Original (500M), Multilingual (500M)
* **MIT license**: fully open for commercial use

## Requirements

| Component | Minimum        | Recommended         |
| --------- | -------------- | ------------------- |
| GPU       | RTX 3060 12 GB | RTX 3090 / RTX 4090 |
| VRAM      | 6 GB           | 10 GB+              |
| RAM       | 8 GB           | 16 GB               |
| Disk      | 5 GB           | 15 GB               |
| Python    | 3.10+          | 3.11                |
| CUDA      | 11.8+          | 12.1+               |

**Clore.ai recommendation:** an RTX 3090 (\~$0.30–1.00/day) for comfortable VRAM headroom. An RTX 3060 works for the Turbo model. For the Multilingual model with long texts, consider an RTX 4090 (\~$0.50–2.00/day).

## Installation

```bash
# Install from PyPI
pip install chatterbox-tts

# Or install from source
git clone https://github.com/resemble-ai/chatterbox.git
cd chatterbox
pip install -e .

# Verify the installation
python -c "from chatterbox.tts import ChatterboxTTS; print('Chatterbox ready')"
```
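
If the one-line verification above fails, it can help to see which package is actually missing. The following is a minimal sketch that uses only the standard library. The helper name `check_installed` is hypothetical, and the module names `chatterbox`, `torch`, and `torchaudio` are assumed from the imports used on this page.

```python
import importlib.util


def check_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken or unusual module entries.
        return False


if __name__ == "__main__":
    # Module names assumed from the imports shown in this guide.
    for name in ("chatterbox", "torch", "torchaudio"):
        status = "found" if check_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

A module reported as missing usually points to an incomplete `pip install chatterbox-tts` or to a different Python environment being active.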

## Quick Start

### Turbo Model (Lowest Latency)

```python
import torchaudio as ta
from chatterbox.tts_turbo import ChatterboxTurboTTS

model = ChatterboxTurboTTS.from_pretrained(device="cuda")

# Basic TTS with paralinguistic tags
text = "Hey, welcome back! [chuckle] I've got some great news for you today."

# Voice cloning: provide a reference clip of 10+ seconds
wav = model.generate(text, audio_prompt_path="reference_voice.wav")

ta.save("output_turbo.wav", wav, model.sr)
print(f"Saved at {model.sr} Hz")
```

### Original Model (English, Creative Control)

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "The quick brown fox jumps over the lazy dog. It was a beautiful morning."

# Generate without voice cloning (uses the default voice)
wav = model.generate(text)
ta.save("output_default.wav", wav, model.sr)

# Generate with voice cloning
wav = model.generate(text, audio_prompt_path="my_voice_sample.wav")
ta.save("output_cloned.wav", wav, model.sr)
```
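
The feature list mentions CFG and exaggeration tuning for the Original model, but the example above uses only the defaults. The sketch below shows one way such tuning might be wrapped. It assumes that `generate()` accepts `exaggeration` and `cfg_weight` keyword arguments, which should be checked against the installed version. The preset names and values are illustrative starting points, not figures from this guide, and `generate_with_preset` is a hypothetical helper.

```python
# Illustrative presets; the keyword names and values are assumptions to verify
# against the installed chatterbox-tts version.
PRESETS = {
    "neutral": {"exaggeration": 0.5, "cfg_weight": 0.5},
    "expressive": {"exaggeration": 0.7, "cfg_weight": 0.3},
}


def generate_with_preset(model, text, preset="neutral", audio_prompt_path=None):
    """Call model.generate() with the keyword arguments of a named preset."""
    kwargs = dict(PRESETS[preset])  # copy so the preset table is never mutated
    if audio_prompt_path is not None:
        kwargs["audio_prompt_path"] = audio_prompt_path
    return model.generate(text, **kwargs)
```

With a model loaded as in the example above, `generate_with_preset(model, text, preset="expressive", audio_prompt_path="my_voice_sample.wav")` would return a waveform that can be saved with `ta.save(...)` in the same way.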

## Usage Examples

### Multilingual Voice Cloning

```python
import torchaudio as ta
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")

# French
french_text = "Bonjour, comment allez-vous? Bienvenue dans notre démonstration."
wav_fr = model.generate(french_text, language_id="fr")
ta.save("output_french.wav", wav_fr, model.sr)

# Japanese
japanese_text = "こんにちは、テキスト読み上げのデモンストレーションです。"
wav_ja = model.generate(japanese_text, language_id="ja")
ta.save("output_japanese.wav", wav_ja, model.sr)

# Russian, with voice cloning
russian_text = "Привет! Это демонстрация синтеза речи на русском языке."
wav_ru = model.generate(
    russian_text,
    language_id="ru",
    audio_prompt_path="russian_speaker.wav"
)
ta.save("output_russian.wav", wav_ru, model.sr)

print("Multilingual generation complete")
```

### Paralinguistic Tags (Turbo)

```python
import torchaudio as ta
from chatterbox.tts_turbo import ChatterboxTurboTTS

model = ChatterboxTurboTTS.from_pretrained(device="cuda")

samples = [
    ("greeting", "Hi there! [laugh] It's so good to see you again."),
    ("nervous", "Um, well [cough] I'm not really sure about that."),
    ("excited", "Oh my gosh! [chuckle] That's absolutely incredible news!"),
]

for name, text in samples:
    wav = model.generate(text, audio_prompt_path="speaker_ref.wav")
    ta.save(f"para_{name}.wav", wav, model.sr)
    print(f"Generated: {name}")
```

### Batch Processing Script

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS
import os

model = ChatterboxTTS.from_pretrained(device="cuda")

# Process a list of lines (e.g., for audiobook chapters)
lines = [
    "Chapter one. The adventure begins.",
    "It was a dark and stormy night.",
    "The hero stood at the crossroads, uncertain of the path ahead.",
]

os.makedirs("output_batch", exist_ok=True)

for i, line in enumerate(lines):
    wav = model.generate(line, audio_prompt_path="narrator_voice.wav")
    ta.save(f"output_batch/line_{i:03d}.wav", wav, model.sr)
    print(f"[{i+1}/{len(lines)}] {line[:40]}...")

print("Batch processing complete")
```
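
For longer jobs, the loop above can be extended to work through the lines in groups and release cached VRAM after each group with `torch.cuda.empty_cache()`. This is a minimal sketch. The helper names `group_lines` and `generate_in_groups` and the default group size of 8 are assumptions for illustration, not part of Chatterbox.

```python
import os


def group_lines(lines, group_size):
    """Split a list of lines into consecutive groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [lines[i:i + group_size] for i in range(0, len(lines), group_size)]


def generate_in_groups(model, lines, voice_path, out_dir="output_batch", group_size=8):
    """Generate one WAV file per line, clearing the CUDA cache after each group."""
    # Imported here so group_lines() stays usable without a GPU stack installed.
    import torch
    import torchaudio as ta

    os.makedirs(out_dir, exist_ok=True)
    index = 0
    for group in group_lines(lines, group_size):
        for line in group:
            wav = model.generate(line, audio_prompt_path=voice_path)
            ta.save(os.path.join(out_dir, f"line_{index:03d}.wav"), wav, model.sr)
            index += 1
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached VRAM between groups
```

Calling `generate_in_groups(model, lines, "narrator_voice.wav")` writes the same `line_000.wav`, `line_001.wav`, ... files as the script above.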

## Tips for Clore.ai Users

* **Model selection**: use Turbo for low-latency voice agents, Original for English creative work, and Multilingual for non-English content
* **Reference audio quality**: use a clean, noise-free 10–30 second clip for the best voice-cloning results
* **Docker setup**: use the base image `pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime` and expose port `7860/http` for Gradio
* **Memory management**: call `torch.cuda.empty_cache()` between large batches to free VRAM
* **Supported languages**: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh
* **HuggingFace Space**: try it at [huggingface.co/spaces/ResembleAI/Chatterbox](https://huggingface.co/spaces/ResembleAI/Chatterbox) before renting
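
The model-selection and supported-languages tips can be combined into a small lookup. This is a rough sketch: `pick_variant` is a hypothetical helper, and routing every non-English language to the Multilingual variant is an interpretation of the recommendation above rather than a documented limitation of the other models.

```python
# Language codes copied from the supported-languages tip above.
SUPPORTED_LANGUAGES = {
    "ar", "da", "de", "el", "en", "es", "fi", "fr", "he", "hi", "it", "ja",
    "ko", "ms", "nl", "no", "pl", "pt", "ru", "sv", "sw", "tr", "zh",
}


def pick_variant(language_id: str, low_latency: bool = False) -> str:
    """Map a language code and latency need to a variant name from this guide."""
    lang = language_id.strip().lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language_id!r}")
    if lang != "en":
        return "multilingual"
    return "turbo" if low_latency else "original"
```

Validating the code before loading a model avoids renting a GPU only to discover that the target language is not covered.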

## Troubleshooting

| Problem                     | Solution                                                                                                     |
| --------------------------- | ------------------------------------------------------------------------------------------------------------ |
| `CUDA out of memory`        | Use Turbo (350M) instead of Original/Multilingual (500M), or rent a larger GPU                               |
| Cloned voice does not match | Use a clean 15–30 s reference clip with minimal background noise                                             |
| `numpy` version conflict    | Run `pip install numpy==1.26.4 --force-reinstall`                                                            |
| Model download is slow      | Models are fetched from HuggingFace on first run (\~2 GB); use `huggingface-cli` to download them in advance |
| Audio contains artifacts    | Reduce the text length per generation; very long texts can degrade quality                                   |
| `ModuleNotFoundError`       | Make sure `pip install chatterbox-tts` completed without errors; check Python 3.11 compatibility             |
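
The advice to reduce text length per generation can be applied by splitting long input at sentence boundaries before synthesis. The sketch below is one simple way to do this. The helper name `split_into_chunks` and the 200-character default are assumptions for illustration, since this guide does not state a specific length limit.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `model.generate(...)` separately, as in the batch processing script, and the resulting clips joined afterwards.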

