# TTS Engine Comparison

Compare the leading open-source text-to-speech engines for deployment on Clore.ai GPU servers.

{% hint style="info" %}
**Text-to-Speech (TTS)** converts written text into natural-sounding audio. This guide compares five leading open-source TTS engines: XTTS v2, Bark, Kokoro, Fish Speech, and MeloTTS — covering quality, speed, language support, and voice cloning capabilities.
{% endhint %}

***

## Quick Decision Matrix

|                   | XTTS v2               | Bark              | Kokoro      | Fish Speech  | MeloTTS     |
| ----------------- | --------------------- | ----------------- | ----------- | ------------ | ----------- |
| **Developer**     | Coqui AI              | Suno AI           | Hexgrad     | Fish Audio   | MyShell AI  |
| **Quality**       | ⭐⭐⭐⭐⭐                 | ⭐⭐⭐⭐              | ⭐⭐⭐⭐        | ⭐⭐⭐⭐⭐        | ⭐⭐⭐         |
| **Speed**         | Medium                | Slow              | **Fast**    | **Fast**     | **Fastest** |
| **Voice cloning** | ✅ (3s clip)           | Presets only      | ❌           | ✅ (10s clip) | ❌           |
| **Languages**     | 17                    | 10+               | English     | 8+           | 6           |
| **Min VRAM**      | 4GB                   | 8GB               | **CPU ok**  | 4GB          | **CPU ok**  |
| **License**       | CPML (non-commercial) | MIT               | Apache 2.0  | CC BY-NC-SA  | MIT         |
| **GitHub stars**  | 35K+ (Coqui TTS)      | 38K+              | 12K+        | 14K+         | 15K+        |

***

## Overview

### XTTS v2

Coqui's XTTS v2 is a widely used baseline for open-source voice-cloning TTS. It clones a voice zero-shot from a reference clip as short as 3 seconds, although a 10-30 second reference is ideal.

**Philosophy**: Maximum expressiveness and voice cloning quality.

```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

# Zero-shot voice cloning from 3s reference
tts.tts_to_file(
    text="Hello, this is a cloned voice speaking naturally.",
    speaker_wav="reference_voice.wav",
    language="en",
    file_path="output.wav"
)
```

### Bark

Suno's Bark is a transformer-based TTS model that generates highly expressive speech, including non-speech sounds: laughter, sighs, music, and sound effects.

**Philosophy**: Not just speech — full audio generation.

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()

audio_array = generate_audio(
    "[laughs] Hello! [clears throat] This is Bark TTS. [sighs]"
)
write_wav("output.wav", SAMPLE_RATE, audio_array)
```

### Kokoro

Kokoro is a lightweight, fast TTS model optimized for English. Despite its small size (\~82M parameters), it delivers surprisingly high quality.

**Philosophy**: Small model, big quality, runs anywhere.

```python
from kokoro import KPipeline
import soundfile as sf

pipeline = KPipeline(lang_code='a')  # 'a' = American English

generator = pipeline(
    "The quick brown fox jumps over the lazy dog.",
    voice='af_heart',  # pre-built voice
    speed=1.0,
)

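# The pipeline yields one audio chunk per text segment; write each chunk
# to its own file so later chunks do not overwrite earlier ones
for i, (_, _, audio) in enumerate(generator):
    sf.write(f'output_{i}.wav', audio, 24000)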
```

### Fish Speech

Fish Audio's Fish Speech is a production-grade TTS with exceptional voice cloning from short clips. It uses a novel codec + language model architecture.

**Philosophy**: Production quality, fast inference, excellent cloning.

```python
# Fish Speech via HTTP API
import requests

response = requests.post(
    "http://localhost:8080/v1/tts",
    json={
        "text": "Hello, this is Fish Speech generating audio.",
        "reference_id": "your-voice-id",
        "format": "wav",
    }
)
response.raise_for_status()  # surface HTTP errors instead of writing them to the file

with open("output.wav", "wb") as f:
    f.write(response.content)
```

### MeloTTS

MyShell's MeloTTS is an ultra-fast, multi-accent TTS engine optimized for real-time applications. It runs efficiently on CPU and supports multiple English accents and Asian languages.

**Philosophy**: Real-time speed at any scale.

```python
from melo.api import TTS

speed = 1.0
device = 'auto'

model = TTS(language='EN', device=device)
speaker_ids = model.hps.data.spk2id

output_path = 'output.wav'
model.tts_to_file(
    "Hello world! MeloTTS is very fast.",
    speaker_ids['EN-Default'],
    output_path,
    speed=speed
)
```

***

## Quality Comparison

### Naturalness Scores (MOS — Mean Opinion Score, 1-5)

{% hint style="info" %}
MOS scores are approximate values based on published papers and community evaluations. Actual quality depends heavily on text content and voice configuration.
{% endhint %}

| Model       | English MOS | Multilingual MOS | Expressiveness |
| ----------- | ----------- | ---------------- | -------------- |
| XTTS v2     | 4.3         | 4.1              | ⭐⭐⭐⭐⭐          |
| Bark        | 3.9         | 3.7              | ⭐⭐⭐⭐⭐ (unique) |
| Kokoro      | 4.2         | N/A (EN only)    | ⭐⭐⭐            |
| Fish Speech | 4.4         | 4.2              | ⭐⭐⭐⭐           |
| MeloTTS     | 3.8         | 3.6              | ⭐⭐             |
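
A MOS is the arithmetic mean of listener ratings on a 1-5 scale. The sketch below shows how a score like those in the table could be computed from raw ratings, with a rough 95% confidence interval. The ratings are made-up illustration data, not measurements of any engine, and `mean_opinion_score` is a hypothetical helper name.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    mos = statistics.mean(ratings)
    # Normal approximation: 1.96 standard errors on either side of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width

# Hypothetical ratings from ten listeners for a single audio sample.
ratings = [4, 5, 4, 4, 3, 5, 4, 4, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of listeners the interval is wide, which is one reason small differences between engines in the table should not be over-interpreted.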

### What Each Model Does Best

| Model       | Standout Quality Feature                    |
| ----------- | ------------------------------------------- |
| XTTS v2     | Near-perfect voice cloning, emotional range |
| Bark        | Non-speech sounds, laughter, music, effects |
| Kokoro      | Best quality-to-size ratio, natural cadence |
| Fish Speech | Best overall naturalness + cloning accuracy |
| MeloTTS     | Consistent, clean output for long texts     |

***

## Speed Benchmarks

### Characters Per Second (CPU vs GPU)

Test: "The quick brown fox jumps over the lazy dog. How are you today?" (63 characters)

| Model       | CPU Speed         | GPU Speed (RTX 3080) | Real-time Factor |
| ----------- | ----------------- | -------------------- | ---------------- |
| XTTS v2     | \~15 chars/s      | \~150 chars/s        | 0.3× (GPU)       |
| Bark        | \~5 chars/s       | \~40 chars/s         | 0.1× (GPU)       |
| Kokoro      | \~200 chars/s     | \~800 chars/s        | **5× (GPU)**     |
| Fish Speech | \~80 chars/s      | \~500 chars/s        | **3× (GPU)**     |
| MeloTTS     | **\~500 chars/s** | \~2000 chars/s       | **12× (GPU)**    |

*Real-time factor > 1.0 means faster than playback speed*
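
Under the convention in the note above, real-time factor is audio duration divided by generation time, and a chars/s figure gives a rough synthesis-time estimate for a given text. A minimal sketch of both calculations follows; the function names are illustrative assumptions and the numbers in the usage lines are arbitrary.

```python
def real_time_factor(audio_seconds, generation_seconds):
    """Audio duration divided by generation time; above 1.0 is faster than playback."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds

def estimated_generation_seconds(text, chars_per_second):
    """Rough time to synthesize `text` at a measured chars/s throughput."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    return len(text) / chars_per_second

sentence = "The quick brown fox jumps over the lazy dog. How are you today?"
print(real_time_factor(10, 4))                      # 10 s of audio produced in 4 s
print(estimated_generation_seconds(sentence, 150))  # seconds at 150 chars/s
```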

### Time to Generate 1 Minute of Audio

| Model       | CPU      | RTX 3080 | A100    |
| ----------- | -------- | -------- | ------- |
| XTTS v2     | \~8 min  | \~30s    | \~10s   |
| Bark        | \~20 min | \~3 min  | \~45s   |
| Kokoro      | \~20s    | \~5s     | \~2s    |
| Fish Speech | \~45s    | \~8s     | \~3s    |
| MeloTTS     | **\~8s** | **\~2s** | **<1s** |

{% hint style="success" %}
**For real-time applications**: MeloTTS and Kokoro are the clear winners. Both can generate speech faster than playback speed even on CPU.
{% endhint %}
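
Published numbers vary with hardware and text, so it is worth measuring on your own server. The sketch below times any synthesis callable that writes a WAV file and reports the resulting real-time factor. `synthesize` is a placeholder for whichever engine call you use (for example a thin wrapper around `tts_to_file`), `fake_synthesize` is a stand-in that only writes silence, and the duration check assumes 16-bit PCM WAV output, which Python's `wave` module can read.

```python
import os
import tempfile
import time
import wave

def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def benchmark(synthesize, text, out_path):
    """Time synthesize(text, out_path); return (generation_s, audio_s, rtf)."""
    start = time.perf_counter()
    synthesize(text, out_path)
    generation_s = time.perf_counter() - start
    audio_s = wav_duration_seconds(out_path)
    rtf = audio_s / generation_s if generation_s > 0 else float("inf")
    return generation_s, audio_s, rtf

def fake_synthesize(text, out_path, sample_rate=24000):
    """Stand-in for a real engine: writes one second of 16-bit mono silence."""
    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate)

with tempfile.TemporaryDirectory() as tmp:
    gen_s, audio_s, rtf = benchmark(fake_synthesize, "Hello", os.path.join(tmp, "bench.wav"))
    print(f"generated {audio_s:.1f}s of audio in {gen_s:.4f}s")
```

Run each engine a few times and discard the first call, since model loading and CUDA warm-up usually dominate the first measurement.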

***

## Language Support

### Supported Languages

| Model       | Languages | Notable                                                            |
| ----------- | --------- | ------------------------------------------------------------------ |
| XTTS v2     | 17        | EN, ES, FR, DE, IT, PT, PL, TR, RU, NL, CS, AR, ZH, JA, HU, KO, HI |
| Bark        | 10+       | EN, ZH, FR, DE, HI, IT, JA, KO, PL, PT, RU, ES, TR                 |
| Kokoro      | 2         | English (US/UK), Japanese (limited)                                |
| Fish Speech | 8         | EN, ZH, JA, KO, FR, DE, AR, ES                                     |
| MeloTTS     | 6         | EN (4 accents), ES, FR, ZH, JA, KO                                 |
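
The table above can be turned into a simple lookup for routing requests by language. The sketch copies the language codes from the table; `ENGINE_LANGUAGES` and `engines_for` are illustrative names, and MeloTTS's English accents are collapsed into `EN`.

```python
ENGINE_LANGUAGES = {
    "XTTS v2": {"EN", "ES", "FR", "DE", "IT", "PT", "PL", "TR", "RU",
                "NL", "CS", "AR", "ZH", "JA", "HU", "KO", "HI"},
    "Bark": {"EN", "ZH", "FR", "DE", "HI", "IT", "JA", "KO", "PL",
             "PT", "RU", "ES", "TR"},
    "Kokoro": {"EN", "JA"},
    "Fish Speech": {"EN", "ZH", "JA", "KO", "FR", "DE", "AR", "ES"},
    "MeloTTS": {"EN", "ES", "FR", "ZH", "JA", "KO"},
}

def engines_for(language_code):
    """Return the engines (sorted by name) that list the given language code."""
    code = language_code.upper()
    return sorted(name for name, langs in ENGINE_LANGUAGES.items() if code in langs)

print(engines_for("ar"))
print(engines_for("zh"))
```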

### Language Quality Notes

| Model       | English   | Chinese  | Japanese | European  |
| ----------- | --------- | -------- | -------- | --------- |
| XTTS v2     | Excellent | Good     | Good     | Excellent |
| Bark        | Good      | Fair     | Fair     | Good      |
| Kokoro      | Excellent | ❌        | Limited  | ❌         |
| Fish Speech | Excellent | **Best** | Good     | Good      |
| MeloTTS     | Good      | Good     | Good     | Good      |

{% hint style="info" %}
**For Chinese TTS**: Fish Speech and MeloTTS are the best open-source options. Both handle tones and characters naturally.

**For multilingual applications**: XTTS v2 covers the most languages (17), with its strongest results in English and the European languages.
{% endhint %}

***

## Voice Cloning Comparison

### Cloning Capabilities

| Model       | Reference Length   | Cloning Quality | Zero-Shot |
| ----------- | ------------------ | --------------- | --------- |
| XTTS v2     | **3 seconds**      | ⭐⭐⭐⭐⭐           | ✅         |
| Bark        | Voice presets only | ⭐⭐⭐             | Partial   |
| Kokoro      | Not supported      | ❌               | ❌         |
| Fish Speech | 10 seconds         | ⭐⭐⭐⭐⭐           | ✅         |
| MeloTTS     | Not supported      | ❌               | ❌         |

### XTTS v2 Voice Cloning

```python
from TTS.api import TTS
import torch

# Load model
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.to("cuda" if torch.cuda.is_available() else "cpu")

# Clone voice from reference (minimum 3 seconds, ideal 10-30 seconds)
tts.tts_to_file(
    text="""
    Welcome to our podcast. Today we're discussing the future of AI 
    and its impact on society. I'm your host, and I'm excited to 
    share some fascinating insights with you.
    """,
    speaker_wav="speaker_sample.wav",  # Your reference audio
    language="en",
    file_path="cloned_voice_output.wav"
)
```
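
Because cloning quality depends on reference length, it can help to validate clips before sending them to the model. A minimal sketch using only the standard library follows. It assumes an uncompressed PCM WAV file, takes the 3-second minimum and 10-30 second ideal range from the comment in the example above, and uses the hypothetical helper name `check_reference_clip`.

```python
import wave

MIN_SECONDS = 3.0
IDEAL_RANGE = (10.0, 30.0)

def check_reference_clip(path):
    """Return (duration_seconds, is_ideal_length); raise if the clip is too short."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration < MIN_SECONDS:
        raise ValueError(
            f"reference clip is {duration:.1f}s; need at least {MIN_SECONDS:.0f}s"
        )
    is_ideal = IDEAL_RANGE[0] <= duration <= IDEAL_RANGE[1]
    return duration, is_ideal

# Usage (with your own file):
# duration, is_ideal = check_reference_clip("speaker_sample.wav")
```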

### Fish Speech Voice Cloning

```bash
# Clone from reference audio
fish_speech_cli tts \
  --text "This is my cloned voice speaking a new sentence." \
  --reference-audio speaker_sample.wav \
  --reference-text "The original text spoken in the reference audio." \
  --output cloned_output.wav
```

### Bark Voice Presets

```python
from bark import generate_audio, SAMPLE_RATE
from scipy.io.wavfile import write

# Bark uses predefined speaker codes
voice_presets = {
    "male_US": "v2/en_speaker_6",
    "female_US": "v2/en_speaker_9",
    "male_UK": "v2/en_speaker_0",
    "announcer": "v2/en_speaker_2",
}

audio = generate_audio(
    "Welcome! [laughs] This is absolutely fascinating technology.",
    history_prompt=voice_presets["female_US"]
)
write("bark_output.wav", SAMPLE_RATE, audio)
```

***

## XTTS v2: Deep Dive

### Architecture

* **GPT-style autoregressive model** over VQ-VAE audio codes, with a HiFi-GAN decoder
* Trained on 16K+ hours across 17 languages
* 3-second minimum for zero-shot cloning

### Installation on Clore.ai

```bash
pip install TTS
# Optional: install every extra (the quotes keep the brackets safe in zsh)
pip install "TTS[all]"
```

### Docker Deployment

```dockerfile
FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04

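RUN apt-get update && apt-get install -y python3 python3-pip git ffmpeg \
    && rm -rf /var/lib/apt/lists/*
# python-multipart is required by FastAPI to parse Form and UploadFile fields
RUN pip3 install TTS fastapi uvicorn python-multipart

# Accept the Coqui Public Model License non-interactively (read it first);
# without this the first model download waits for a terminal prompt
ENV COQUI_TOS_AGREED=1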

WORKDIR /app
COPY server.py .

EXPOSE 5002
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "5002"]
```

```python
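# server.py: XTTS v2 REST API
import os
import tempfile

from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import FileResponse
from starlette.background import BackgroundTask
from TTS.api import TTS

app = FastAPI()
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")


def _cleanup(*paths):
    """Delete temporary files once the response has been sent."""
    for path in paths:
        if os.path.exists(path):
            os.remove(path)


# A plain `def` endpoint runs in FastAPI's thread pool, so the blocking
# synthesis call does not stall the event loop.
@app.post("/tts")
def synthesize(
    text: str = Form(...),
    language: str = Form("en"),
    speaker_file: UploadFile = File(...),  # XTTS v2 needs a reference voice
):
    # Save the uploaded reference clip to disk
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as ref:
        ref.write(speaker_file.file.read())
        speaker_path = ref.name

    # Reserve a path for the generated audio
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as out:
        output_path = out.name

    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_path,
        language=language,
        file_path=output_path,
    )
    return FileResponse(
        output_path,
        media_type="audio/wav",
        background=BackgroundTask(_cleanup, speaker_path, output_path),
    )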
```

```bash
docker build -t xtts-server .
docker run -d --gpus all -p 5002:5002 xtts-server
```

**Weaknesses**: CPML license (non-commercial without permission), slower than Kokoro/MeloTTS

***

## Bark: Deep Dive

### Architecture

* **GPT-style transformer** for audio token generation
* Three-stage process: text → semantic → coarse → fine tokens
* Generates actual audio codec tokens (EnCodec)

### What Makes Bark Unique

Of the engines compared here, Bark is the only one that natively generates:

* 🎵 Background music within speech
* 😂 Laughter, sighs, throat-clearing
* 🎭 Multiple speakers in one generation
* 🌍 Mixed-language utterances

### Markup Language

```python
from bark import generate_audio, SAMPLE_RATE
from scipy.io.wavfile import write

# Bracketed cues such as [laughs], [sighs], [music], and [clears throat]
# nudge the model toward non-speech sounds; results vary between runs
text = """
[clears throat] Good morning everyone. [laughs]
Today's presentation will cover...
[sighs] ...actually quite a lot of ground.
[music] Let's get started!
"""

audio = generate_audio(text, history_prompt="v2/en_speaker_6")
write("output.wav", SAMPLE_RATE, audio)
```

### Installation

```bash
pip install git+https://github.com/suno-ai/bark.git
```

**Weaknesses**: Slow (3-stage pipeline), inconsistent across runs, no true voice cloning

***

## Kokoro: Deep Dive

### Architecture

* **82M parameter** StyleTTS2-based model
* Extremely small but surprisingly high quality
* Fast inference on CPU and GPU

### Voices Available

```python
from kokoro import KPipeline

# One pipeline per language code: 'a' = American English, 'b' = British English
pipelines = {code: KPipeline(lang_code=code) for code in ('a', 'b')}

# A selection of the available voices
voices = {
    'af_heart': 'American Female (warm)',
    'af_bella': 'American Female (bella)',
    'af_nicole': 'American Female (nicole)',
    'am_michael': 'American Male (michael)',
    'am_fenrir': 'American Male (fenrir)',
    'bf_emma': 'British Female (emma)',
    'bm_george': 'British Male (george)',
}

# Generate with different voices; the first letter of a voice name is its
# language code, so British voices go through the British pipeline
for voice_name, description in voices.items():
    gen = pipelines[voice_name[0]]("Hello, this is a test.", voice=voice_name)
    for _, _, audio in gen:
        print(f"Generated {len(audio) / 24000:.1f}s with {description}")
```

### Streaming Support

```python
import sounddevice as sd
from kokoro import KPipeline

pipeline = KPipeline(lang_code='a')

# Stream audio in real-time as it generates
text = "This is a very long text that will be streamed as it generates, providing low-latency audio output."

for _, _, audio in pipeline(text, voice='af_heart'):
    sd.play(audio, samplerate=24000)
    sd.wait()
```

**Weaknesses**: English only (primarily), no voice cloning, limited expressiveness

***

## Fish Speech: Deep Dive

### Architecture

* **VQGAN + Language Model** architecture
* Trained on 700K+ hours of audio
* Strong multilingual support, especially for Asian languages

### Installation

```bash
pip install fish-speech

# Or via Docker
docker run -d \
  --gpus all \
  -p 8080:8080 \
  fishaudio/fish-speech:latest \
  +api_server.workers_count=1
```

### Python API

```python
import httpx

# Via HTTP API (synthesis can exceed httpx's default 5s timeout, so raise it)
with httpx.Client(timeout=60.0) as client:
    response = client.post(
        "http://localhost:8080/v1/tts",
        json={
            "text": "Hello from Fish Speech! This sounds very natural.",
            "format": "wav",
            "normalize": True,
        },
    )
    response.raise_for_status()  # fail loudly instead of saving an error body as audio

    with open("fish_output.wav", "wb") as f:
        f.write(response.content)
```

### Voice Cloning

```python
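import httpx

# Upload reference, get back voice ID
with open("my_voice.wav", "rb") as f:
    response = httpx.post(
        "http://localhost:8080/v1/voices",
        files={"file": f},
        data={"text": "The text spoken in this recording."},
        timeout=60.0,
    )
response.raise_for_status()
voice_id = response.json()["id"]

# Use cloned voice
response = httpx.post(
    "http://localhost:8080/v1/tts",
    json={
        "text": "Now speaking in the cloned voice.",
        "reference_id": voice_id,
        "format": "wav",
    },
    timeout=60.0,
)
response.raise_for_status()

# Save the synthesized audio
with open("cloned_output.wav", "wb") as f:
    f.write(response.content)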
```

**Weaknesses**: CC BY-NC-SA license (non-commercial), higher VRAM for best quality

***

## MeloTTS: Deep Dive

### Architecture

* **VITS2-based** architecture
* Multi-accent English training
* Extremely optimized for inference speed

### Accents and Languages

```python
from melo.api import TTS

# Supported language codes and accents
configs = {
    'EN':    ['EN-Default', 'EN-US', 'EN-BR', 'EN_INDIA', 'EN-AU'],
    'ES':    ['ES'],
    'FR':    ['FR'],
    'ZH':    ['ZH'],
    'JP':    ['JP'],
    'KR':    ['KR'],
}

model = TTS(language='EN', device='cuda')
speaker_ids = model.hps.data.spk2id

# Generate with British accent
model.tts_to_file(
    "Cheerio! Fancy a spot of tea?",
    speaker_ids['EN-BR'],
    'british.wav'
)

# Generate with Indian accent (note the underscore in this speaker key)
model.tts_to_file(
    "Namaste! Welcome to our company.",
    speaker_ids['EN_INDIA'],
    'indian.wav'
)
```

### Batch Processing (Very Fast)

```python
from melo.api import TTS
import time

model = TTS(language='EN', device='cuda')
sid = model.hps.data.spk2id['EN-Default']

texts = [
    "First sentence to synthesize.",
    "Second sentence goes here.",
    "Third and final sentence.",
]

start = time.time()
for i, text in enumerate(texts):
    model.tts_to_file(text, sid, f'output_{i}.wav')
elapsed = time.time() - start
print(f"Generated {len(texts)} files in {elapsed:.2f}s")
```

**Weaknesses**: No voice cloning, robotic at high speed, limited expressiveness

***

## Deployment on Clore.ai

### All-in-One TTS Server

```yaml
# docker-compose.yml — TTS service with multiple backends
version: "3.8"

services:
  xtts:
    build:
      context: ./xtts
    ports:
      - "5002:5002"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - ./voices:/app/voices

  kokoro:
    image: ghcr.io/remsky/kokoro-fastapi-cpu:latest
    ports:
      - "8880:8880"
    # No GPU needed!

  fish-speech:
    image: fishaudio/fish-speech:latest
    ports:
      - "8080:8080"
    command: +api_server.workers_count=2
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
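
After `docker compose up -d`, the model containers can take a while to load weights before they accept connections. The sketch below waits for the three ports published in the compose file above. It only checks that each TCP port is open, not that the model behind it is ready, and the `localhost` host, timeout values, and function names are assumptions.

```python
import socket
import time

# Host ports published by the compose file.
SERVICES = {"xtts": 5002, "kokoro": 8880, "fish-speech": 8080}

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_services(services, host="localhost", deadline_seconds=120.0, poll_seconds=2.0):
    """Poll until every port accepts connections; return names still unreachable."""
    deadline = time.monotonic() + deadline_seconds
    pending = dict(services)
    while True:
        pending = {name: port for name, port in pending.items()
                   if not port_is_open(host, port)}
        if not pending or time.monotonic() >= deadline:
            return sorted(pending)
        time.sleep(poll_seconds)
```

Calling `wait_for_services(SERVICES)` returns an empty list once all three services are reachable, or the names of those still down when the deadline passes.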

### VRAM Requirements Summary

| Model       | CPU           | 4GB GPU | 8GB GPU | 16GB GPU |
| ----------- | ------------- | ------- | ------- | -------- |
| XTTS v2     | Slow          | ✅       | ✅       | ✅        |
| Bark        | Very slow     | ❌       | ✅       | ✅        |
| Kokoro      | **Fast**      | ✅       | ✅       | ✅        |
| Fish Speech | Medium        | ✅       | ✅       | ✅        |
| MeloTTS     | **Very fast** | ✅       | ✅       | ✅        |

***

## Integration Examples

### OpenAI-Compatible API (for drop-in replacement)

```python
# Many TTS servers offer OpenAI-compatible endpoints
# Use Kokoro FastAPI or similar

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8880/v1",  # Your TTS server
    api_key="not-needed"
)

response = client.audio.speech.create(
    model="kokoro",
    voice="af_heart",
    input="Hello world! This uses the OpenAI TTS API format.",
)
response.stream_to_file("output.mp3")
```

### LangChain Integration

```python
# Using TTS with LangChain for voice agents
from langchain_core.tools import Tool
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

def speak(text: str) -> str:
    # XTTS v2 needs a reference voice; point this at your own clip
    tts.tts_to_file(
        text=text,
        speaker_wav="reference_voice.wav",
        language="en",
        file_path="/tmp/response.wav",
    )
    return "/tmp/response.wav"

tts_tool = Tool(
    name="text_to_speech",
    func=speak,
    description="Convert text to speech audio file"
)
```

***

## When to Use Which

### Decision Guide

```
Need voice cloning from short clip?
  → XTTS v2 (3s reference) or Fish Speech (10s reference)

Need real-time/fastest generation?
  → MeloTTS (CPU friendly) or Kokoro

Need expressive speech (laughter, emotion)?
  → Bark (unique non-speech sounds) or XTTS v2

Need Chinese/Japanese/Korean?
  → Fish Speech (best CJK) or MeloTTS

English only, maximum quality?
  → Kokoro (best size/quality ratio)

Need the widest language coverage (17 languages)?
  → XTTS v2

Need a license that allows commercial use?
  → Kokoro (Apache) or MeloTTS (MIT) or Bark (MIT)

Non-commercial research?
  → Any (XTTS v2 CPML or Fish Speech CC BY-NC-SA)
```
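
The same guide can be expressed as a small filter, which is handy when the choice has to be made per request or per tenant. The flags below are transcribed from the tables in this guide, with `realtime` following the decision guide's picks. `ENGINES` and `shortlist` are illustrative names rather than part of any library.

```python
ENGINES = {
    "XTTS v2":     {"cloning": True,  "realtime": False, "commercial": False},
    "Bark":        {"cloning": False, "realtime": False, "commercial": True},
    "Kokoro":      {"cloning": False, "realtime": True,  "commercial": True},
    "Fish Speech": {"cloning": True,  "realtime": False, "commercial": False},
    "MeloTTS":     {"cloning": False, "realtime": True,  "commercial": True},
}

def shortlist(cloning=False, realtime=False, commercial=False):
    """Return the engines that satisfy every requested requirement."""
    wanted = {"cloning": cloning, "realtime": realtime, "commercial": commercial}
    return [
        name
        for name, features in ENGINES.items()
        if all(features[key] for key, required in wanted.items() if required)
    ]

print(shortlist(realtime=True, commercial=True))
print(shortlist(cloning=True))
print(shortlist(cloning=True, commercial=True))
```

The last call returns an empty list: none of the five engines combines zero-shot cloning with a license that permits commercial use, so that combination requires a separate commercial agreement.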

### By Application Type

| Application          | Best Choice            | Why                       |
| -------------------- | ---------------------- | ------------------------- |
| Audiobook generation | XTTS v2                | Natural, consistent voice |
| Real-time chatbot    | MeloTTS or Kokoro      | Fastest inference         |
| Podcast automation   | XTTS v2 or Fish Speech | Best cloning              |
| Game characters      | Bark                   | Expressive, varied voices |
| Customer service     | MeloTTS                | Scalable, fast            |
| Accessibility tools  | Kokoro                 | Lightweight, free         |
| Voice dubbing        | Fish Speech            | Best cloning quality      |
| Long-form narration  | XTTS v2                | Consistent quality        |

***

## License Summary

{% hint style="warning" %}
**License matters for commercial use!** Always check before deploying in production.
{% endhint %}

| Model       | License                    | Commercial? | Notes                           |
| ----------- | -------------------------- | ----------- | ------------------------------- |
| XTTS v2     | Coqui Public Model License | ❌           | Requires license for commercial |
| Bark        | MIT                        | ✅           | Free for all use                |
| Kokoro      | Apache 2.0                 | ✅           | Free for all use                |
| Fish Speech | CC BY-NC-SA 4.0            | ❌           | Non-commercial only             |
| MeloTTS     | MIT                        | ✅           | Free for all use                |

**Fully open for commercial use**: Bark, Kokoro, MeloTTS

***

## Cost on Clore.ai

```
Kokoro/MeloTTS (CPU or cheap GPU):
  Cheapest server at ~$0.05/hr → ~$36/month
  Can handle 100+ concurrent requests on CPU

XTTS v2 (RTX 3080):
  ~$0.30/hr → ~$220/month
  ~500 requests/hr capacity

Fish Speech (RTX 4090):
  ~$0.60/hr → ~$440/month  
  ~1000 requests/hr capacity
```
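
The monthly figures above are the hourly rate multiplied by roughly 720 hours. The sketch below reproduces that arithmetic and derives a cost per request from the stated capacity. It assumes the server runs around the clock at full utilization, which real workloads rarely do, and the function names are illustrative.

```python
HOURS_PER_MONTH = 24 * 30

def monthly_cost(hourly_rate):
    """Cost of keeping one server rented for a 30-day month."""
    return hourly_rate * HOURS_PER_MONTH

def cost_per_request(hourly_rate, requests_per_hour):
    """Cost of a single request when the server is fully utilized."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return hourly_rate / requests_per_hour

for name, rate, capacity in [("XTTS v2", 0.30, 500), ("Fish Speech", 0.60, 1000)]:
    print(f"{name}: ${monthly_cost(rate):.0f}/month, "
          f"${cost_per_request(rate, capacity):.4f}/request")
```

At half utilization the per-request cost doubles, so idle hours matter more than the headline hourly rate.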

***

## Useful Links

* [Coqui TTS (XTTS)](https://github.com/coqui-ai/TTS) — 35K+ stars
* [Bark GitHub](https://github.com/suno-ai/bark) — 38K+ stars
* [Kokoro GitHub](https://github.com/hexgrad/kokoro) — 12K+ stars
* [Fish Speech GitHub](https://github.com/fishaudio/fish-speech) — 14K+ stars
* [MeloTTS GitHub](https://github.com/myshell-ai/MeloTTS) — 15K+ stars
* [TTS Arena Leaderboard](https://huggingface.co/spaces/TTS-AGI/TTS-Arena)

***

## Summary

| Model           | Use When                                                  |
| --------------- | --------------------------------------------------------- |
| **XTTS v2**     | Best voice cloning (3s ref), 17 languages, non-commercial |
| **Bark**        | Expressive, laughter/effects, MIT license                 |
| **Kokoro**      | Fast, high-quality English, Apache license                |
| **Fish Speech** | Best CJK, production cloning, non-commercial              |
| **MeloTTS**     | Fastest, real-time, multi-accent English, MIT license     |

For most production Clore.ai deployments:

* **Real-time voice apps** → MeloTTS or Kokoro (free, fast, permissively licensed)
* **Voice cloning service** → XTTS v2 or Fish Speech (check licensing)
* **Expressive narration** → Bark or XTTS v2

***

## Clore.ai GPU Recommendations

| Use Case            | Recommended GPU | Est. Cost on Clore.ai |
| ------------------- | --------------- | --------------------- |
| Development/Testing | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production          | RTX 4090 (24GB) | \~$0.70/gpu/hr        |
| Large Scale         | A100 80GB       | \~$1.20/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.

