# Overview

Guides for AI-powered audio processing, speech synthesis, and voice cloning on CLORE.AI GPUs.

## Text-to-Speech

| Tool                                                            | Description                                 | Quality   |
| --------------------------------------------------------------- | ------------------------------------------- | --------- |
| [Bark TTS](/guides/audio-and-voice/bark-tts.md)                 | Expressive multilingual TTS                 | Excellent |
| [XTTS](/guides/audio-and-voice/xtts-coqui.md)                   | Voice cloning + TTS                         | Great     |
| [F5-TTS](/guides/audio-and-voice/f5-tts.md)                     | Fast zero-shot TTS                          | Great     |
| [OpenVoice](/guides/audio-and-voice/openvoice-clone.md)         | Instant voice cloning                       | Good      |
| [Chatterbox TTS](/guides/audio-and-voice/chatterbox-tts.md)     | Zero-shot voice cloning                     | Good      |
| [ChatTTS](/guides/audio-and-voice/chattts.md)                   | Conversational text-to-speech               | Good      |
| [Dia TTS](/guides/audio-and-voice/dia-tts.md)                   | Multi-speaker dialog generation             | Good      |
| [Fish Speech](/guides/audio-and-voice/fish-speech.md)           | High-quality voice synthesis                | Great     |
| [Kani-TTS-2](/guides/audio-and-voice/kani-tts.md)               | Efficient voice cloning TTS                 | Good      |
| [Kokoro TTS](/guides/audio-and-voice/kokoro-tts.md)             | Ultra-fast lightweight TTS                  | Good      |
| [MeloTTS](/guides/audio-and-voice/melotts.md)                   | Multilingual text-to-speech                 | Good      |
| [MiniMax Speech 2.6](/guides/audio-and-voice/minimax-speech.md) | Commercial-grade TTS                        | Great     |
| [Qwen3-TTS](/guides/audio-and-voice/qwen3-tts.md)               | Multilingual voice cloning                  | Good      |
| [StyleTTS2](/guides/audio-and-voice/styletss2.md)               | Style-controllable TTS                      | Great     |
| [Voxtral TTS](/guides/audio-and-voice/voxtral-tts.md)           | Open-weight 4B TTS, 9 languages, 3s cloning | Excellent |
| [Zonos TTS](/guides/audio-and-voice/zonos-tts.md)               | Voice cloning with emotion control          | Good      |

## Voice Cloning

| Tool                                                    | Training Required | Quality   |
| ------------------------------------------------------- | ----------------- | --------- |
| [RVC](/guides/audio-and-voice/rvc-voice-clone.md)       | Yes               | Excellent |
| [OpenVoice](/guides/audio-and-voice/openvoice-clone.md) | No                | Good      |
| [XTTS](/guides/audio-and-voice/xtts-coqui.md)           | No (6 sec sample) | Great     |

## Audio Processing

| Tool                                                        | Use Case                                      |
| ----------------------------------------------------------- | --------------------------------------------- |
| [Whisper](/guides/audio-and-voice/whisper-transcription.md) | Speech-to-text transcription                  |
| [Demucs](/guides/audio-and-voice/demucs-separation.md)      | Vocal separation                              |
| [AudioCraft](/guides/audio-and-voice/audiocraft-music.md)   | Music generation                              |
| [Stable Audio](/guides/audio-and-voice/stable-audio.md)     | AI music and sound generation                 |
| [WhisperX](/guides/audio-and-voice/whisperx.md)             | Fast transcription with word-level timestamps |

## Related Guides

* [Talking Heads](/guides/talking-heads/talking-heads.md) - Animate faces with audio
