XTTS (Coqui)

Generate natural speech with voice cloning using Coqui XTTS.


Renting on CLORE.AI

  1. Filter by GPU type, VRAM, and price

  2. Choose On-Demand (fixed rate) or Spot (bid price)

  3. Configure your order:

    • Select Docker image

    • Set ports (TCP for SSH, HTTP for web UIs)

    • Add environment variables if needed

    • Enter startup command

  4. Select payment: CLORE, BTC, or USDT/USDC

  5. Create order and wait for deployment

Access Your Server

  • Find connection details in My Orders

  • Web interfaces: Use the HTTP port URL

  • SSH: ssh -p <port> root@<proxy-address>

What is XTTS?

XTTS (by Coqui) offers:

  • High-quality text-to-speech

  • Voice cloning from 6 seconds of audio

  • 17 languages supported

  • Emotion and style transfer from the reference audio

  • Streaming support

Requirements

| Mode           | VRAM | Recommended GPU |
|----------------|------|-----------------|
| Inference      | 4 GB | RTX 3060        |
| Fast Inference | 6 GB | RTX 3080        |
| Streaming      | 4 GB | RTX 3060        |

Quick Deploy

Docker Image:

Ports:

Command:

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

  1. Go to My Orders page

  2. Click on your order

  3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.
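The substitution described above can be sketched as a small helper. This is only an illustration: the function name `to_public_url`, the port `7860`, and the path `/api/tts` are hypothetical, and it assumes the http_pub address is reached over HTTPS on the default port, so the local port is dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def to_public_url(local_url, http_pub):
    """Swap the scheme and host of a localhost URL for the order's http_pub address."""
    parts = urlsplit(local_url)
    # Keep the path, query and fragment; replace scheme and host (the local port is dropped).
    return urlunsplit(("https", http_pub, parts.path, parts.query, parts.fragment))


# Hypothetical local URL and the example http_pub value from this guide.
print(to_public_url("http://localhost:7860/api/tts?lang=en", "abc123.clorecloud.net"))
```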

Installation

Basic Usage

Simple TTS
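A minimal sketch of plain text-to-speech, assuming the Coqui `TTS` Python package and PyTorch are installed and that the installed release ships built-in XTTS v2 speakers. The speaker name `"Ana Florence"` and the helper name `synthesize` are assumptions for illustration; check the speaker list of your installed version.

```python
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def synthesize(text, out_path="output.wav", language="en", speaker="Ana Florence"):
    """Write `text` as speech to `out_path` using a built-in XTTS speaker."""
    if not text.strip():
        raise ValueError("text must not be empty")

    # Imported lazily: torch and TTS are heavy and must be installed first.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS(MODEL_NAME).to(device)  # the first call downloads the model weights
    tts.tts_to_file(text=text, speaker=speaker, language=language, file_path=out_path)
    return out_path
```

Calling `synthesize("Hello from CLORE.AI")` would write `output.wav` in the working directory. Loading the model once and reusing the `tts` object is preferable when generating many clips.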

Voice Cloning
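A hedged sketch of cloning a voice from one reference clip, assuming the Coqui `TTS` package and a PCM WAV reference file. The helper names and the duration check are illustrative; the 6-second minimum is the figure quoted in this guide, not a limit enforced by the library.

```python
import wave

MIN_REFERENCE_SECONDS = 6.0  # lower bound quoted in this guide


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file using only the standard library."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def clone_voice(text, reference_wav, out_path="cloned.wav", language="en"):
    """Speak `text` in the voice heard in `reference_wav`."""
    duration = wav_duration_seconds(reference_wav)
    if duration < MIN_REFERENCE_SECONDS:
        raise ValueError(f"reference is {duration:.1f}s; use at least {MIN_REFERENCE_SECONDS:.0f}s")

    # Imported lazily so the duration check works without the heavy dependencies.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    tts.tts_to_file(text=text, speaker_wav=reference_wav, language=language, file_path=out_path)
    return out_path
```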

Multiple Languages
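One way to generate the same voice in several languages is to plan one job per language code and run them through a single loaded model. The sketch below assumes `tts` is an already loaded `TTS.api.TTS` object for XTTS v2; `plan_jobs`, `synthesize_all`, and the output file naming are hypothetical.

```python
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian", "ko": "Korean",
    "hi": "Hindi",
}


def plan_jobs(texts, out_dir="."):
    """Return (language, text, output_path) tuples, rejecting unsupported codes."""
    unknown = sorted(set(texts) - set(SUPPORTED_LANGUAGES))
    if unknown:
        raise ValueError(f"unsupported language codes: {unknown}")
    return [(lang, text, f"{out_dir}/speech_{lang}.wav") for lang, text in texts.items()]


def synthesize_all(tts, texts, reference_wav, out_dir="."):
    """Run every planned job through one loaded TTS object and return the paths."""
    paths = []
    for lang, text, path in plan_jobs(texts, out_dir):
        tts.tts_to_file(text=text, speaker_wav=reference_wav, language=lang, file_path=path)
        paths.append(path)
    return paths
```

For example, `synthesize_all(tts, {"en": "Hello", "es": "Hola"}, "reference.wav")` would write `speech_en.wav` and `speech_es.wav`.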

Supported Languages

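| Code  | Language   |
|-------|------------|
| en    | English    |
| es    | Spanish    |
| fr    | French     |
| de    | German     |
| it    | Italian    |
| pt    | Portuguese |
| pl    | Polish     |
| tr    | Turkish    |
| ru    | Russian    |
| nl    | Dutch      |
| cs    | Czech      |
| ar    | Arabic     |
| zh-cn | Chinese    |
| ja    | Japanese   |
| hu    | Hungarian  |
| ko    | Korean     |
| hi    | Hindi      |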

Streaming TTS
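Streaming yields audio in chunks while generation is still running, so playback can start before the full clip is ready. The sketch below assumes the lower-level Coqui `Xtts` model class, a local checkpoint directory containing `config.json`, and a CUDA GPU. `stream_xtts` and `timed_chunks` are illustrative helper names.

```python
import time


def timed_chunks(chunks):
    """Yield (seconds_since_start, chunk) for each chunk as it arrives."""
    start = time.perf_counter()
    for chunk in chunks:
        yield time.perf_counter() - start, chunk


def stream_xtts(checkpoint_dir, reference_wav, text, language="en"):
    """Yield audio chunks (torch tensors) from XTTS as they are generated."""
    # Imported lazily: requires the Coqui TTS package and a downloaded checkpoint.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json(f"{checkpoint_dir}/config.json")
    model = Xtts.init_from_config(config)
    model.load_checkpoint(config, checkpoint_dir=checkpoint_dir)
    model.cuda()

    # Speaker conditioning is computed once from the reference clip.
    gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=[reference_wav])
    yield from model.inference_stream(text, language, gpt_cond_latent, speaker_embedding)
```

Wrapping the stream as `timed_chunks(stream_xtts(...))` reports how long each chunk took to arrive; the first timestamp approximates the time to first audio. The chunks can be concatenated with `torch.cat` and saved at the 24 kHz rate XTTS v2 produces.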

Gradio Interface
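A minimal sketch of a web UI, assuming the `gradio` package is installed and `tts` is an already loaded `TTS.api.TTS` object for XTTS v2. The function names, the output directory, and the short language list are illustrative choices.

```python
import os
import uuid


def make_synthesize_fn(tts, out_dir="outputs"):
    """Wrap a loaded TTS object in a function Gradio can call."""
    os.makedirs(out_dir, exist_ok=True)

    def synthesize(text, reference_wav, language):
        # A unique file name per request avoids clobbering concurrent outputs.
        out_path = os.path.join(out_dir, f"{uuid.uuid4().hex}.wav")
        tts.tts_to_file(text=text, speaker_wav=reference_wav, language=language, file_path=out_path)
        return out_path

    return synthesize


def build_demo(tts, languages=("en", "es", "fr", "de")):
    """Build an Interface: text, reference clip and language in, audio out."""
    import gradio as gr  # lazy import; assumes gradio is installed

    return gr.Interface(
        fn=make_synthesize_fn(tts),
        inputs=[
            gr.Textbox(label="Text"),
            gr.Audio(type="filepath", label="Reference voice"),
            gr.Dropdown(choices=list(languages), value="en", label="Language"),
        ],
        outputs=gr.Audio(type="filepath", label="Generated speech"),
        title="XTTS Voice Cloning",
    )
```

Launching with `build_demo(tts).launch(server_name="0.0.0.0", server_port=7860)` binds to all interfaces so the UI is reachable through the order's HTTP port. Port 7860 is Gradio's default and is assumed here to be the port exposed in the order.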

API Server
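A small HTTP API can be sketched with the standard library alone. This is an assumption-heavy illustration, not the server shipped with any image: the `/tts` endpoint, the JSON fields `text` and `language`, and port 8000 are all hypothetical, and `synthesize(text, language)` is a callable you supply that returns WAV bytes.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def parse_tts_request(body):
    """Decode a JSON body into (text, language); raise ValueError if malformed."""
    payload = json.loads(body)  # JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        raise ValueError("expected a JSON object with a string 'text' field")
    return payload["text"], payload.get("language", "en")


def make_handler(synthesize):
    """Build a request handler around `synthesize(text, language) -> bytes`."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/tts":  # hypothetical endpoint name
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                text, language = parse_tts_request(self.rfile.read(length))
            except ValueError as exc:
                self.send_error(400, str(exc))
                return
            audio = synthesize(text, language)
            self.send_response(200)
            self.send_header("Content-Type", "audio/wav")
            self.send_header("Content-Length", str(len(audio)))
            self.end_headers()
            self.wfile.write(audio)

    return Handler


def create_server(synthesize, host="0.0.0.0", port=8000):
    """Create the server; call .serve_forever() on the result to start it."""
    return ThreadingHTTPServer((host, port), make_handler(synthesize))
```

A client would POST `{"text": "Hello", "language": "en"}` to `/tts` on the order's public URL and save the response body as a WAV file.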

Batch Processing
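For long documents, a common approach is to split the text on sentence boundaries and synthesize each chunk with one loaded model. The sketch below assumes `tts` is a loaded `TTS.api.TTS` object; the 250-character default, the helper names, and the file naming are arbitrary illustrative choices.

```python
import re


def split_into_chunks(text, max_chars=250):
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept intact as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk and start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_batch(tts, text, reference_wav, language="en", out_dir="."):
    """Write one WAV file per chunk and return the paths in reading order."""
    paths = []
    for i, chunk in enumerate(split_into_chunks(text)):
        path = f"{out_dir}/batch_{i:03d}.wav"
        tts.tts_to_file(text=chunk, speaker_wav=reference_wav, language=language, file_path=path)
        paths.append(path)
    return paths
```

Shorter chunks also keep each call fast, which helps when generation feels slow on long inputs.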

Fine-tuning Voice

For better voice cloning:

Audio Preprocessing
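Reference audio usually benefits from trimming silence and normalizing level before cloning. The sketch below uses only the standard library and assumes a mono, 16-bit PCM WAV file on a little-endian machine. The threshold and target peak are illustrative values, and the function names are hypothetical.

```python
import array
import wave


def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose absolute value is below `threshold`."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]: loud[-1] + 1]


def peak_normalize(samples, target_peak=29490):
    """Scale samples so the loudest one reaches `target_peak` (about 90% of int16 range)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)
    scale = target_peak / peak
    return [max(-32768, min(32767, round(s * scale))) for s in samples]


def preprocess_wav(src, dst):
    """Trim and normalize a mono 16-bit WAV file, writing the result to `dst`."""
    with wave.open(src, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        framerate = wf.getframerate()
        # array('h') uses native byte order; WAV data is little-endian.
        samples = list(array.array("h", wf.readframes(wf.getnframes())))
    cleaned = peak_normalize(trim_silence(samples))
    with wave.open(dst, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(framerate)
        wf.writeframes(array.array("h", cleaned).tobytes())
```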

Performance

| Mode      | GPU      | Speed          |
|-----------|----------|----------------|
| Standard  | RTX 3060 | ~0.5x realtime |
| Standard  | RTX 4090 | ~2x realtime   |
| Streaming | RTX 3060 | ~1x realtime   |
| Streaming | RTX 4090 | ~3x realtime   |
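To compare your own runs with these figures, a realtime factor can be computed as seconds of audio produced per second of generation time. That reading of "Nx realtime" is an assumption (values above 1 mean faster than playback), and `realtime_factor` is a hypothetical helper.

```python
def realtime_factor(audio_seconds, generation_seconds):
    """Return seconds of audio produced per second of wall-clock generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# 10 s of speech generated in 5 s of compute is 2x realtime under this definition.
print(realtime_factor(10.0, 5.0))
```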

Quality Tips

  • Use 6-15 seconds of clean reference audio

  • Avoid background noise in reference

  • Match language of text and reference

  • Use multiple reference samples for better results
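The multiple-reference tip can be sketched by passing a list of clips as `speaker_wav`, which the Coqui XTTS documentation describes as supported. The sketch assumes `tts` is a loaded `TTS.api.TTS` object; the helper names and the limit of five clips are illustrative.

```python
from pathlib import Path


def collect_reference_clips(directory, limit=5):
    """Return up to `limit` WAV paths from `directory`, sorted by file name."""
    clips = sorted(str(p) for p in Path(directory).glob("*.wav"))
    return clips[:limit]


def clone_from_many(tts, text, reference_dir, out_path="cloned.wav", language="en"):
    """Condition the voice on several reference clips instead of one."""
    clips = collect_reference_clips(reference_dir)
    if not clips:
        raise FileNotFoundError(f"no .wav files found in {reference_dir}")
    # A list of paths lets the model draw on all clips for the speaker voice.
    tts.tts_to_file(text=text, speaker_wav=clips, language=language, file_path=out_path)
    return out_path
```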

Troubleshooting

Poor Voice Quality

  • Use clean reference audio

  • Use a longer reference clip (10+ seconds)

  • Match the speaking style of the reference to the desired output

Wrong Language Pronunciation

  • Ensure correct language code

  • Use native speaker reference

Slow Generation

  • Enable GPU inference

  • Use streaming mode

  • Reduce text length per call

Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

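| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
|-----------|-------------|------------|----------------|
| RTX 3060  | ~$0.03      | ~$0.70     | ~$0.12         |
| RTX 3090  | ~$0.06      | ~$1.50     | ~$0.25         |
| RTX 4090  | ~$0.10      | ~$2.30     | ~$0.40         |
| A100 40GB | ~$0.17      | ~$4.00     | ~$0.70         |
| A100 80GB | ~$0.25      | ~$6.00     | ~$1.00         |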

Prices vary by provider and demand. Check the CLORE.AI Marketplace for current rates.

Save money:

  • Use Spot market for flexible workloads (often 30-50% cheaper)

  • Pay with CLORE tokens

  • Compare prices across different providers
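A rough budget can be estimated from an hourly rate, a session length, and an optional Spot discount. This is a back-of-the-envelope sketch: `session_cost` is a hypothetical helper, the discount is a fraction you choose (for example 0.3 to 0.5 for the 30-50% range mentioned above), and real marketplace prices vary.

```python
def session_cost(hourly_rate, hours, spot_discount=0.0):
    """Estimate cost in USD; `spot_discount` is a fraction such as 0.3 for 30%."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return round(hourly_rate * hours * (1.0 - spot_discount), 2)


# A 4-hour RTX 4090 session at the ~$0.10/hour rate listed in this guide.
print(session_cost(0.10, 4))
```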

