F5-TTS

Generate natural-sounding speech with F5-TTS, a fast and fluent text-to-speech (TTS) system.


Renting on CLORE.AI

  1. Filter by GPU type, VRAM, and price

  2. Choose On-Demand (fixed rate) or Spot (bid price)

  3. Configure your order:

    • Select Docker image

    • Set ports (TCP for SSH, HTTP for web UIs)

    • Add environment variables if needed

    • Enter startup command

  4. Select payment: CLORE, BTC, or USDT/USDC

  5. Create order and wait for deployment

Access Your Server

  • Find connection details in My Orders

  • Web interfaces: Use the HTTP port URL

  • SSH: ssh -p <port> root@<proxy-address>

What is F5-TTS?

F5-TTS offers:

  • Fast inference (faster than real-time)

  • Natural prosody and intonation

  • Zero-shot voice cloning

  • Multi-language support

Resources

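| Component | Minimum | Recommended | Optimal |
|-----------|---------|-------------|---------|
| GPU | RTX 3060 12GB | RTX 4080 16GB | RTX 4090 24GB |
| VRAM | 6GB | 12GB | 16GB |
| CPU | 4 cores | 8 cores | 16 cores |
| RAM | 16GB | 32GB | 64GB |
| Storage | 20GB SSD | 50GB NVMe | 100GB NVMe |
| Internet | 100 Mbps | 500 Mbps | 1 Gbps |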

Quick Deploy on CLORE.AI

Docker Image:

Ports:

Command:

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

  1. Go to My Orders page

  2. Click on your order

  3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.
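
If you have scripts that point at localhost, the swap can be done programmatically. This is a minimal sketch using only the Python standard library; the function name `to_public_url` and the port `7860` are illustrative assumptions, and `abc123.clorecloud.net` is the placeholder host from the steps above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_public_url(local_url: str, http_pub_host: str) -> str:
    """Replace the scheme and host of a localhost URL with the http_pub host.

    The path, query string, and fragment are kept unchanged.
    """
    parts = urlsplit(local_url)
    # The public URL uses HTTPS and no explicit port.
    return urlunsplit(("https", http_pub_host, parts.path, parts.query, parts.fragment))


# Port 7860 is only an example of a local web UI port.
print(to_public_url("http://localhost:7860/", "abc123.clorecloud.net"))
# https://abc123.clorecloud.net/
```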

Installation

What You Can Create

Voice Content

  • Podcast production

  • Audiobook narration

  • Voice-over for videos

Accessibility

  • Screen readers

  • Document readers

  • Learning materials

Interactive Applications

  • Voice assistants

  • Gaming NPCs

  • Customer service bots

Creative Projects

  • Character voices

  • Audio dramas

  • Music vocals

Basic Usage

Simple TTS

Voice Cloning

Multi-Language Support

Batch Processing

Long-Form Audio
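
One common way to handle long texts is to split them at sentence boundaries and synthesize each chunk separately. The sketch below covers only the splitting step. The name `split_into_chunks` and the 200-character default are illustrative assumptions, not F5-TTS settings.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk can exceed the limit in that case.
    """
    # Split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two. Three.", max_chars=9))
# ['One. Two.', 'Three.']
```

Each chunk can then be passed to the synthesizer in turn and the resulting audio segments joined in order.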

Gradio Interface

API Server

Performance

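| Text Length | GPU | Generation Time | Real-time Factor |
|-------------|-----|-----------------|------------------|
| 100 chars | RTX 3090 | 0.5s | 5x |
| 100 chars | RTX 4090 | 0.3s | 8x |
| 500 chars | RTX 4090 | 1.2s | 10x |
| 1000 chars | A100 | 2.0s | 12x |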
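
To compare your own runs with these figures, divide the length of the generated audio by the time it took to generate. The sketch below assumes the "x" values mean audio seconds produced per second of compute; the function name `speedup` is illustrative.

```python
def speedup(audio_seconds: float, generation_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# Under this reading, 2.5 s of audio generated in 0.5 s is a 5x speedup.
print(speedup(2.5, 0.5))
# 5.0
```

Any value above 1.0 means generation is faster than real time.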

Common Problems & Solutions

Poor Voice Match

Problem: Generated voice doesn't match reference

Solutions:

  • Use 5-15 seconds of clear reference audio

  • Provide accurate reference text transcription

  • Avoid background noise in reference

  • Match language of text and reference
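
The duration advice above can be checked before you upload a reference clip. This is a minimal sketch using Python's standard `wave` module, so it only handles uncompressed PCM WAV files. The function name `check_reference` is illustrative, and the 24 kHz threshold is borrowed from the audio-quality advice in this guide.

```python
import wave


def check_reference(path: str,
                    min_seconds: float = 5.0,
                    max_seconds: float = 15.0,
                    min_rate: int = 24000) -> list[str]:
    """Return a list of problems found in a PCM WAV reference clip.

    An empty list means the clip passes both the duration and
    sample-rate checks.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate

    problems = []
    if not min_seconds <= duration <= max_seconds:
        problems.append(
            f"duration {duration:.1f}s is outside {min_seconds}-{max_seconds}s"
        )
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    return problems
```

Calling `check_reference("reference.wav")` on a hypothetical file returns an empty list when the clip is 5-15 seconds long and sampled at 24 kHz or higher. It does not detect background noise or transcription errors.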

Pronunciation Issues

Problem: Mispronounces words or names

Solutions:
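
One general workaround, not specific to F5-TTS, is to replace hard-to-pronounce words with phonetic respellings before synthesis. The sketch below is a plain text substitution; the function name `apply_respellings` and the example respellings are illustrative assumptions.

```python
import re


def apply_respellings(text: str, respellings: dict[str, str]) -> str:
    """Replace whole words in text with phonetic respellings (case-insensitive)."""
    if not respellings:
        return text
    # Longest keys first so longer entries win over their prefixes.
    words = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    lookup = {word.lower(): spoken for word, spoken in respellings.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


print(apply_respellings("Ask Nguyen about SQL.", {"Nguyen": "Win", "SQL": "sequel"}))
# Ask Win about sequel.
```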

Audio Quality Issues

Problem: Output sounds robotic or distorted

Solutions:

  • Use high-quality reference audio (24 kHz or higher)

  • Remove noise from the reference audio

  • Try different reference samples

  • Increase generation quality settings

Memory Issues

Problem: Out of memory for long texts

Solutions:

Slow Generation

Problem: Takes too long to generate

Solutions:

  • Use GPU inference (CUDA)

  • Reduce chunk_size for faster processing

  • Use RTX 4090 or better

  • Enable half-precision (fp16)

Troubleshooting

Voice doesn't match reference

  • Use 5-15 seconds of clear reference audio

  • Transcribe reference text accurately

  • Avoid background noise in reference

Audio quality issues

  • Use a reference with a high sample rate (24 kHz or higher)

  • Remove noise from the reference audio

  • Try different reference samples

Slow generation

  • Use CUDA (not CPU)

  • Reduce text length or chunk it

  • Use smaller batch sizes

Language mismatch

  • Match text language with reference audio language

  • Some languages need specific models

Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

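| GPU | Hourly Rate | Daily Rate | 4-Hour Session |
|-----|-------------|------------|----------------|
| RTX 3060 | ~$0.03 | ~$0.70 | ~$0.12 |
| RTX 3090 | ~$0.06 | ~$1.50 | ~$0.25 |
| RTX 4090 | ~$0.10 | ~$2.30 | ~$0.40 |
| A100 40GB | ~$0.17 | ~$4.00 | ~$0.70 |
| A100 80GB | ~$0.25 | ~$6.00 | ~$1.00 |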

Prices vary by provider and demand. Check the CLORE.AI Marketplace for current rates.

Save money:

  • Use Spot market for flexible workloads (often 30-50% cheaper)

  • Pay with CLORE tokens

  • Compare prices across different providers
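
A session cost is the hourly rate multiplied by the hours used, optionally reduced by a Spot discount. The sketch below is a rough estimator only: the function name `estimate_cost` is illustrative, and the 30% discount in the example is taken from the low end of the range quoted above, not from a guaranteed price.

```python
def estimate_cost(hourly_rate: float, hours: float, spot_discount: float = 0.0) -> float:
    """Estimate the cost in USD of a rental session, rounded to cents.

    spot_discount is a fraction, e.g. 0.3 for a 30% cheaper Spot price.
    """
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in the range [0, 1)")
    return round(hourly_rate * hours * (1.0 - spot_discount), 2)


# RTX 4090 at about $0.10 per hour for a 4-hour session.
print(estimate_cost(0.10, 4))
# 0.4
print(estimate_cost(0.10, 4, spot_discount=0.3))
# 0.28
```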

Next Steps
