RVC Voice Clone

Clone and convert voices using Retrieval-based Voice Conversion.

Renting on CLORE.AI

  1. Filter by GPU type, VRAM, and price

  2. Choose On-Demand (fixed rate) or Spot (bid price)

  3. Configure your order:

    • Select Docker image

    • Set ports (TCP for SSH, HTTP for web UIs)

    • Add environment variables if needed

    • Enter startup command

  4. Select payment: CLORE, BTC, or USDT/USDC

  5. Create order and wait for deployment

Access Your Server

  • Find connection details in My Orders

  • Web interfaces: Use the HTTP port URL

  • SSH: ssh -p <port> root@<proxy-address>

What is RVC?

RVC (Retrieval-based Voice Conversion) can:

  • Clone a voice from a small dataset (10-30 minutes of clean audio)

  • Convert both singing and speaking voices

  • Convert voices in real time

  • Produce high-quality output

Requirements

| Task      | Min VRAM | Recommended GPU |
|-----------|----------|-----------------|
| Inference | 4GB      | RTX 3060        |
| Training  | 8GB      | RTX 3090        |
| Real-time | 6GB      | RTX 3070        |

Quick Deploy

Docker Image:

Ports:

Command:

Accessing Your Service

After deployment, find your http_pub URL in My Orders:

  1. Go to My Orders page

  2. Click on your order

  3. Find the http_pub URL (e.g., abc123.clorecloud.net)

Use https://YOUR_HTTP_PUB_URL instead of localhost in examples below.
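
The substitution above can be sketched in a few lines of Python. This is a minimal illustration, assuming the examples use plain `http://localhost:<port>` URLs; `to_public_url` is a hypothetical helper name, and the host is the sample value shown in step 3.

```python
from urllib.parse import urlsplit, urlunsplit


def to_public_url(local_url: str, http_pub_host: str) -> str:
    """Swap the scheme and host of a local URL for the public http_pub host."""
    parts = urlsplit(local_url)
    # Keep path, query, and fragment; replace only scheme and host:port.
    return urlunsplit(("https", http_pub_host, parts.path, parts.query, parts.fragment))


print(to_public_url("http://localhost:7865/", "abc123.clorecloud.net"))
# prints: https://abc123.clorecloud.net/
```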

Installation

Voice Conversion (Inference)

Using Web UI

  1. Open the web UI at http://<proxy>:7865 (or at your https://YOUR_HTTP_PUB_URL address)

  2. Go to "Model Inference" tab

  3. Upload audio file

  4. Select voice model

  5. Adjust settings

  6. Click "Convert"

Python API

Training Custom Voice

Prepare Dataset

  1. Collect 10-30 minutes of clean audio

  2. Cut into 5-15 second clips

  3. Remove background noise/music
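
Step 2 can be planned before any audio is touched. The sketch below only computes clip boundaries in seconds; the actual cutting is left to whatever audio tool you use. `plan_clips` and its parameters are illustrative names, not part of RVC. With the defaults, every clip falls within the 5-15 second range, because a short tail is merged into the previous clip.

```python
def plan_clips(total_seconds: float, clip_seconds: float = 10.0,
               min_seconds: float = 5.0) -> list[tuple[float, float]]:
    """Return (start, end) boundaries, in seconds, for fixed-length clips."""
    clips: list[tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + clip_seconds, total_seconds)
        if end - start >= min_seconds:
            clips.append((start, end))
        elif clips:
            # Tail is too short to stand alone: merge it into the previous clip.
            prev_start, _ = clips.pop()
            clips.append((prev_start, end))
        # A recording shorter than min_seconds yields no clips at all.
        start = end
    return clips


print(plan_clips(23.0))
# prints: [(0.0, 10.0), (10.0, 23.0)]
```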

Train via Web UI

  1. Go to "Train" tab

  2. Enter experiment name

  3. Set training folder path

  4. Click "Process data"

  5. Click "Feature extraction"

  6. Click "Train"

Train via Command Line

Training Parameters

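| Parameter   | Description                         | Recommended |
|-------------|-------------------------------------|-------------|
| Sample Rate | Target audio sample rate (Hz)       | 48000       |
| Batch Size  | Clips processed per training step   | 8-16        |
| Epochs      | Full passes over the training data  | 200-500     |
| Save Every  | Checkpoint interval (epochs)        | 20-50       |
| f0 Method   | Pitch extraction algorithm          | rmvpe       |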
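
To see how these parameters interact, the following sketch estimates the number of optimizer steps and checkpoints for a run. It assumes one step per batch and that a final partial batch still counts as a step; the actual trainer may handle the last batch differently. `training_schedule` is a hypothetical helper.

```python
import math


def training_schedule(num_clips: int, batch_size: int,
                      epochs: int, save_every: int) -> dict[str, int]:
    """Rough step and checkpoint counts for a training run."""
    steps_per_epoch = math.ceil(num_clips / batch_size)  # assumption: partial batch kept
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
        "checkpoints": epochs // save_every,
    }


# 20 minutes of audio cut into 10-second clips gives 120 clips.
print(training_schedule(num_clips=120, batch_size=8, epochs=200, save_every=20))
# prints: {'steps_per_epoch': 15, 'total_steps': 3000, 'checkpoints': 10}
```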

F0 Methods

| Method  | Quality | Speed  | Best For |
|---------|---------|--------|----------|
| pm      | OK      | Fast   | Testing  |
| harvest | Good    | Slow   | General  |
| crepe   | Great   | Medium | Singing  |
| rmvpe   | Best    | Medium | All      |

Real-Time Conversion

Setup
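
When choosing an audio block size for real-time use, it helps to know how much latency the buffering alone contributes. The sketch below is plain arithmetic (block size divided by sample rate) and does not include model inference time; the function names are illustrative.

```python
def block_latency_ms(block_size: int, sample_rate: int) -> float:
    """Buffering latency, in milliseconds, of one audio block."""
    return 1000.0 * block_size / sample_rate


def max_block_size(latency_ms: float, sample_rate: int) -> int:
    """Largest block (in samples) whose buffering fits the latency budget."""
    return int(latency_ms * sample_rate / 1000.0)


print(block_latency_ms(2400, 48000))  # prints: 50.0
print(max_block_size(50, 48000))      # prints: 2400
```

At 48 kHz, a 50 ms budget spent entirely on buffering allows at most 2400 samples per block, so a practical block size needs to be smaller to leave room for inference.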

Model Formats

Convert to ONNX

Audio Preprocessing

Remove Noise

Normalize Volume
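
One simple approach is peak normalization: scale every sample so the loudest one reaches a chosen level. The sketch below works on in-memory float samples in the range [-1, 1] and is not RVC's own preprocessing; `peak_normalize` and `target_peak` are illustrative names.

```python
def peak_normalize(samples: list[float], target_peak: float = 0.95) -> list[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent or empty input: nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


print(peak_normalize([0.1, -0.5, 0.25], target_peak=1.0))
# prints: [0.2, -1.0, 0.5]
```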

Remove Silence
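
A minimal sketch of edge trimming, assuming float samples in [-1, 1] and a fixed amplitude threshold. It removes only leading and trailing silence and keeps pauses inside the clip; `trim_silence` and the default threshold are illustrative choices, not values taken from RVC.

```python
def trim_silence(samples: list[float], threshold: float = 0.01) -> list[float]:
    """Drop leading and trailing samples whose amplitude is at or below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        # The whole clip is below the threshold.
        return []
    return samples[loud[0]:loud[-1] + 1]


print(trim_silence([0.0, 0.001, 0.2, 0.0, -0.3, 0.005, 0.0]))
# prints: [0.2, 0.0, -0.3]
```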

Batch Processing
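
A batch run is a loop over input files around a single-file conversion. The sketch below leaves the conversion itself as a callable you supply: `convert_file` is a hypothetical placeholder for whatever RVC inference call you use, and `batch_convert` is an illustrative name. Files that already exist in the output folder are skipped, so an interrupted run can be resumed.

```python
from pathlib import Path
from typing import Callable


def batch_convert(input_dir: Path, output_dir: Path,
                  convert_file: Callable[[Path, Path], None]) -> list[Path]:
    """Convert every .wav in input_dir, writing results to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for src in sorted(input_dir.glob("*.wav")):
        dst = output_dir / src.name
        if dst.exists():
            continue  # already converted in an earlier run
        convert_file(src, dst)  # hypothetical: plug in your RVC inference here
        written.append(dst)
    return written
```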

Singing Voice Conversion

For songs, use appropriate settings:
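
For pitch, the F0 Methods table lists crepe and rmvpe as the stronger choices for singing. The transpose value (`f0_up_key`, mentioned under Pitch Issues) can be estimated from the typical pitch of the source singer and the target voice. The sketch below assumes `f0_up_key` is a shift in semitones; the helper names are illustrative.

```python
import math


def semitones_between(source_hz: float, target_hz: float) -> int:
    """Nearest whole-semitone shift that moves source_hz onto target_hz."""
    return round(12 * math.log2(target_hz / source_hz))


def pitch_ratio(semitones: int) -> float:
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12)


# Moving a voice centered near 110 Hz up to one centered near 220 Hz:
print(semitones_between(110.0, 220.0))  # prints: 12
print(pitch_ratio(12))                  # prints: 2.0
```

A shift of 12 semitones is one octave, which doubles every frequency; a shift of 0 leaves the melody at its original pitch.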

Common Issues

Voice Sounds Robotic

  • Use higher quality source audio

  • Increase protect value (0.4-0.5)

  • Try different f0 method

Pitch Issues

  • Adjust f0_up_key

  • Use rmvpe f0 method

  • Ensure consistent pitch in training data

Audio Quality

  • Use 48kHz sample rate

  • Remove background noise from training data

  • Train for more epochs

API Server

Training Tips

For Better Quality

  • Use 20+ minutes of clean audio

  • Remove all background noise

  • Keep the microphone and recording setup consistent

  • Include varied expressions/emotions

For Faster Training

  • Use a batch size of 8-16

  • Enable mixed precision

  • Use NVMe SSD for dataset

Performance

| Task                      | GPU      | Time          |
|---------------------------|----------|---------------|
| Inference (1 min audio)   | RTX 3090 | ~5s           |
| Training (30 min dataset) | RTX 3090 | ~2 hours      |
| Real-time conversion      | RTX 3070 | <50ms latency |

Troubleshooting

Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

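| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
|-----------|-------------|------------|----------------|
| RTX 3060  | ~$0.03      | ~$0.70     | ~$0.12         |
| RTX 3090  | ~$0.06      | ~$1.50     | ~$0.25         |
| RTX 4090  | ~$0.10      | ~$2.30     | ~$0.40         |
| A100 40GB | ~$0.17      | ~$4.00     | ~$0.70         |
| A100 80GB | ~$0.25      | ~$6.00     | ~$1.00         |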

Prices vary by provider and demand. Check the CLORE.AI Marketplace for current rates.

Save money:

  • Use Spot market for flexible workloads (often 30-50% cheaper)

  • Pay with CLORE tokens

  • Compare prices across different providers
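
For budgeting, a session cost is the hourly rate times the hours used, reduced by any Spot discount. The sketch below uses the approximate rates from the table above; `session_cost` is an illustrative helper, and the 30% discount is just the low end of the range quoted for the Spot market.

```python
def session_cost(hourly_rate: float, hours: float, spot_discount: float = 0.0) -> float:
    """Estimated cost in USD; spot_discount is a fraction such as 0.3 for 30%."""
    return round(hourly_rate * hours * (1.0 - spot_discount), 4)


# ~2 hours of training on an RTX 3090 at ~$0.06/hour, On-Demand:
print(session_cost(0.06, 2))
# The same GPU class of job on an RTX 4090 for 4 hours with a 30% Spot discount:
print(session_cost(0.10, 4, spot_discount=0.3))
```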

Next Steps
