# OpenVoice

Clone any voice from just a few seconds of audio with OpenVoice.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select the Docker image
   * Set the ports (TCP for SSH, HTTP for web interfaces)
   * Add environment variables if needed
   * Enter the startup command
5. Select a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment

### Access Your Server

* Find the connection details in **My Orders**
* Web interfaces: use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What Is OpenVoice?

OpenVoice, by MyShell, can:

* Clone voices from about 10 seconds of audio
* Control emotion, accent, and rhythm
* Clone voices across languages
* Perform zero-shot voice conversion

## Requirements

| Task             | Min VRAM | Recommended |
| ---------------- | -------- | ----------- |
| Inference        | 4 GB     | RTX 3060    |
| Batch processing | 6 GB     | RTX 3070    |

## Quick Deployment

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install git+https://github.com/myshell-ai/OpenVoice.git gradio huggingface_hub && \
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='myshell-ai/OpenVoiceV2', local_dir='checkpoints_v2')" && \
python -c "
import gradio as gr
from openvoice import se_extractor
from openvoice.api import ToneColorConverter

# OpenVoice V2 tone color converter (downloaded to checkpoints_v2/ above)
ckpt_converter = 'checkpoints_v2/converter'
device = 'cuda'
tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

def clone(source_audio, reference_audio):
    # Speaker embeddings of the source speech and of the target voice
    source_se, _ = se_extractor.get_se(source_audio, tone_color_converter, vad=False)
    target_se, _ = se_extractor.get_se(reference_audio, tone_color_converter, vad=False)

    output_path = 'output.wav'
    tone_color_converter.convert(
        audio_src_path=source_audio,
        src_se=source_se,
        tgt_se=target_se,
        output_path=output_path
    )
    return output_path

demo = gr.Interface(
    fn=clone,
    inputs=[gr.Audio(type='filepath', label='Source'), gr.Audio(type='filepath', label='Target voice')],
    outputs=gr.Audio(label='Cloned'),
    title='OpenVoice Clone'
)
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.
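If you script against the server, a small helper can turn the `http_pub` value into the base URL used in the examples. This is a sketch: `public_url` is a hypothetical helper name, and it assumes the proxy serves HTTPS on the default port, as the `https://` instruction above implies.

```python
def public_url(http_pub: str, path: str = "/") -> str:
    """Build an HTTPS URL from the http_pub value shown in My Orders.

    Hypothetical helper: accepts the bare host (abc123.clorecloud.net)
    or a value pasted with a scheme and/or a trailing slash.
    """
    host = http_pub.strip()
    # Drop any scheme pasted along with the host
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    host = host.rstrip("/")
    # Join host and path with exactly one slash
    return f"https://{host}/{path.lstrip('/')}"


print(public_url("abc123.clorecloud.net"))                    # https://abc123.clorecloud.net/
print(public_url("http://abc123.clorecloud.net/", "/clone"))  # https://abc123.clorecloud.net/clone
```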

## Installation

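```bash
git clone https://github.com/myshell-ai/OpenVoice.git
cd OpenVoice
pip install -e .

# MeloTTS, used by the text-to-speech and multilingual examples
pip install git+https://github.com/myshell-ai/MeloTTS.git
python -m unidic download

# Download the OpenVoice V2 checkpoints into checkpoints_v2/
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='myshell-ai/OpenVoiceV2', local_dir='checkpoints_v2')"
```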

## Basic Voice Cloning

```python
from openvoice import se_extractor
from openvoice.api import ToneColorConverter
import torch

# Initialize
device = "cuda" if torch.cuda.is_available() else "cpu"
ckpt_converter = 'checkpoints_v2/converter'

tone_color_converter = ToneColorConverter(
    f'{ckpt_converter}/config.json',
    device=device
)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

# Extract the speaker embeddings
source_se, _ = se_extractor.get_se("source_audio.wav", tone_color_converter, vad=False)
target_se, _ = se_extractor.get_se("target_voice.wav", tone_color_converter, vad=False)

# Convert the voice
tone_color_converter.convert(
    audio_src_path="source_audio.wav",
    src_se=source_se,
    tgt_se=target_se,
    output_path="output.wav"
)
```

## With Text-to-Speech

Generate speech with MeloTTS, then convert it to any target voice:

```python
import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter
from melo.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the V2 tone color converter
ckpt_converter = 'checkpoints_v2/converter'
tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

# Initialize the TTS
tts = TTS(language='EN', device=device)
speaker_ids = tts.hps.data.spk2id

# Generate the base speech
tts.tts_to_file("Hello, this is a test.", speaker_ids['EN-US'], "base.wav")

# Clone it into the target voice
source_se, _ = se_extractor.get_se("base.wav", tone_color_converter, vad=False)
target_se, _ = se_extractor.get_se("target_voice.wav", tone_color_converter, vad=False)

tone_color_converter.convert(
    audio_src_path="base.wav",
    src_se=source_se,
    tgt_se=target_se,
    output_path="cloned_speech.wav"
)
```

## Multilingual Support

```python
import torch
from melo.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# Available languages
languages = ['EN', 'ES', 'FR', 'ZH', 'JP', 'KR']

# English
tts_en = TTS(language='EN', device=device)
tts_en.tts_to_file("Hello world", tts_en.hps.data.spk2id['EN-US'], "en.wav")

# Chinese
tts_zh = TTS(language='ZH', device=device)
tts_zh.tts_to_file("你好世界", tts_zh.hps.data.spk2id['ZH'], "zh.wav")

# Japanese
tts_jp = TTS(language='JP', device=device)
tts_jp.tts_to_file("こんにちは", tts_jp.hps.data.spk2id['JP'], "jp.wav")
```

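## Emotion Control

The base speaker TTS shipped with the OpenVoice V1 checkpoints (`BaseSpeakerTTS`) supports emotion/style control for English. It loads the V1 files under `checkpoints/`, not the V2 files under `checkpoints_v2/`:

```python
import torch
from openvoice.api import BaseSpeakerTTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# English base speaker from the V1 checkpoints (path as in the upstream V1 demo)
ckpt_base = 'checkpoints/base_speakers/EN'
base_speaker_tts = BaseSpeakerTTS(
    f'{ckpt_base}/config.json',
    device=device
)
base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')

# Available styles
styles = ['default', 'whispering', 'cheerful', 'terrified', 'angry', 'sad', 'friendly']

for style in styles:
    base_speaker_tts.tts(
        "This is a test sentence.",
        f"output_{style}.wav",
        speaker=style,  # the style is selected through the speaker argument
        language='English'
    )
```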

## Batch Processing

```python
import os
from openvoice import se_extractor
from openvoice.api import ToneColorConverter

ckpt_converter = 'checkpoints_v2/converter'
tone_color_converter = ToneColorConverter(
    f'{ckpt_converter}/config.json',
    device='cuda'
)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

# Compute the target voice embedding only once
target_se, _ = se_extractor.get_se("target_voice.wav", tone_color_converter, vad=False)

input_dir = "./audio_files"
output_dir = "./cloned"
os.makedirs(output_dir, exist_ok=True)

for filename in sorted(os.listdir(input_dir)):
    if filename.lower().endswith(('.wav', '.mp3')):
        input_path = os.path.join(input_dir, filename)
        # Always write WAV output, even for MP3 inputs
        stem = os.path.splitext(filename)[0]
        output_path = os.path.join(output_dir, f"cloned_{stem}.wav")

        source_se, _ = se_extractor.get_se(input_path, tone_color_converter, vad=False)

        tone_color_converter.convert(
            audio_src_path=input_path,
            src_se=source_se,
            tgt_se=target_se,
            output_path=output_path
        )
        print(f"Cloned: {filename}")
```
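The loop above reconverts every file on each run. When a spot instance is interrupted and the job restarts, it can help to skip files that are already done. The sketch below only plans the work (it makes no OpenVoice calls); `plan_batch` is a hypothetical helper that assumes the same `cloned_<name>.wav` naming as the loop.

```python
import os


def plan_batch(input_dir, output_dir, extensions=(".wav", ".mp3")):
    """Return (input_path, output_path) pairs that still need converting.

    Hypothetical helper: outputs are named cloned_<stem>.wav, and any
    input whose output file already exists is skipped.
    """
    os.makedirs(output_dir, exist_ok=True)
    jobs = []
    # Sort for a deterministic processing order
    for filename in sorted(os.listdir(input_dir)):
        if not filename.lower().endswith(extensions):
            continue
        stem = os.path.splitext(filename)[0]
        output_path = os.path.join(output_dir, f"cloned_{stem}.wav")
        if os.path.exists(output_path):
            continue  # already converted in a previous run
        jobs.append((os.path.join(input_dir, filename), output_path))
    return jobs
```

Each returned pair can then be passed to `se_extractor.get_se` and `tone_color_converter.convert` as in the loop above.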

## API Server

```python
import os
import shutil
import tempfile

from fastapi import FastAPI, UploadFile
from fastapi.responses import FileResponse
from starlette.background import BackgroundTask
from openvoice import se_extractor
from openvoice.api import ToneColorConverter

app = FastAPI()

tone_color_converter = ToneColorConverter(
    'checkpoints_v2/converter/config.json',
    device='cuda'
)
tone_color_converter.load_ckpt('checkpoints_v2/converter/checkpoint.pth')


def save_upload(upload: UploadFile) -> str:
    # Copy an uploaded file to a temporary .wav file and return its path
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        shutil.copyfileobj(upload.file, tmp)
        return tmp.name


def remove_files(*paths):
    # Delete the temporary files once the response has been sent
    for path in paths:
        if os.path.exists(path):
            os.remove(path)


# Plain `def` rather than `async def`: FastAPI runs it in a thread pool,
# so the blocking GPU work does not stall the event loop
@app.post("/clone")
def clone_voice(source: UploadFile, target: UploadFile):
    # Save the uploaded files
    src_path = save_upload(source)
    tgt_path = save_upload(target)

    # Extract the embeddings
    source_se, _ = se_extractor.get_se(src_path, tone_color_converter, vad=False)
    target_se, _ = se_extractor.get_se(tgt_path, tone_color_converter, vad=False)

    # Convert (reserve the output path safely instead of using tempfile.mktemp)
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as out_tmp:
        output_path = out_tmp.name
    tone_color_converter.convert(
        audio_src_path=src_path,
        src_se=source_se,
        tgt_se=target_se,
        output_path=output_path
    )

    return FileResponse(
        output_path,
        media_type="audio/wav",
        background=BackgroundTask(remove_files, src_path, tgt_path, output_path),
    )

# Save as server.py, install python-multipart (required for UploadFile), then run:
# uvicorn server:app --host 0.0.0.0 --port 8000
```
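To call the `/clone` endpoint from Python without extra dependencies, the multipart request can be built with the standard library. This is a sketch: `encode_multipart` and `clone_remote` are hypothetical helper names, and the form field names `source` and `target` are taken from the `clone_voice` parameters above. The server listens on port 8000, so that port must be exposed as an HTTP port in your order (the quick-deploy example only maps 7860).

```python
import os
import urllib.request
import uuid


def encode_multipart(files):
    """Encode {field_name: (filename, bytes)} as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for field, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: audio/wav\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    # Closing delimiter
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def clone_remote(url, source_path, target_path, output_path, timeout=300):
    """POST two WAV files to the /clone endpoint and save the returned audio."""
    files = {}
    for field, path in (("source", source_path), ("target", target_path)):
        with open(path, "rb") as f:
            files[field] = (os.path.basename(path), f.read())
    body, content_type = encode_multipart(files)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(output_path, "wb") as out:
            out.write(response.read())
    return output_path
```

With the server running, a call would look like `clone_remote("https://YOUR_HTTP_PUB_URL/clone", "source_audio.wav", "target_voice.wav", "cloned.wav")`.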

## Quality Tips

### For Best Results

* Use 10 to 30 seconds of clean reference audio
* Avoid background noise
* Use a reference with a single speaker only
* Roughly match the speaking pace of the source and the reference
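The 10 to 30 second guideline can be checked before uploading a reference clip. This sketch uses Python's standard `wave` module, so it only reads uncompressed PCM WAV files (convert MP3s first); `check_reference` is a hypothetical helper, and its thresholds simply mirror the tips above.

```python
import wave


def check_reference(path, min_s=10.0, max_s=30.0):
    """Return (duration_seconds, warnings) for a PCM WAV reference clip."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    warnings = []
    if duration < min_s:
        warnings.append(f"too short: {duration:.1f}s < {min_s:.0f}s")
    elif duration > max_s:
        warnings.append(f"longer than needed: {duration:.1f}s > {max_s:.0f}s")
    return duration, warnings
```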

### Audio Preprocessing

```python
import librosa
import soundfile as sf

def preprocess_audio(input_path, output_path, target_sr=22050):
    # Load as mono and resample to target_sr
    audio, sr = librosa.load(input_path, sr=target_sr)

    # Trim leading and trailing silence
    audio, _ = librosa.effects.trim(audio, top_db=20)

    # Normalize the peak amplitude
    audio = librosa.util.normalize(audio)

    sf.write(output_path, audio, target_sr)
    return output_path

preprocess_audio("raw_reference.wav", "clean_reference.wav")
```

## Comparison with Other Tools

| Feature         | OpenVoice    | RVC      | Bark |
| --------------- | ------------ | -------- | ---- |
| Reference audio | 10-30 s      | 10+ min  | N/A  |
| Training        | Not required | Required | N/A  |
| Speed           | Fast         | Medium   | Slow |
| Quality         | Excellent    | Best     | Good |
| Cross-lingual   | Yes          | Limited  | Yes  |

## Performance

| Task                   | GPU      | Time  |
| ---------------------- | -------- | ----- |
| Extract embedding      | RTX 3090 | \~1s  |
| Convert 10 s of audio  | RTX 3090 | \~2s  |
| Convert 1 min of audio | RTX 3090 | \~8s  |

## Troubleshooting

### Poor voice match

* Use a longer reference clip
* Make sure the audio is clean
* Check for background noise

### Audio artifacts

* Lower the speed or emphasis settings
* Use a consistent audio format
* Check that the sample rates match
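A quick way to confirm that clips share the same sample rate, channel count, and sample width is to read their WAV headers. This is a minimal sketch with the standard `wave` module (PCM WAV only); `audio_formats` and `formats_match` are hypothetical helper names. Files that disagree can be resampled with the `preprocess_audio` function from the preprocessing section.

```python
import wave


def audio_formats(paths):
    """Map each WAV path to its (sample_rate, channels, sample_width_bytes)."""
    formats = {}
    for path in paths:
        with wave.open(path, "rb") as wav:
            formats[path] = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
    return formats


def formats_match(paths):
    """True if every file has the same sample rate, channel count, and width."""
    return len(set(audio_formats(paths).values())) <= 1
```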

### Out of memory

* Process shorter clips
* Reduce the batch size
* Clear the CUDA cache
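To process shorter clips, a long WAV file can be cut into fixed-length pieces that are converted one at a time. This standard-library sketch (`split_wav` is a hypothetical helper) cuts at exact frame boundaries, so a cut may fall mid-word; splitting on silences would sound smoother.

```python
import os
import wave


def split_wav(path, out_dir, chunk_seconds=30.0):
    """Split a PCM WAV file into chunks of at most chunk_seconds each."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of file
            chunk_path = os.path.join(out_dir, f"{stem}_{index:03d}.wav")
            with wave.open(chunk_path, "wb") as dst:
                # Same format as the source; the header's frame count is
                # patched to the actual chunk length when frames are written
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```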

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly rate | Daily rate | 4-hour session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use the **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across providers
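The rate table and the spot discount can be combined into a quick budget estimate. The sketch below hard-codes the approximate 2024 hourly rates from the table (they drift, so check the marketplace) and applies the 30-50% spot discount range quoted above; `estimate_cost` and `HOURLY_USD` are hypothetical names.

```python
# Approximate hourly rates in USD, copied from the table above (2024)
HOURLY_USD = {
    "RTX 3060": 0.03,
    "RTX 3090": 0.06,
    "RTX 4090": 0.10,
    "A100 40GB": 0.17,
    "A100 80GB": 0.25,
}


def estimate_cost(gpu, hours, spot=False):
    """Return (low, high) USD estimates for renting `gpu` for `hours`."""
    on_demand = HOURLY_USD[gpu] * hours
    if not spot:
        return round(on_demand, 2), round(on_demand, 2)
    # Spot is described as often 30-50% cheaper than on-demand
    return round(on_demand * 0.5, 2), round(on_demand * 0.7, 2)


print(estimate_cost("RTX 3090", 4))             # (0.24, 0.24)
print(estimate_cost("RTX 3090", 4, spot=True))  # (0.12, 0.17)
```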

## Next Steps

* [Bark TTS](/guides/guides_v2-fr/audio-et-voix/bark-tts.md) - Text-to-speech
* [RVC Voice Clone](/guides/guides_v2-fr/audio-et-voix/rvc-voice-clone.md) - Training-based voice cloning
* [Whisper Transcription](/guides/guides_v2-fr/audio-et-voix/whisper-transcription.md) - Speech-to-text

