
# Demucs Separation

Split music into stems (vocals, drums, bass, other) with Demucs.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Go to the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed price) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set the ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter the startup command
5. Select a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment

### Accessing Your Server

* Find the connection details under **My Orders**
* Web interfaces: use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is Demucs?

Demucs, from Meta AI, can:

* Separate vocals from music
* Extract drums, bass, and other instruments
* Process common audio formats such as MP3, WAV, and FLAC
* Produce high-quality stems

## Quick Deployment

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install demucs gradio && \
python -c "
import gradio as gr
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torch
import torchaudio
import tempfile
import os

model = get_model('htdemucs')
model.cuda()
model.eval()

def separate(audio_path, stem):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = {'drums': 0, 'bass': 1, 'other': 2, 'vocals': 3}
    output = sources[stems[stem]].cpu()

    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        torchaudio.save(f.name, output, sr)
        return f.name

demo = gr.Interface(
    fn=separate,
    inputs=[gr.Audio(type='filepath'), gr.Dropdown(['vocals', 'drums', 'bass', 'other'], value='vocals')],
    outputs=gr.Audio(),
    title='Demucs Audio Separator'
)
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL under **My Orders**:

1. Go to the **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.

## Installation

```bash
pip install demucs

# or install from the GitHub repository
pip install -e git+https://github.com/facebookresearch/demucs#egg=demucs
```

## Command-Line Usage

### Basic Separation

```bash
# Separate into 4 stems
demucs song.mp3

# Output: separated/htdemucs/song/{drums,bass,other,vocals}.wav
```
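
If a script needs to pick up the separated files afterwards, the output layout shown above can be computed rather than hard-coded. This is a minimal sketch: `expected_stem_paths` is a hypothetical helper (not part of Demucs), and it assumes the default `separated/<model>/<track name>/` layout and the four default stem names.

```python
from pathlib import Path

# Stem names for the 4-stem models, as listed in the output comment above.
DEFAULT_STEMS = ("drums", "bass", "other", "vocals")

def expected_stem_paths(track, model="htdemucs", out_dir="separated", ext="wav"):
    """Return {stem: path} following <out_dir>/<model>/<track name>/<stem>.<ext>."""
    track_name = Path(track).stem  # "song.mp3" -> "song"
    base = Path(out_dir) / model / track_name
    return {stem: base / f"{stem}.{ext}" for stem in DEFAULT_STEMS}

paths = expected_stem_paths("song.mp3")
print(paths["vocals"].as_posix())  # separated/htdemucs/song/vocals.wav
```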

### Options

```bash
# --two-stems vocals : vocals + instrumental only
# -n htdemucs        : model name
# -d cuda            : use the GPU
# -o ./output        : output directory
# --mp3              : write MP3 instead of WAV
demucs \
    --two-stems vocals \
    -n htdemucs \
    -d cuda \
    -o ./output \
    --mp3 \
    song.mp3
```
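
When these options are driven from a Python script, passing them as an argument list avoids shell quoting problems. The sketch below only assembles the command; `build_demucs_command` is a hypothetical helper, and it uses only the flags that appear in this guide.

```python
import shlex

def build_demucs_command(track, model="htdemucs", device="cuda", out_dir="./output",
                         two_stems=None, mp3=False, segment=None):
    """Assemble an argument list for the demucs CLI (flags taken from this guide)."""
    cmd = ["demucs", "-n", model, "-d", device, "-o", out_dir]
    if two_stems:
        cmd += ["--two-stems", two_stems]
    if mp3:
        cmd.append("--mp3")
    if segment is not None:
        cmd += ["--segment", str(segment)]
    cmd.append(track)
    return cmd

cmd = build_demucs_command("song.mp3", two_stems="vocals", mp3=True)
print(shlex.join(cmd))
# demucs -n htdemucs -d cuda -o ./output --two-stems vocals --mp3 song.mp3
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)` on a machine where Demucs is installed.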

### Process a Folder

```bash
demucs --two-stems vocals -d cuda ./songs/*.mp3
```

## Python API

### Basic Separation

```python
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch

# Load the model
model = get_model('htdemucs')
model.cuda()
model.eval()

# Load the audio (htdemucs expects 44.1 kHz stereo input)
wav, sr = torchaudio.load("song.mp3")
wav = wav.cuda()

# Separate
with torch.no_grad():
    sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

# sources shape: [4, channels, samples]
# stem order: 0 = drums, 1 = bass, 2 = other, 3 = vocals

# Save the stems
stems = ['drums', 'bass', 'other', 'vocals']
for i, stem in enumerate(stems):
    torchaudio.save(f"{stem}.wav", sources[i].cpu(), sr)
```

### Vocals Only

```python
def extract_vocals(audio_path):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    vocals = sources[3].cpu()  # index 3 = vocals
    return vocals, sr

vocals, sr = extract_vocals("song.mp3")
torchaudio.save("vocals.wav", vocals, sr)
```

### Instrumental (Without Vocals)

```python
def extract_instrumental(audio_path):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Sum drums + bass + other
    instrumental = sources[0] + sources[1] + sources[2]
    return instrumental.cpu(), sr

instrumental, sr = extract_instrumental("song.mp3")
torchaudio.save("instrumental.wav", instrumental, sr)
```

## Model Variants

| Model        | Stems | Quality | Speed  |
| ------------ | ----- | ------- | ------ |
| htdemucs     | 4     | Best    | Medium |
| htdemucs\_ft | 4     | Best+   | Slow   |
| htdemucs\_6s | 6     | Great   | Medium |
| mdx\_extra   | 4     | Great   | Fast   |

### 6-Stem Model

```python
model = get_model('htdemucs_6s')

# Stems: drums, bass, other, vocals, guitar, piano
```
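
Because the 4-stem and 6-stem models return a different number of sources, looking up a stem by name is safer than hard-coding an index. The sketch below is illustrative: `MODEL_STEMS` and `stem_index` are hypothetical names, and the stem orders are copied from this guide. With a loaded model, its `sources` attribute should be treated as the authoritative list.

```python
# Stem order per model, as listed in this guide (an assumption for mdx_extra).
STEMS_4 = ["drums", "bass", "other", "vocals"]
STEMS_6 = ["drums", "bass", "other", "vocals", "guitar", "piano"]

MODEL_STEMS = {
    "htdemucs": STEMS_4,
    "htdemucs_ft": STEMS_4,
    "htdemucs_6s": STEMS_6,
    "mdx_extra": STEMS_4,
}

def stem_index(model_name, stem):
    """Return the position of `stem` in the model's output, or raise a clear error."""
    stems = MODEL_STEMS[model_name]
    if stem not in stems:
        raise ValueError(f"{model_name} has no '{stem}' stem; choose from {stems}")
    return stems.index(stem)

print(stem_index("htdemucs_6s", "piano"))  # 5
print(stem_index("htdemucs", "vocals"))    # 3
```

Asking for `piano` on a 4-stem model raises a `ValueError` instead of silently returning the wrong track.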

### Fine-Tuned Model

```python
model = get_model('htdemucs_ft')

# Higher quality, but slower
```

## Batch Processing

```python
import os
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch

model = get_model('htdemucs')
model.cuda()
model.eval()

input_dir = "./songs"
output_dir = "./separated"

for filename in os.listdir(input_dir):
    if filename.endswith(('.mp3', '.wav', '.flac')):
        input_path = os.path.join(input_dir, filename)
        song_output_dir = os.path.join(output_dir, filename.rsplit('.', 1)[0])
        os.makedirs(song_output_dir, exist_ok=True)

        print(f"Processing: {filename}")

        wav, sr = torchaudio.load(input_path)
        wav = wav.cuda()

        with torch.no_grad():
            sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

        stems = ['drums', 'bass', 'other', 'vocals']
        for i, stem in enumerate(stems):
            torchaudio.save(
                os.path.join(song_output_dir, f"{stem}.wav"),
                sources[i].cpu(),
                sr
            )

        print(f"Saved: {song_output_dir}")
```
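
For larger collections, a sorted, case-insensitive file listing can make batch runs more predictable (for example, so that `Track.MP3` is picked up too). A minimal sketch using `pathlib`; `find_audio_files` is a hypothetical helper, and the extension set mirrors the one used in this guide.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}

def find_audio_files(input_dir):
    """Return audio files in input_dir, sorted, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )

# Example: for path in find_audio_files("./songs"): print(path.name)
```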

## API Server

```python
from fastapi import FastAPI, HTTPException, UploadFile
from fastapi.responses import FileResponse
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch
import tempfile
import os

app = FastAPI()

model = get_model('htdemucs')
model.cuda()
model.eval()

STEMS = {'drums': 0, 'bass': 1, 'other': 2, 'vocals': 3}

@app.post("/separate")
async def separate(file: UploadFile, stem: str = "vocals"):
    # Reject unknown stem names instead of failing with a KeyError
    if stem not in STEMS:
        raise HTTPException(status_code=400, detail=f"Unknown stem: {stem}")

    # Save the uploaded file, keeping its extension so the decoder can detect the format
    suffix = os.path.splitext(file.filename or "")[1] or ".mp3"
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    # Load and separate
    wav, sr = torchaudio.load(tmp_path)
    os.remove(tmp_path)  # the upload is no longer needed once decoded
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    output = sources[STEMS[stem]].cpu()

    # Save the output
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as out:
        torchaudio.save(out.name, output, sr)
        return FileResponse(out.name, media_type="audio/wav")

@app.post("/instrumental")
async def get_instrumental(file: UploadFile):
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    wav, sr = torchaudio.load(tmp_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Combine the non-vocal stems
    instrumental = sources[0] + sources[1] + sources[2]

    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as out:
        torchaudio.save(out.name, instrumental.cpu(), sr)
        return FileResponse(out.name, media_type="audio/wav")

# Run (assuming this file is saved as server.py): uvicorn server:app --host 0.0.0.0 --port 8000
```

## Memory Optimization

### For Long Audio Files

```python
from demucs.apply import apply_model

# Use splitting for long audio files
sources = apply_model(
    model,
    wav.unsqueeze(0),
    split=True,         # process the track in chunks
    overlap=0.25,       # fraction of overlap between chunks
    progress=True
)[0]
```
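
To get a feel for how `split` and `overlap` interact, the sketch below counts the overlapping windows needed to cover a track. It is a simplified model rather than Demucs's internal code: `chunk_offsets` is a hypothetical helper, and the 7-second segment and 44.1 kHz sample rate are assumed values.

```python
def chunk_offsets(duration_s, segment_s=7, overlap=0.25, sample_rate=44100):
    """Start offsets (in samples) of overlapping windows covering the whole track."""
    length = int(duration_s * sample_rate)
    segment_len = int(segment_s * sample_rate)
    stride = int((1 - overlap) * segment_len)  # each window advances by 75% of its length
    return list(range(0, length, stride))

offsets = chunk_offsets(180)   # a 3-minute track
print(len(offsets))            # 35 windows
print(offsets[1] / 44100)      # 5.25 seconds between window starts
```

A larger overlap shortens the stride, so the same track needs more windows and takes longer to process.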

### For Limited VRAM

```python
# Option 1: run the model on the CPU (slower, but needs no VRAM)
model.cpu()
wav = wav.cpu()

# Option 2: process shorter segments
# (htdemucs was trained on 7.8-second segments; larger values may be rejected)
sources = apply_model(
    model,
    wav.unsqueeze(0),
    split=True,
    segment=7  # 7-second segments
)[0]
```

## Use Cases

### Karaoke Track

```python
def create_karaoke(song_path):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Everything except the vocals
    karaoke = sources[0] + sources[1] + sources[2]
    return karaoke.cpu(), sr
```

### Remix Preparation

```python
def extract_all_stems(song_path, output_dir):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = ['drums', 'bass', 'other', 'vocals']
    paths = {}

    for i, stem in enumerate(stems):
        path = os.path.join(output_dir, f"{stem}.wav")
        torchaudio.save(path, sources[i].cpu(), sr)
        paths[stem] = path

    return paths
```

### A Cappella Extraction

```python
def extract_acapella(song_path):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    vocals = sources[3]
    return vocals.cpu(), sr
```

## Quality Tips

### For Best Results

* Use a lossless input format (WAV, FLAC)
* Use source audio sampled at 44.1 kHz or higher, the rate the htdemucs models operate at
* Use `htdemucs_ft` for critical work

### Post-Processing

```python
from pydub import AudioSegment
from pydub.effects import normalize, high_pass_filter

# Load the separated vocals
vocals = AudioSegment.from_wav("vocals.wav")

# Remove low-frequency rumble below 80 Hz
vocals = high_pass_filter(vocals, 80)

# Normalize
vocals = normalize(vocals)

vocals.export("vocals_clean.wav", format="wav")
```

## Performance

| Audio length | GPU      | Time    |
| ------------ | -------- | ------- |
| 3-min song   | RTX 3090 | \~15 s  |
| 3-min song   | RTX 4090 | \~10 s  |
| 3-min song   | A100     | \~8 s   |
| 1-hour album | RTX 3090 | \~5 min |
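
The table can be turned into a rough throughput figure for planning larger jobs. The sketch below divides audio length by processing time to get a real-time factor; the numbers are the approximate values from the table, and `realtime_factor` and `estimate_seconds` are hypothetical helper names.

```python
# (audio seconds, processing seconds), taken from the table above.
BENCHMARKS = {
    "RTX 3090": (180, 15),
    "RTX 4090": (180, 10),
    "A100": (180, 8),
}

def realtime_factor(gpu):
    """How many seconds of audio are processed per second of compute."""
    audio_s, proc_s = BENCHMARKS[gpu]
    return audio_s / proc_s

def estimate_seconds(gpu, audio_seconds):
    """Estimated processing time, assuming throughput scales linearly with length."""
    return audio_seconds / realtime_factor(gpu)

print(realtime_factor("RTX 3090"))         # 12.0
print(estimate_seconds("RTX 3090", 3600))  # 300.0
```

The 300-second estimate for an hour of audio on an RTX 3090 matches the table's figure of about 5 minutes.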

## Troubleshooting

### Out of Memory

```bash
# Use smaller segments (recent Demucs releases cap htdemucs at 7.8 seconds)
demucs --segment 7 song.mp3
```

### Poor Separation

* Use the `htdemucs_ft` model
* Check the input quality
* Avoid heavily compressed MP3s

### Artifacts

* Increase the overlap
* Use a higher-quality model
* Check the input for clipping
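
One rough way to check an input for clipping is to count how many samples sit at or near full scale. The sketch below uses only the standard library and handles 16-bit PCM WAV files; `clipped_fraction` is a hypothetical helper, and the 0.999 threshold is an assumed value, not a Demucs recommendation.

```python
import array
import sys
import wave

def clipped_fraction(wav_path, threshold=0.999):
    """Fraction of samples at or near full scale in a 16-bit PCM WAV file."""
    with wave.open(wav_path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h", frames)  # signed 16-bit samples
    if sys.byteorder == "big":
        samples.byteswap()              # WAV data is little-endian
    if not samples:
        return 0.0
    limit = int(32767 * threshold)
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)

# Example: print(clipped_fraction("song.wav"))
```

A fraction well above zero suggests the source was mastered or exported too hot and may benefit from a cleaner copy.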

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly rate | Daily rate | 4-hour session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current prices.*
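
For a quick budget check, the hourly rates can be multiplied out for any session length. The sketch below uses the approximate hourly rates from the table; `rental_cost` is a hypothetical helper, and actual marketplace prices will differ.

```python
# Approximate hourly rates in USD, copied from the table above.
HOURLY_USD = {
    "RTX 3060": 0.03,
    "RTX 3090": 0.06,
    "RTX 4090": 0.10,
    "A100 40GB": 0.17,
    "A100 80GB": 0.25,
}

def rental_cost(gpu, hours):
    """Estimated rental cost in USD, rounded to cents."""
    return round(HOURLY_USD[gpu] * hours, 2)

print(rental_cost("RTX 3090", 4))   # 0.24
print(rental_cost("RTX 4090", 24))  # 2.4
```

The results differ slightly from the table's daily and 4-hour columns because the table lists rounded, approximate figures.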

**Saving money:**

* Use the **Spot** market for flexible workloads (often 30–50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across providers

## Next Steps

* [RVC Voice Clone](/guides/guides_v2-de/audio-and-sprache/rvc-voice-clone.md) - Process the extracted vocals
* [AudioCraft Music](/guides/guides_v2-de/audio-and-sprache/audiocraft-music.md) - Generate new music
* [Whisper Transcription](/guides/guides_v2-de/audio-and-sprache/whisper-transcription.md) - Transcribe vocals

