# Demucs Separation

Separate music into stems (vocals, drums, bass, other) with Demucs.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create order and wait for deployment

### Access Your Server

* Find connection details in **My Orders**
* Web interfaces: Use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is Demucs?

Demucs is an open-source music source separation model from Meta AI. It can:

* Separate vocals from a full mix
* Extract drums, bass, and other instruments as individual stems
* Process common audio formats (MP3, WAV, FLAC, and more)
* Produce high-quality stems suitable for karaoke, remixing, and sampling

## Quick Deploy

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install demucs gradio && \
python -c "
import gradio as gr
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torch
import torchaudio
import tempfile
import os

model = get_model('htdemucs')
model.cuda()
model.eval()

def separate(audio_path, stem):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = {'drums': 0, 'bass': 1, 'other': 2, 'vocals': 3}
    output = sources[stems[stem]].cpu()

    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        torchaudio.save(f.name, output, sr)
        return f.name

demo = gr.Interface(
    fn=separate,
    inputs=[gr.Audio(type='filepath'), gr.Dropdown(['vocals', 'drums', 'bass', 'other'])],
    outputs=gr.Audio(),
    title='Demucs Audio Separator'
)
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

## Installation

```bash
# Stable release from PyPI
pip install demucs

# or the latest development version from GitHub
pip install -U git+https://github.com/facebookresearch/demucs#egg=demucs
```

## Command Line Usage

### Basic Separation

```bash
# Separate into 4 stems
demucs song.mp3

# Output: separated/htdemucs/song/{drums,bass,other,vocals}.wav
```
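The default output layout shown in the comment above can be computed ahead of time. This is a hypothetical helper (not part of Demucs) that builds the expected stem paths for a track, assuming the default `separated/<model>/<track>/<stem>.wav` layout:

```python
from pathlib import Path

def expected_stem_paths(track, model="htdemucs", out_dir="separated"):
    """Return the paths Demucs writes by default: <out>/<model>/<track>/<stem>.wav."""
    track_name = Path(track).stem  # "song.mp3" -> "song"
    stems = ["drums", "bass", "other", "vocals"]
    return {s: Path(out_dir) / model / track_name / f"{s}.wav" for s in stems}

paths = expected_stem_paths("song.mp3")
print(paths["vocals"])  # separated/htdemucs/song/vocals.wav
```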

### Options

```bash
# --two-stems vocals : keep only vocals + instrumental
# -n htdemucs        : model name
# -d cuda            : run on GPU
# -o ./output        : output directory
# --mp3              : save stems as MP3
demucs --two-stems vocals -n htdemucs -d cuda -o ./output --mp3 song.mp3
```
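When driving the CLI from a script (e.g. via `subprocess`), it helps to assemble the arguments programmatically. A sketch of a hypothetical helper that builds the invocation shown above as an argument list:

```python
import shlex

def demucs_command(track, two_stems=None, model="htdemucs", device="cuda",
                   out_dir="./output", mp3=False):
    """Assemble a demucs CLI invocation as an argument list (hypothetical helper)."""
    args = ["demucs", "-n", model, "-d", device, "-o", out_dir]
    if two_stems:
        args += ["--two-stems", two_stems]
    if mp3:
        args.append("--mp3")
    args.append(track)
    return args

cmd = demucs_command("song.mp3", two_stems="vocals", mp3=True)
print(shlex.join(cmd))
```

Pass `cmd` directly to `subprocess.run(cmd)` to avoid shell-quoting issues.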

### Process Folder

```bash
demucs --two-stems vocals -d cuda ./songs/*.mp3
```

## Python API

### Basic Separation

```python
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch

# Load model
model = get_model('htdemucs')
model.cuda()
model.eval()

# Load audio
wav, sr = torchaudio.load("song.mp3")
wav = wav.cuda()

# Separate
with torch.no_grad():
    sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

# sources shape: [4, channels, samples]

# 0: drums, 1: bass, 2: other, 3: vocals

# Save stems
stems = ['drums', 'bass', 'other', 'vocals']
for i, stem in enumerate(stems):
    torchaudio.save(f"{stem}.wav", sources[i].cpu(), sr)
```

### Get Only Vocals

```python
def extract_vocals(audio_path):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    vocals = sources[3].cpu()  # Index 3 = vocals
    return vocals, sr

vocals, sr = extract_vocals("song.mp3")
torchaudio.save("vocals.wav", vocals, sr)
```

### Get Instrumental (No Vocals)

```python
def extract_instrumental(audio_path):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Sum drums + bass + other
    instrumental = sources[0] + sources[1] + sources[2]
    return instrumental.cpu(), sr

instrumental, sr = extract_instrumental("song.mp3")
torchaudio.save("instrumental.wav", instrumental, sr)
```

## Model Variants

| Model        | Stems | Quality | Speed  |
| ------------ | ----- | ------- | ------ |
| htdemucs     | 4     | Best    | Medium |
| htdemucs\_ft | 4     | Best+   | Slow   |
| htdemucs\_6s | 6     | Great   | Medium |
| mdx\_extra   | 4     | Great   | Fast   |

### 6-Stem Model

```python
model = get_model('htdemucs_6s')

# Stems: drums, bass, other, vocals, guitar, piano
```

### Fine-tuned Model

```python
model = get_model('htdemucs_ft')

# Higher quality but slower
```

## Batch Processing

```python
import os
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch

model = get_model('htdemucs')
model.cuda()
model.eval()

input_dir = "./songs"
output_dir = "./separated"

for filename in os.listdir(input_dir):
    if filename.endswith(('.mp3', '.wav', '.flac')):
        input_path = os.path.join(input_dir, filename)
        song_output_dir = os.path.join(output_dir, filename.rsplit('.', 1)[0])
        os.makedirs(song_output_dir, exist_ok=True)

        print(f"Processing: {filename}")

        wav, sr = torchaudio.load(input_path)
        wav = wav.cuda()

        with torch.no_grad():
            sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

        stems = ['drums', 'bass', 'other', 'vocals']
        for i, stem in enumerate(stems):
            torchaudio.save(
                os.path.join(song_output_dir, f"{stem}.wav"),
                sources[i].cpu(),
                sr
            )

        print(f"Saved: {song_output_dir}")
```
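For large batches it is worth skipping tracks whose stems already exist, so an interrupted run can resume. A minimal sketch (hypothetical helper, stdlib only) matching the output layout used above:

```python
from pathlib import Path

def needs_processing(input_path, output_dir,
                     stems=("drums", "bass", "other", "vocals")):
    """Return True unless all stem files for this track already exist."""
    song_dir = Path(output_dir) / Path(input_path).stem
    return not all((song_dir / f"{s}.wav").exists() for s in stems)
```

In the loop above, guard the separation with `if needs_processing(input_path, output_dir):` to make reruns cheap.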

## API Server

```python
from fastapi import FastAPI, UploadFile
from fastapi.responses import FileResponse
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch
import tempfile
import os

app = FastAPI()

model = get_model('htdemucs')
model.cuda()
model.eval()

@app.post("/separate")
async def separate(file: UploadFile, stem: str = "vocals"):
    # Save uploaded file
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    # Load and separate
    wav, sr = torchaudio.load(tmp_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = {'drums': 0, 'bass': 1, 'other': 2, 'vocals': 3}
    output = sources[stems[stem]].cpu()

    # Save output
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as out:
        torchaudio.save(out.name, output, sr)
        return FileResponse(out.name, media_type="audio/wav")

@app.post("/instrumental")
async def get_instrumental(file: UploadFile):
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    wav, sr = torchaudio.load(tmp_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Combine non-vocal stems
    instrumental = sources[0] + sources[1] + sources[2]

    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as out:
        torchaudio.save(out.name, instrumental.cpu(), sr)
        return FileResponse(out.name, media_type="audio/wav")

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
```
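Calling `/separate` requires a `multipart/form-data` upload; note that `stem` is a query parameter in the signature above, so a request goes to `/separate?stem=drums`. In practice a client library like `requests` or `httpx` builds the body for you; for illustration, a stdlib-only sketch of constructing a single-file multipart body:

```python
import uuid

def multipart_body(field, filename, data, boundary=None):
    """Build a minimal multipart/form-data body for one file field."""
    boundary = boundary or uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type

body, ctype = multipart_body("file", "song.mp3", b"\x00\x01")
# Send with urllib.request.Request(url, data=body,
#                                  headers={"Content-Type": ctype})
```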

## Memory Optimization

### For Long Audio

```python
from demucs.apply import apply_model

# Use splitting for long audio
sources = apply_model(
    model,
    wav.unsqueeze(0),
    split=True,         # Split into chunks
    overlap=0.25,       # Overlap between chunks
    progress=True
)[0]
```
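Conceptually, `split=True` slides a window over the signal with the given overlap between consecutive chunks. A sketch of the window arithmetic (illustrative only, not Demucs's actual implementation):

```python
def chunk_bounds(n_samples, chunk, overlap=0.25):
    """Yield (start, end) windows covering n_samples, with consecutive
    windows sharing `overlap` fraction of a chunk (conceptual sketch)."""
    step = max(1, int(chunk * (1 - overlap)))
    start = 0
    while start < n_samples:
        yield start, min(start + chunk, n_samples)
        if start + chunk >= n_samples:
            break
        start += step

# 10 s chunks at 44.1 kHz over a 30 s clip, 25% overlap:
bounds = list(chunk_bounds(30 * 44100, 10 * 44100))
print(len(bounds))  # 4
```

The overlapping regions are cross-faded when chunks are recombined, which is why a larger overlap reduces chunk-boundary artifacts at the cost of extra compute.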

### For Limited VRAM

```python
# Run entirely on CPU (slower, but avoids VRAM limits)
model.cpu()
wav = wav.cpu()

# Or use segment processing
sources = apply_model(
    model,
    wav.unsqueeze(0),
    split=True,
    segment=10  # 10 second segments
)[0]
```

## Use Cases

### Karaoke Track

```python
def create_karaoke(song_path):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Everything except vocals
    karaoke = sources[0] + sources[1] + sources[2]
    return karaoke.cpu(), sr
```

### Remix Preparation

```python
def extract_all_stems(song_path, output_dir):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = ['drums', 'bass', 'other', 'vocals']
    paths = {}

    for i, stem in enumerate(stems):
        path = os.path.join(output_dir, f"{stem}.wav")
        torchaudio.save(path, sources[i].cpu(), sr)
        paths[stem] = path

    return paths
```

### Acapella Extraction

```python
def extract_acapella(song_path):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    vocals = sources[3]
    return vocals.cpu(), sr
```

## Quality Tips

### For Best Results

* Use lossless input (WAV, FLAC) when possible
* Start from high-quality sources; heavily compressed files separate with more artifacts, and upsampling a low-quality file will not add detail
* Use `htdemucs_ft` for critical work

### Post-Processing

```python
from pydub import AudioSegment
from pydub.effects import normalize, high_pass_filter

# Load separated vocal
vocals = AudioSegment.from_wav("vocals.wav")

# Remove low rumble
vocals = high_pass_filter(vocals, 80)

# Normalize
vocals = normalize(vocals)

vocals.export("vocals_clean.wav", format="wav")
```

## Performance

| Audio Length | GPU      | Time    |
| ------------ | -------- | ------- |
| 3 min song   | RTX 3090 | \~15s   |
| 3 min song   | RTX 4090 | \~10s   |
| 3 min song   | A100     | \~8s    |
| 1 hour album | RTX 3090 | \~5 min |

## Troubleshooting

### Out of Memory

```bash
# Use smaller segments
demucs --segment 10 song.mp3
```

### Poor Separation

* Use the `htdemucs_ft` model
* Check input quality
* Avoid heavily compressed MP3s

### Artifacts

* Increase the `--overlap` value (default 0.25)
* Use a higher-quality model such as `htdemucs_ft`
* Check for clipping in the input

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers
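Combining the performance table with these rates gives a back-of-envelope cost estimate. This sketch assumes the roughly 12x-realtime throughput implied by the RTX 3090 rows above (a 3-minute song in ~15 s); actual throughput varies by GPU and model:

```python
def estimate_cost(audio_hours, hourly_rate, realtime_factor=12.0):
    """Rough USD cost to separate `audio_hours` of audio on a GPU billed
    at `hourly_rate` USD/h, assuming `realtime_factor`x-realtime speed."""
    gpu_hours = audio_hours / realtime_factor
    return gpu_hours * hourly_rate

# 10 hours of audio on an RTX 3090 at ~$0.06/h:
print(f"${estimate_cost(10, 0.06):.3f}")  # $0.050
```

Even a large library costs only cents to process, so the choice of GPU tier usually matters more for turnaround time than for cost.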

## Next Steps

* [RVC Voice Clone](https://docs.clore.ai/guides/audio-and-voice/rvc-voice-clone) - Process extracted vocals
* [AudioCraft Music](https://docs.clore.ai/guides/audio-and-voice/audiocraft-music) - Generate new music
* [Whisper Transcription](https://docs.clore.ai/guides/audio-and-voice/whisper-transcription) - Transcribe vocals


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/audio-and-voice/demucs-separation.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
