# Demucs Separation

Separate music into stems (vocals, drums, bass, other) with Demucs.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create order and wait for deployment

### Access Your Server

* Find connection details in **My Orders**
* Web interfaces: Use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is Demucs?

Demucs is a music source separation model from Meta AI. It can:

* Separate vocals from a full mix
* Extract drums, bass, and other instruments
* Process common audio formats (WAV, MP3, FLAC)
* Produce high-quality stems

## Quick Deploy

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install demucs gradio && \
python -c "
import gradio as gr
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torch
import torchaudio
import tempfile

model = get_model('htdemucs')
model.cuda()
model.eval()

def separate(audio_path, stem):
    wav, sr = torchaudio.load(audio_path)
    if wav.shape[0] == 1:  # model expects stereo input
        wav = wav.repeat(2, 1)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = {'drums': 0, 'bass': 1, 'other': 2, 'vocals': 3}
    output = sources[stems[stem]].cpu()

    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        torchaudio.save(f.name, output, sr)
        return f.name

demo = gr.Interface(
    fn=separate,
    inputs=[gr.Audio(type='filepath'), gr.Dropdown(['vocals', 'drums', 'bass', 'other'], value='vocals')],
    outputs=gr.Audio(),
    title='Demucs Audio Separator'
)
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

## Installation

```bash
pip install demucs

# or
pip install -e git+https://github.com/facebookresearch/demucs#egg=demucs
```

## Command Line Usage

### Basic Separation

```bash
# Separate into 4 stems
demucs song.mp3

# Output: separated/htdemucs/song/{drums,bass,other,vocals}.wav
```
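
The default output layout can be predicted ahead of time, which is handy for scripting. A minimal sketch of the path scheme shown above (the `expected_stems` helper is illustrative, not part of the demucs API):

```python
from pathlib import Path

def expected_stems(input_file, model="htdemucs", out_root="separated"):
    """Predict where demucs writes each stem, following the default
    layout: <out_root>/<model>/<track>/<stem>.wav."""
    track = Path(input_file).stem
    return {
        stem: Path(out_root) / model / track / f"{stem}.wav"
        for stem in ("drums", "bass", "other", "vocals")
    }
```

For example, `expected_stems("song.mp3")["vocals"]` gives `separated/htdemucs/song/vocals.wav`, matching the comment above.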

### Options

```bash
# --two-stems vocals : keep only vocals + everything else
# -n htdemucs        : model name
# -d cuda            : use GPU
# -o ./output        : output directory
# --mp3              : write MP3 instead of WAV
demucs \
    --two-stems vocals \
    -n htdemucs \
    -d cuda \
    -o ./output \
    --mp3 \
    song.mp3
```

### Process Folder

```bash
demucs --two-stems vocals -d cuda ./songs/*.mp3
```

## Python API

### Basic Separation

```python
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch

# Load model
model = get_model('htdemucs')
model.cuda()
model.eval()

# Load audio (htdemucs expects stereo; duplicate mono channels)
wav, sr = torchaudio.load("song.mp3")
if wav.shape[0] == 1:
    wav = wav.repeat(2, 1)
wav = wav.cuda()

# Separate
with torch.no_grad():
    sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

# sources shape: [4, channels, samples]

# 0: drums, 1: bass, 2: other, 3: vocals

# Save stems
stems = ['drums', 'bass', 'other', 'vocals']
for i, stem in enumerate(stems):
    torchaudio.save(f"{stem}.wav", sources[i].cpu(), sr)
```
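
htdemucs is trained on stereo 44.1 kHz audio, so a mono file should be duplicated to two channels before calling `apply_model` (with a torch tensor of shape `[1, T]` that is `wav.repeat(2, 1)`). The channel logic, sketched with plain lists so it is easy to follow:

```python
def to_stereo(channels):
    """Given a list of channel signals, return exactly two channels:
    duplicate a mono channel, keep the first two of a multichannel file."""
    if len(channels) == 1:
        return [channels[0], channels[0]]
    return channels[:2]
```

With tensors, the equivalent operations are `wav.repeat(2, 1)` for mono input and `wav[:2]` for anything with more than two channels.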

### Get Only Vocals

```python
def extract_vocals(audio_path):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    vocals = sources[3].cpu()  # Index 3 = vocals
    return vocals, sr

vocals, sr = extract_vocals("song.mp3")
torchaudio.save("vocals.wav", vocals, sr)
```

### Get Instrumental (No Vocals)

```python
def extract_instrumental(audio_path):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Sum drums + bass + other
    instrumental = sources[0] + sources[1] + sources[2]
    return instrumental.cpu(), sr

instrumental, sr = extract_instrumental("song.mp3")
torchaudio.save("instrumental.wav", instrumental, sr)
```

## Model Variants

| Model        | Stems | Quality | Speed  |
| ------------ | ----- | ------- | ------ |
| htdemucs     | 4     | Best    | Medium |
| htdemucs\_ft | 4     | Best+   | Slow   |
| htdemucs\_6s | 6     | Great   | Medium |
| mdx\_extra   | 4     | Great   | Fast   |
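
The 4-stem models share the source order used throughout this guide; a loaded model also exposes it as `model.sources`. Building the name-to-index map from that list avoids hardcoded magic numbers like `sources[3]`:

```python
# Source order of the 4-stem Demucs models; at runtime you can read
# it from model.sources instead of hardcoding it here.
SOURCES = ["drums", "bass", "other", "vocals"]

STEM_INDEX = {name: i for i, name in enumerate(SOURCES)}
```

Then `sources[STEM_INDEX["vocals"]]` replaces the bare `sources[3]` used in the earlier examples.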

### 6-Stem Model

```python
model = get_model('htdemucs_6s')

# Stems: drums, bass, other, vocals, guitar, piano
```

### Fine-tuned Model

```python
model = get_model('htdemucs_ft')

# Higher quality but slower
```

## Batch Processing

```python
import os
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch

model = get_model('htdemucs')
model.cuda()
model.eval()

input_dir = "./songs"
output_dir = "./separated"

for filename in os.listdir(input_dir):
    if filename.endswith(('.mp3', '.wav', '.flac')):
        input_path = os.path.join(input_dir, filename)
        song_output_dir = os.path.join(output_dir, filename.rsplit('.', 1)[0])
        os.makedirs(song_output_dir, exist_ok=True)

        print(f"Processing: {filename}")

        wav, sr = torchaudio.load(input_path)
        wav = wav.cuda()

        with torch.no_grad():
            sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

        stems = ['drums', 'bass', 'other', 'vocals']
        for i, stem in enumerate(stems):
            torchaudio.save(
                os.path.join(song_output_dir, f"{stem}.wav"),
                sources[i].cpu(),
                sr
            )

        print(f"Saved: {song_output_dir}")
```

## API Server

```python
from fastapi import FastAPI, UploadFile
from fastapi.responses import FileResponse
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch
import tempfile

app = FastAPI()

model = get_model('htdemucs')
model.cuda()
model.eval()

@app.post("/separate")
async def separate(file: UploadFile, stem: str = "vocals"):
    if stem not in ("drums", "bass", "other", "vocals"):
        return {"error": "stem must be one of: drums, bass, other, vocals"}

    # Save uploaded file
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    # Load and separate
    wav, sr = torchaudio.load(tmp_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = {'drums': 0, 'bass': 1, 'other': 2, 'vocals': 3}
    output = sources[stems[stem]].cpu()

    # Save output
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as out:
        torchaudio.save(out.name, output, sr)
        return FileResponse(out.name, media_type="audio/wav")

@app.post("/instrumental")
async def get_instrumental(file: UploadFile):
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    wav, sr = torchaudio.load(tmp_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Combine non-vocal stems
    instrumental = sources[0] + sources[1] + sources[2]

    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as out:
        torchaudio.save(out.name, instrumental.cpu(), sr)
        return FileResponse(out.name, media_type="audio/wav")

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
```

## Memory Optimization

### For Long Audio

```python
from demucs.apply import apply_model

# Use splitting for long audio
sources = apply_model(
    model,
    wav.unsqueeze(0),
    split=True,         # Split into chunks
    overlap=0.25,       # Overlap between chunks
    progress=True
)[0]
```

### For Limited VRAM

```python
# Fall back to full CPU inference (slow, but needs no GPU memory)
model.cpu()
wav = wav.cpu()

# Or use segment processing
sources = apply_model(
    model,
    wav.unsqueeze(0),
    split=True,
    segment=10  # 10 second segments
)[0]
```
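
With `split=True`, the audio is processed in overlapping windows whose stride is roughly `segment * (1 - overlap)`. A back-of-the-envelope chunk count based on that assumption (a sketch, not the exact demucs internals):

```python
import math

def chunk_count(duration_s, segment_s=10.0, overlap=0.25):
    """Approximate number of windows apply_model processes,
    assuming stride = segment * (1 - overlap)."""
    stride = segment_s * (1 - overlap)
    return max(1, math.ceil(duration_s / stride))
```

For a 3-minute song with 10-second segments and 0.25 overlap, that is about 24 windows; shorter segments mean more windows but a smaller peak memory footprint per window.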

## Use Cases

### Karaoke Track

```python
def create_karaoke(song_path):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Everything except vocals
    karaoke = sources[0] + sources[1] + sources[2]
    return karaoke.cpu(), sr
```

### Remix Preparation

```python
def extract_all_stems(song_path, output_dir):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = ['drums', 'bass', 'other', 'vocals']
    paths = {}

    for i, stem in enumerate(stems):
        path = os.path.join(output_dir, f"{stem}.wav")
        torchaudio.save(path, sources[i].cpu(), sr)
        paths[stem] = path

    return paths
```

### Acapella Extraction

```python
def extract_acapella(song_path):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    vocals = sources[3]
    return vocals.cpu(), sr
```

## Quality Tips

### For Best Results

* Use lossless input (WAV, FLAC)
* Prefer 44.1 kHz input (the model's native sample rate)
* Use `htdemucs_ft` for critical work

### Post-Processing

```python
from pydub import AudioSegment
from pydub.effects import normalize, high_pass_filter

# Load separated vocal
vocals = AudioSegment.from_wav("vocals.wav")

# Remove low rumble
vocals = high_pass_filter(vocals, 80)

# Normalize
vocals = normalize(vocals)

vocals.export("vocals_clean.wav", format="wav")
```

## Performance

| Audio Length | GPU      | Time    |
| ------------ | -------- | ------- |
| 3 min song   | RTX 3090 | \~15s   |
| 3 min song   | RTX 4090 | \~10s   |
| 3 min song   | A100     | \~8s    |
| 1 hour album | RTX 3090 | \~5 min |
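
Separation time scales roughly linearly with audio length, which is how the album row follows from the song row. A sketch of that arithmetic, using the ~15 s per 3-minute song figure for the RTX 3090:

```python
def estimate_seconds(audio_minutes, secs_per_3min_song=15):
    """Linear extrapolation of separation time from the benchmark above."""
    return audio_minutes / 3 * secs_per_3min_song
```

For example, `estimate_seconds(60)` gives 300 seconds, matching the ~5 minutes shown for a 1-hour album.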

## Troubleshooting

### Out of Memory

```bash
# Use smaller segments
demucs --segment 10 song.mp3
```

### Poor Separation

* Use the `htdemucs_ft` model
* Check input quality
* Avoid heavily compressed MP3s

### Artifacts

* Increase the chunk overlap (`--overlap`, default 0.25)
* Use a higher-quality model (`htdemucs_ft`)
* Check for clipping in the input
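
A quick way to screen inputs is to check the peak sample level; values at or near full scale suggest clipping. A minimal sketch operating on raw float samples (torchaudio loads audio in the [-1, 1] range, so the same check applies to `wav.abs().max()`):

```python
def is_clipped(samples, threshold=0.999):
    """Return (clipped?, peak) for a sequence of float samples in [-1, 1]."""
    peak = max(abs(s) for s in samples)
    return peak >= threshold, peak
```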

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers
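
Putting the performance and price tables together gives a rough per-track cost. A sketch assuming ~15 s of GPU time per song on an RTX 3090 at ~$0.06/hour (both figures from the tables above; actual rates vary):

```python
def cost_per_track(gpu_seconds=15, hourly_rate_usd=0.06):
    """Rough GPU cost of separating one track."""
    return gpu_seconds / 3600 * hourly_rate_usd
```

That works out to roughly $0.00025 per song, i.e. thousands of tracks per dollar, so for batch jobs the billing granularity and setup time matter more than per-track compute cost.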

## Next Steps

* [RVC Voice Clone](https://docs.clore.ai/guides/audio-and-voice/rvc-voice-clone) - Process extracted vocals
* [AudioCraft Music](https://docs.clore.ai/guides/audio-and-voice/audiocraft-music) - Generate new music
* [Whisper Transcription](https://docs.clore.ai/guides/audio-and-voice/whisper-transcription) - Transcribe vocals
