# Demucs Separation

Separate music into stems (vocals, drums, bass, other) with Demucs.

{% hint style="success" %}
All examples can be run on a GPU server rented from the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid pricing)
4. Configure your order:
   * Select a Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter the startup command
5. Pick a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment

### Accessing Your Server

* Find connection details under **My Orders**
* Web UI: use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is Demucs?

Demucs, from Meta AI, can:

* Separate vocals from a mix
* Extract drums, bass, and other instruments
* Process common audio formats
* Produce high-quality stems

## Quick Deployment

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install demucs gradio && \
python -c "
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torch
import torchaudio
import tempfile
import gradio as gr

model = get_model('htdemucs')
model.cuda()
model.eval()

def separate(audio_path, stem):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = {'drums': 0, 'bass': 1, 'other': 2, 'vocals': 3}
    output = sources[stems[stem]].cpu()

    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        torchaudio.save(f.name, output, sr)
        return f.name

demo = gr.Interface(
    fn=separate,
    inputs=[gr.Audio(type='filepath'), gr.Dropdown(['vocals', 'drums', 'bass', 'other'])],
    outputs=gr.Audio(),
    title='Demucs Audio Separator'
)
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL under **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Look for the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.
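To call the deployed Gradio app from code instead of the browser, a minimal client sketch (assuming a recent `gradio_client` package and the default `/predict` endpoint that `gr.Interface` creates; the URL is a placeholder for your `http_pub` address):

```python
# pip install gradio_client
from gradio_client import Client, handle_file

# Hypothetical URL: substitute the http_pub address from My Orders
client = Client("https://abc123.clorecloud.net")

# The quick-deploy interface takes (audio file, stem name) and
# returns the separated stem as a wav file path
result = client.predict(
    handle_file("song.mp3"),  # local file to upload
    "vocals",                 # stem to extract
    api_name="/predict",
)
print("Separated stem saved at:", result)
```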

## Installation

```bash
pip install demucs

# or install from source
pip install -e git+https://github.com/facebookresearch/demucs#egg=demucs
```
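Optionally, a quick sanity check that the install worked and PyTorch can see the GPU:

```bash
# should print True on a CUDA-capable server
python -c "import torch; print(torch.cuda.is_available())"

# demucs installs a CLI entry point
demucs --help
```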

## Command-Line Usage

### Basic Separation

```bash

# Separate into 4 stems
demucs song.mp3

# Output: separated/htdemucs/song/{drums,bass,other,vocals}.wav
```

### Options

```bash
# --two-stems vocals : vocals + accompaniment only
# -n htdemucs        : model name
# -d cuda            : use the GPU
# -o ./output        : output directory
# --mp3              : save output as MP3
demucs --two-stems vocals -n htdemucs -d cuda -o ./output --mp3 song.mp3
```
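With `--two-stems vocals`, the output folder contains `vocals.wav` and `no_vocals.wav` (the accompaniment).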

### Processing a Folder

```bash
demucs --two-stems vocals -d cuda ./songs/*.mp3
```

## Python API

### Basic Separation

```python
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch

# Load the model
model = get_model('htdemucs')
model.cuda()
model.eval()

# Load the audio
wav, sr = torchaudio.load("song.mp3")
wav = wav.cuda()

# Separate
with torch.no_grad():
    sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

# sources shape: [4, channels, samples]
# 0: drums, 1: bass, 2: other, 3: vocals

# Save the stems
stems = ['drums', 'bass', 'other', 'vocals']
for i, stem in enumerate(stems):
    torchaudio.save(f"{stem}.wav", sources[i].cpu(), sr)
```

### Vocals Only

```python
def extract_vocals(audio_path):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    vocals = sources[3].cpu()  # index 3 = vocals
    return vocals, sr

vocals, sr = extract_vocals("song.mp3")
torchaudio.save("vocals.wav", vocals, sr)
```

### Instrumental (No Vocals)

```python
def extract_instrumental(audio_path):
    wav, sr = torchaudio.load(audio_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Sum drums + bass + other
    instrumental = sources[0] + sources[1] + sources[2]
    return instrumental.cpu(), sr

instrumental, sr = extract_instrumental("song.mp3")
torchaudio.save("instrumental.wav", instrumental, sr)
```

## Models

| Model        | Stems | Quality | Speed  |
| ------------ | ----- | ------- | ------ |
| htdemucs     | 4     | Best    | Medium |
| htdemucs\_ft | 4     | Best+   | Slow   |
| htdemucs\_6s | 6     | Great   | Medium |
| mdx\_extra   | 4     | Great   | Fast   |

### 6-Stem Model

```python
model = get_model('htdemucs_6s')

# Stems: drums, bass, other, vocals, guitar, piano
```
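A short sketch (reusing the loaded `model` and the imports from the Python API examples above) that saves every stem without hardcoding indices; Demucs models report their own stem order via `model.sources`:

```python
import torch
import torchaudio

wav, sr = torchaudio.load("song.mp3")
with torch.no_grad():
    sources = apply_model(model, wav.cuda().unsqueeze(0), split=True)[0]

# model.sources lists the stem order, e.g.
# ['drums', 'bass', 'other', 'vocals', 'guitar', 'piano'] for htdemucs_6s
for i, name in enumerate(model.sources):
    torchaudio.save(f"{name}.wav", sources[i].cpu(), sr)
```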

### Fine-Tuned Model

```python
model = get_model('htdemucs_ft')

# Higher quality, but slower
```

## "专业影棚柔光箱"

```python
import os
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch

model = get_model('htdemucs')
model.cuda()
model.eval()

input_dir = "./songs"
output_dir = "./separated"

for filename in os.listdir(input_dir):
    if filename.endswith(('.mp3', '.wav', '.flac')):
        input_path = os.path.join(input_dir, filename)
        song_output_dir = os.path.join(output_dir, filename.rsplit('.', 1)[0])
        os.makedirs(song_output_dir, exist_ok=True)

        print(f"Processing: {filename}")

        wav, sr = torchaudio.load(input_path)
        wav = wav.cuda()

        with torch.no_grad():
            sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

        stems = ['drums', 'bass', 'other', 'vocals']
        for i, stem in enumerate(stems):
            torchaudio.save(
                os.path.join(song_output_dir, f"{stem}.wav"),
                sources[i].cpu(),
                sr
            )

        print(f"Saved: {song_output_dir}")
```

## API Server

```python
from fastapi import FastAPI, UploadFile
from fastapi.responses import FileResponse
from demucs.pretrained import get_model
from demucs.apply import apply_model
import torchaudio
import torch
import tempfile

app = FastAPI()

model = get_model('htdemucs')
model.cuda()
model.eval()

@app.post("/separate")
async def separate(file: UploadFile, stem: str = "vocals"):
    # Save the uploaded file
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    # Load and separate
    wav, sr = torchaudio.load(tmp_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = {'drums': 0, 'bass': 1, 'other': 2, 'vocals': 3}
    output = sources[stems[stem]].cpu()

    # Save the output
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as out:
        torchaudio.save(out.name, output, sr)
        return FileResponse(out.name, media_type="audio/wav")

@app.post("/instrumental")
async def get_instrumental(file: UploadFile):
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as tmp:
        content = await file.read()
        tmp.write(content)
        tmp_path = tmp.name

    wav, sr = torchaudio.load(tmp_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Combine the non-vocal stems
    instrumental = sources[0] + sources[1] + sources[2]

    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as out:
        torchaudio.save(out.name, instrumental.cpu(), sr)
        return FileResponse(out.name, media_type="audio/wav")

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
```
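A minimal client sketch for the endpoints above using the `requests` package; replace `localhost` with your `http_pub` URL when running on CLORE.AI:

```python
import requests

# 'stem' is a query parameter; the audio file goes in the multipart body
with open("song.mp3", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/separate",
        files={"file": f},
        params={"stem": "vocals"},
    )
resp.raise_for_status()

with open("vocals.wav", "wb") as out:
    out.write(resp.content)
```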

## Memory Optimization

### For Long Audio

```python
from demucs.apply import apply_model

# Process long audio in chunks
sources = apply_model(
    model,
    wav.unsqueeze(0),
    split=True,         # process in chunks
    overlap=0.25,       # overlap between chunks
    progress=True
)[0]
```
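Higher `overlap` values smooth the transitions between chunks at the cost of extra compute; 0.25 is the default.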

### For Limited VRAM

```python

# Run on the CPU instead (slower, but not limited by VRAM)
model.cpu()
wav = wav.cpu()

# Or process in shorter segments
sources = apply_model(
    model,
    wav.unsqueeze(0),
    split=True,
    segment=10  # 10-second segments
)[0]
```

## Use Cases

### Karaoke Backing Track

```python
def create_karaoke(song_path):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    # Everything except the vocals
    karaoke = sources[0] + sources[1] + sources[2]
    return karaoke.cpu(), sr
```
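Usage, mirroring the earlier examples:

```python
karaoke, sr = create_karaoke("song.mp3")
torchaudio.save("karaoke.wav", karaoke, sr)
```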

### Remix Preparation

```python
import os

def extract_all_stems(song_path, output_dir):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    stems = ['drums', 'bass', 'other', 'vocals']
    paths = {}

    for i, stem in enumerate(stems):
        path = os.path.join(output_dir, f"{stem}.wav")
        torchaudio.save(path, sources[i].cpu(), sr)
        paths[stem] = path

    return paths
```

### A Cappella Extraction

```python
def extract_acapella(song_path):
    wav, sr = torchaudio.load(song_path)
    wav = wav.cuda()

    with torch.no_grad():
        sources = apply_model(model, wav.unsqueeze(0), split=True)[0]

    vocals = sources[3]
    return vocals.cpu(), sr
```

## Quality Tips

### For Best Results

* Use lossless input (WAV, FLAC)
* Higher sample rates generally mean better quality (see the preprocessing sketch below)
* Use `htdemucs_ft` for critical work
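A small preprocessing sketch along those lines; it assumes only that Demucs models expose `samplerate` (44100 Hz for `htdemucs`) and expect stereo input, and the helper name is illustrative:

```python
import torchaudio

def prepare_audio(audio_path, model):
    """Load audio as stereo at the model's native sample rate."""
    wav, sr = torchaudio.load(audio_path)
    if wav.shape[0] == 1:
        wav = wav.repeat(2, 1)  # duplicate mono to stereo
    if sr != model.samplerate:
        wav = torchaudio.functional.resample(wav, sr, model.samplerate)
        sr = model.samplerate
    return wav, sr
```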

### Post-Processing

```python
from pydub import AudioSegment
from pydub.effects import normalize, high_pass_filter

# Load the separated vocals
vocals = AudioSegment.from_wav("vocals.wav")

# Remove low-frequency rumble
vocals = high_pass_filter(vocals, 80)

# Normalize
vocals = normalize(vocals)

vocals.export("vocals_clean.wav", format="wav")
```

## Performance

Rough processing times (varies with GPU):

| Audio Length  | Time        |
| ------------- | ----------- |
| 3-minute song | \~8-15s     |
| 1-hour album  | \~5 minutes |

## Troubleshooting

### Out of Memory

```bash

# Use smaller segments
demucs --segment 10 song.mp3
```

### Poor Separation

* Use the `htdemucs_ft` model
* Check input quality
* Avoid heavily compressed MP3s

### Artifacts

* Increase the overlap (see the sketch below)
* Use a higher-quality model
* Check the input for clipping
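A sketch of the relevant `apply_model` knobs, reusing a loaded `model` and `wav` tensor from the examples above: `shifts` averages predictions over random time shifts (slower but often cleaner), and a higher `overlap` smooths chunk boundaries:

```python
import torch
from demucs.apply import apply_model

with torch.no_grad():
    sources = apply_model(
        model,
        wav.unsqueeze(0),
        shifts=2,      # average 2 randomly shifted passes (~2x slower)
        split=True,
        overlap=0.5,   # more overlap than the 0.25 default
    )[0]
```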

## Cost Estimation

Typical rates on the [CLORE.AI Marketplace](https://clore.ai/marketplace) (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

Prices vary by provider and demand. Check the [CLORE.AI Marketplace](https://clore.ai/marketplace) for current rates.

To save on costs:

* Use **Spot** pricing for flexible workloads (typically 30-50% cheaper)
* Pay with **CLORE**

## Related Guides

* [RVC Voice Cloning](https://docs.clore.ai/guides/guides_v2-zh/yin-pin-yu-yu-yin/rvc-voice-clone) - process the extracted vocals
* [AudioCraft Music](https://docs.clore.ai/guides/guides_v2-zh/yin-pin-yu-yu-yin/audiocraft-music) - generate new music
* [Whisper Transcription](https://docs.clore.ai/guides/guides_v2-zh/yin-pin-yu-yu-yin/whisper-transcription) - transcribe the vocals

