# AudioCraft Music

Generate music and audio with AudioCraft from Meta (MusicGen).

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM size, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set the ports (TCP for SSH, HTTP for web interfaces)
   * Add environment variables if needed
   * Enter the startup command
5. Choose a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment

### Accessing Your Server

* Find the connection details under **My Orders**
* Web interfaces: use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What Is AudioCraft?

AudioCraft includes:

* **MusicGen** - Text-to-music generation
* **AudioGen** - Sound effect generation
* **EnCodec** - Audio compression
* **MAGNeT** - Fast generation

## Model Sizes

| Model  | VRAM  | Quality        | Speed  |
| ------ | ----- | -------------- | ------ |
| small  | 4 GB  | Good           | Fast   |
| medium | 8 GB  | Great          | Medium |
| large  | 16 GB | Best           | Slow   |
| melody | 8 GB  | Great + melody | Medium |
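
The VRAM column can drive model selection automatically. The following is a minimal sketch, assuming the approximate figures from the table above; `pick_musicgen_model` is a hypothetical helper, not part of AudioCraft.

```python
# Approximate VRAM requirements in GB, copied from the table above.
MODEL_VRAM_GB = {"small": 4, "medium": 8, "large": 16}


def pick_musicgen_model(available_vram_gb: float) -> str:
    """Return the largest text-only MusicGen checkpoint that fits in VRAM."""
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= available_vram_gb]
    if not fitting:
        raise ValueError(f"No model fits in {available_vram_gb} GB of VRAM")
    best = max(fitting, key=MODEL_VRAM_GB.get)
    return f"facebook/musicgen-{best}"


print(pick_musicgen_model(12))  # facebook/musicgen-medium
```

The returned string can be passed to `MusicGen.get_pretrained`. Leave some headroom in practice, since longer durations and larger batches increase memory use.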

## Quick Deploy

**Docker image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install audiocraft gradio scipy && \
python -c "
import gradio as gr
from audiocraft.models import MusicGen
import scipy.io.wavfile as wav
import tempfile

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=10)

def generate(prompt, duration):
    model.set_generation_params(duration=duration)
    output = model.generate([prompt])
    audio = output[0].cpu().numpy().T
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        wav.write(f.name, 32000, audio)
        return f.name

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label='Prompt'), gr.Slider(5, 30, value=10, label='Duration (s)')],
    outputs=gr.Audio(label='Generated Music'),
    title='MusicGen'
)
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL under **My Orders**:

1. Go to the **My Orders** page
2. Click your order
3. Find the `http_pub` URL (for example, `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in the examples below.
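
As a small illustration of that substitution, here is a sketch that turns the `http_pub` host into a full HTTPS URL. The helper name `service_url` is hypothetical, and `abc123.clorecloud.net` is only the placeholder host from the steps above.

```python
def service_url(http_pub: str, path: str = "/") -> str:
    """Build an HTTPS URL from the http_pub host shown in the order details."""
    # Accept values pasted with or without a scheme or trailing slash
    host = http_pub.strip().removeprefix("https://").removeprefix("http://").rstrip("/")
    return f"https://{host}/{path.lstrip('/')}"


print(service_url("abc123.clorecloud.net"))               # https://abc123.clorecloud.net/
print(service_url("abc123.clorecloud.net", "/generate"))  # https://abc123.clorecloud.net/generate
```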

## Installation

```bash
pip install audiocraft
pip install scipy torchaudio
```

## MusicGen: Text-to-Music

### Basic Generation

```python
from audiocraft.models import MusicGen
import torchaudio

# Load the model
model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=15)  # seconds

# Generate; the output tensor has shape [batch, channels, samples]
prompt = "upbeat electronic dance music with heavy bass"
output = model.generate([prompt])

# Save the first (and only) clip at MusicGen's 32 kHz sample rate
audio = output[0].cpu()
torchaudio.save("music.wav", audio, sample_rate=32000)
```

### Multiple Prompts

```python
prompts = [
    "relaxing piano jazz",
    "epic orchestral cinematic",
    "acoustic guitar folk song",
    "aggressive heavy metal"
]

outputs = model.generate(prompts)

for i, output in enumerate(outputs):
    torchaudio.save(f"music_{i}.wav", output.cpu(), sample_rate=32000)
```

### Melody Conditioning

Use a melody as reference:

```python
from audiocraft.models import MusicGen
import torchaudio

# Load melody model
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=15)

# Load reference melody
melody, sr = torchaudio.load("reference.wav")
melody = melody.unsqueeze(0).cuda()

# Generate with melody
output = model.generate_with_chroma(
    ["jazz piano version"],
    melody,
    sr
)

torchaudio.save("jazz_version.wav", output[0].cpu(), sample_rate=32000)
```

### Continuation

Continue from existing audio:

```python

import torchaudio

# Load the audio to continue (assumes `model` is a MusicGen model loaded as in the earlier examples)
audio, sr = torchaudio.load("start.wav")
audio = audio.unsqueeze(0).cuda()

# Continue
output = model.generate_continuation(
    audio,
    prompt_sample_rate=sr,
    descriptions=["more energetic with drums"],
    progress=True
)

torchaudio.save("continued.wav", output[0].cpu(), sample_rate=32000)
```

## AudioGen: Sound Effects

```python
from audiocraft.models import AudioGen
import torchaudio

# Load the model
model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)

# Generate sounds
prompts = [
    "dog barking in the distance",
    "rain on a window",
    "car engine starting",
    "crowd cheering at a concert"
]

outputs = model.generate(prompts)

for i, output in enumerate(outputs):
    torchaudio.save(f"sound_{i}.wav", output.cpu(), sample_rate=16000)
```

## Generation Parameters

```python
model.set_generation_params(
    duration=30,           # Length in seconds
    top_k=250,             # Top-k sampling
    top_p=0.0,             # Nucleus sampling (0 = disabled)
    temperature=1.0,       # Randomness
    cfg_coef=3.0,          # Classifier-free guidance
    two_step_cfg=False,    # Two-step CFG
)
```
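
To make `cfg_coef` more concrete: classifier-free guidance scores each step twice, once with the text prompt and once without, and then extrapolates from the unconditional scores toward the conditional ones. The sketch below shows that mixing rule on plain Python lists with made-up numbers; `apply_cfg` is a hypothetical illustration, not an AudioCraft function.

```python
def apply_cfg(cond_logits, uncond_logits, cfg_coef=3.0):
    """Mix conditional and unconditional logits with a guidance coefficient."""
    return [u + cfg_coef * (c - u) for c, u in zip(cond_logits, uncond_logits)]


cond = [2.0, 1.0, 0.0]    # scores when the prompt is used (illustrative values)
uncond = [1.0, 1.0, 1.0]  # scores when the prompt is dropped (illustrative values)

print(apply_cfg(cond, uncond, cfg_coef=1.0))  # [2.0, 1.0, 0.0]
print(apply_cfg(cond, uncond, cfg_coef=3.0))  # [4.0, 1.0, -2.0]
```

With a coefficient of 1.0 the result equals the conditional scores. Larger values widen the gap between tokens the prompt favors and tokens it does not, which is why high values follow the prompt more strictly.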

### Parameter Effects

| Parameter   | Low Value            | High Value       |
| ----------- | -------------------- | ---------------- |
| temperature | Conservative         | Creative         |
| top\_k      | More focused         | More variety     |
| cfg\_coef   | Loose interpretation | Strict to prompt |
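
The effect of `temperature` and `top_k` is easier to see on a toy distribution. The following is a simplified sketch of temperature scaling plus top-k filtering in pure Python; it illustrates the general sampling technique and is not AudioCraft's internal implementation. The function name and the logits are made up for the example.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, rng=random):
    """Sample an index from logits using temperature scaling and top-k filtering."""
    # Rank candidates by score and keep only the top_k best (0 keeps all)
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        indices = indices[:top_k]
    # Dividing by a low temperature sharpens the distribution; a high one flattens it
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized softmax
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [3.0, 2.0, 1.0, 0.5]
rng = random.Random(0)
print(sample_token(logits, top_k=1, rng=rng))  # 0, since only the best candidate survives
```

Raising `top_k` or `temperature` lets lower-ranked candidates through more often, which matches the "more variety" and "creative" columns in the table.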

## Batch Processing

```python
from audiocraft.models import MusicGen
import torchaudio
import os

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=15)

prompts = [
    {"name": "intro", "prompt": "mysterious ambient intro, slow build"},
    {"name": "verse", "prompt": "chill lo-fi hip hop beat"},
    {"name": "chorus", "prompt": "energetic electronic pop chorus"},
    {"name": "outro", "prompt": "calm piano fade out"},
]

output_dir = "./music_parts"
os.makedirs(output_dir, exist_ok=True)

for item in prompts:
    output = model.generate([item["prompt"]])
    torchaudio.save(
        os.path.join(output_dir, f"{item['name']}.wav"),
        output[0].cpu(),
        sample_rate=32000
    )
    print(f"Generated: {item['name']}")
```

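## Streaming Generation

The `MusicGen` class does not appear to expose a token-streaming generator in its public API. A workable substitute is to register a progress callback and request the generated tokens alongside the audio:

```python
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=10)

# Called during generation with the number of steps done and the total
def on_progress(generated: int, total: int) -> None:
    print(f"{generated}/{total} steps", end="\r")

model.set_custom_progress_callback(on_progress)

# progress=True activates the callback; return_tokens=True also returns the codec tokens
audio, tokens = model.generate(
    ["upbeat pop music"],
    progress=True,
    return_tokens=True,
)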
```

## Stereo Generation

```python
from audiocraft.models import MusicGen
import torchaudio

# Load stereo model
model = MusicGen.get_pretrained('facebook/musicgen-stereo-medium')
model.set_generation_params(duration=15)

output = model.generate(["cinematic orchestral score"])

# Output shape: [batch, 2, samples] for stereo

torchaudio.save("stereo_music.wav", output[0].cpu(), sample_rate=32000)
```

## API Server

```python
from fastapi import FastAPI
from fastapi.responses import FileResponse
from audiocraft.models import MusicGen
import torchaudio
import tempfile

app = FastAPI()
model = MusicGen.get_pretrained('facebook/musicgen-medium')

@app.post("/generate")
async def generate_music(prompt: str, duration: int = 10):
    model.set_generation_params(duration=duration)
    output = model.generate([prompt])

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        torchaudio.save(f.name, output[0].cpu(), sample_rate=32000)
        return FileResponse(f.name, media_type="audio/wav")

_melody_model = None  # loaded on first use and reused, instead of reloading per request

def get_melody_model():
    global _melody_model
    if _melody_model is None:
        _melody_model = MusicGen.get_pretrained('facebook/musicgen-melody')
    return _melody_model

@app.post("/generate_with_melody")
async def generate_with_melody(prompt: str, melody_path: str, duration: int = 15):
    # melody_path refers to an audio file that already exists on the server
    melody, sr = torchaudio.load(melody_path)

    model_melody = get_melody_model()
    model_melody.set_generation_params(duration=duration)

    output = model_melody.generate_with_chroma([prompt], melody.unsqueeze(0).cuda(), sr)

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        torchaudio.save(f.name, output[0].cpu(), sample_rate=32000)
        return FileResponse(f.name, media_type="audio/wav")

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000
```
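
Because `prompt` and `duration` are declared as plain function arguments, FastAPI reads them from the query string even on a POST request. A minimal client sketch using only the standard library follows; the helper names are hypothetical, and `http://localhost:8000` assumes the server runs locally with the `uvicorn` command shown above.

```python
import urllib.parse
import urllib.request


def build_generate_url(base_url: str, prompt: str, duration: int = 10) -> str:
    """Build the /generate URL with prompt and duration as query parameters."""
    query = urllib.parse.urlencode({"prompt": prompt, "duration": duration})
    return f"{base_url.rstrip('/')}/generate?{query}"


def download_clip(base_url: str, prompt: str, out_path: str, duration: int = 10) -> None:
    """POST to the server and write the returned WAV bytes to out_path."""
    request = urllib.request.Request(
        build_generate_url(base_url, prompt, duration), method="POST"
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())


print(build_generate_url("http://localhost:8000", "lo-fi beat", 5))
# http://localhost:8000/generate?prompt=lo-fi+beat&duration=5
```

Calling `download_clip("http://localhost:8000", "lo-fi beat", "clip.wav")` would save the response once the server is running. Generation can take tens of seconds, so a real client may want a generous timeout.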

## Prompt Engineering

### Effective Prompts

```python

# Genre + instruments + mood
"upbeat jazz with saxophone and piano, happy and energetic"

# Style reference
"lo-fi hip hop beat, chill study music, vinyl crackle"

# Cinematic
"epic orchestral trailer music, building tension, dramatic"

# Specific elements
"acoustic guitar strumming pattern, folk song, campfire vibes"
```
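
The "genre + instruments + mood" pattern above can be assembled programmatically when generating many variations. This is a small sketch with a hypothetical `build_prompt` helper; the exact wording that works best is still a matter of experimentation.

```python
def build_prompt(genre, instruments=(), descriptors=()):
    """Join a genre, optional instruments, and mood/style descriptors into one prompt."""
    prompt = genre
    if instruments:
        prompt += " with " + " and ".join(instruments)
    if descriptors:
        prompt += ", " + ", ".join(descriptors)
    return prompt


print(build_prompt("upbeat jazz", ["saxophone", "piano"], ["happy and energetic"]))
# upbeat jazz with saxophone and piano, happy and energetic
```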

### Bad Prompts

```python

# Too vague
"nice music"  # Not specific enough

# Song lyrics
"Happy birthday to you..."  # Won't work

# Artist names
"like Beatles"  # Doesn't understand artists
```

## Post-Processing

### Combine Clips

```python
from pydub import AudioSegment

intro = AudioSegment.from_wav("intro.wav")
verse = AudioSegment.from_wav("verse.wav")
chorus = AudioSegment.from_wav("chorus.wav")

# Crossfade
song = intro.append(verse, crossfade=1000)
song = song.append(chorus, crossfade=1000)

song.export("full_song.mp3", format="mp3")
```
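
Each crossfade overlaps two neighboring clips, so the combined track is shorter than the sum of its parts. A quick sketch of the arithmetic, assuming every join uses the same crossfade length (the function name is hypothetical):

```python
def combined_length_ms(clip_lengths_ms, crossfade_ms=1000):
    """Length of clips joined end to end with one crossfade per join."""
    if not clip_lengths_ms:
        return 0
    overlaps = crossfade_ms * (len(clip_lengths_ms) - 1)
    return sum(clip_lengths_ms) - overlaps


# Three 15-second clips joined with two 1-second crossfades
print(combined_length_ms([15000, 15000, 15000]))  # 43000
```

This helps when planning how many clips are needed to reach a target song length.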

### Add Effects

```python
from pydub import AudioSegment
from pydub.effects import normalize, compress_dynamic_range

audio = AudioSegment.from_wav("generated.wav")

# Normalize volume
audio = normalize(audio)

# Add compression
audio = compress_dynamic_range(audio)

# Fade in/out
audio = audio.fade_in(2000).fade_out(3000)

audio.export("processed.wav", format="wav")
```

## Memory Optimization

```python
import torch
from audiocraft.models import MusicGen

# Use a smaller model
model = MusicGen.get_pretrained('facebook/musicgen-small')

# Shorter clips need less memory
model.set_generation_params(duration=10)

# Generate on the GPU without tracking gradients, then move the result to the CPU
with torch.no_grad():
    output = model.generate(["prompt"]).cpu()

# Release cached GPU memory that is no longer in use
torch.cuda.empty_cache()
```

## Performance

| Model  | GPU      | 30 s Generation |
| ------ | -------- | --------------- |
| small  | RTX 3090 | \~10 s          |
| medium | RTX 3090 | \~25 s          |
| large  | RTX 4090 | \~45 s          |
| melody | RTX 3090 | \~30 s          |
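
These timings translate directly into throughput. The sketch below derives a real-time factor and clips per hour from the approximate figures in the table; the numbers are rough estimates, and note that the `large` row was measured on a different GPU than the others.

```python
# Approximate seconds needed to generate one 30-second clip (from the table above)
GENERATION_SECONDS = {"small": 10, "medium": 25, "large": 45, "melody": 30}
CLIP_SECONDS = 30


def realtime_factor(model: str) -> float:
    """Seconds of audio produced per second of compute (above 1.0 is faster than real time)."""
    return CLIP_SECONDS / GENERATION_SECONDS[model]


def clips_per_hour(model: str) -> int:
    """Whole 30-second clips that fit into one hour of back-to-back generation."""
    return 3600 // GENERATION_SECONDS[model]


for name in GENERATION_SECONDS:
    print(f"{name}: {realtime_factor(name):.2f}x real time, {clips_per_hour(name)} clips/hour")
```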

## Comparison

| Feature      | MusicGen | Stable Audio | Riffusion |
| ------------ | -------- | ------------ | --------- |
| Quality      | Great    | Great        | Good      |
| Length       | 30 s     | 90 s         | Loop      |
| Melody Input | Yes      | No           | No        |
| Open Source  | Yes      | No           | Yes       |

## Troubleshooting

### Out of Memory

* Use a smaller model (small instead of large)
* Reduce the duration
* Clear the cache: `torch.cuda.empty_cache()`
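
The "use a smaller model" advice can be automated with a fallback loop. This is a minimal sketch: `generate_with_fallback` and `fake_generate` are hypothetical names, and the stand-in function only simulates an out-of-memory failure. In real code the callback would load the named model and call `generate`.

```python
def generate_with_fallback(generate_fn, model_names):
    """Try each model in order, moving to the next one on an out-of-memory error."""
    last_error = None
    for name in model_names:
        try:
            return name, generate_fn(name)
        except RuntimeError as err:
            # PyTorch reports CUDA OOM as a RuntimeError whose message mentions "out of memory"
            if "out of memory" not in str(err).lower():
                raise
            last_error = err
    raise RuntimeError("All models ran out of memory") from last_error


def fake_generate(name):
    """Stand-in for real generation: only the small model 'fits'."""
    if name != "facebook/musicgen-small":
        raise RuntimeError("CUDA out of memory")
    return "audio"


order = ["facebook/musicgen-large", "facebook/musicgen-medium", "facebook/musicgen-small"]
print(generate_with_fallback(fake_generate, order))
# ('facebook/musicgen-small', 'audio')
```

When adapting this, calling `torch.cuda.empty_cache()` between attempts would help free memory left over from the failed model.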

### Poor Quality

* Use more specific prompts
* Try the medium or large model
* Adjust the temperature (0.8-1.2)

### Repetitive Output

* Increase top\_k
* Lower cfg\_coef
* Try different prompts

## Cost Estimate

Typical rates on the CLORE.AI marketplace (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use the **Spot** market for flexible workloads (often 30–50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across providers
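
For budgeting, session cost is simply the hourly rate multiplied by the hours used. The sketch below uses the approximate hourly rates from the table above; actual marketplace prices will differ, and `session_cost` is a hypothetical helper.

```python
# Approximate hourly rates in USD, copied from the table above
HOURLY_RATE_USD = {
    "RTX 3060": 0.03,
    "RTX 3090": 0.06,
    "RTX 4090": 0.10,
    "A100 40GB": 0.17,
    "A100 80GB": 0.25,
}


def session_cost(gpu: str, hours: float) -> float:
    """Estimated cost in USD for renting the given GPU for a number of hours."""
    return round(HOURLY_RATE_USD[gpu] * hours, 2)


print(session_cost("RTX 3090", 4))  # 0.24
print(session_cost("RTX 4090", 4))  # 0.4
```

The results can differ slightly from the table's session and daily columns because the table values are rounded approximations.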

## Next Steps

* [Bark TTS](https://docs.clore.ai/guides/guides_v2-ru/audio-i-golos/bark-tts) - Voice generation
* [RVC Voice Cloning](https://docs.clore.ai/guides/guides_v2-ru/audio-i-golos/rvc-voice-clone) - Voice conversion
* [Demucs Separation](https://docs.clore.ai/guides/guides_v2-ru/audio-i-golos/demucs-separation) - Audio separation

