# AudioCraft Music

Generate music and audio with Meta's AudioCraft (MusicGen).

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create order and wait for deployment

### Access Your Server

* Find connection details in **My Orders**
* Web interfaces: Use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is AudioCraft?

AudioCraft includes:

* **MusicGen** - Text-to-music generation
* **AudioGen** - Sound effects generation
* **EnCodec** - Audio compression
* **MAGNeT** - Faster, non-autoregressive generation

## Model Sizes

| Model  | VRAM | Quality        | Speed  |
| ------ | ---- | -------------- | ------ |
| small  | 4GB  | Good           | Fast   |
| medium | 8GB  | Great          | Medium |
| large  | 16GB | Best           | Slow   |
| melody | 8GB  | Great + melody | Medium |
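As a rough guide, the table above can be turned into a helper that picks the largest model fitting the available VRAM. This is a sketch: the thresholds follow the table, and the `facebook/musicgen-*` names are the Hugging Face model IDs used throughout this guide.

```python
# Approximate VRAM needs (GB) from the table above
MODEL_VRAM = {
    "facebook/musicgen-small": 4,
    "facebook/musicgen-medium": 8,
    "facebook/musicgen-large": 16,
}

def pick_model(vram_gb: float) -> str:
    """Return the largest MusicGen model that fits in vram_gb of GPU memory."""
    fitting = {name: need for name, need in MODEL_VRAM.items() if need <= vram_gb}
    if not fitting:
        raise ValueError(f"Need at least 4 GB of VRAM, got {vram_gb}")
    # Pick the model with the highest VRAM requirement that still fits
    return max(fitting, key=fitting.get)

print(pick_model(10))  # facebook/musicgen-medium
```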

## Quick Deploy

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install audiocraft gradio scipy && \
python -c "
import gradio as gr
from audiocraft.models import MusicGen
import scipy.io.wavfile as wav
import tempfile

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=10)

def generate(prompt, duration):
    model.set_generation_params(duration=duration)
    output = model.generate([prompt])
    audio = output[0].cpu().numpy().T
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        wav.write(f.name, 32000, audio)
        return f.name

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label='Prompt'), gr.Slider(5, 30, value=10, label='Duration (s)')],
    outputs=gr.Audio(label='Generated Music'),
    title='MusicGen'
)
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

## Installation

```bash
pip install audiocraft
pip install scipy torchaudio
```

## MusicGen: Text-to-Music

### Basic Generation

```python
from audiocraft.models import MusicGen
import torchaudio

# Load model
model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=15)  # seconds

# Generate
prompt = "upbeat electronic dance music with heavy bass"
output = model.generate([prompt])

# Save
audio = output[0].cpu()
torchaudio.save("music.wav", audio, sample_rate=32000)
```

### Multiple Prompts

```python
prompts = [
    "relaxing piano jazz",
    "epic orchestral cinematic",
    "acoustic guitar folk song",
    "aggressive heavy metal"
]

outputs = model.generate(prompts)

for i, output in enumerate(outputs):
    torchaudio.save(f"music_{i}.wav", output.cpu(), sample_rate=32000)
```

### Melody Conditioning

Use a melody as reference:

```python
from audiocraft.models import MusicGen
import torchaudio

# Load melody model
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=15)

# Load reference melody
melody, sr = torchaudio.load("reference.wav")
melody = melody.unsqueeze(0).cuda()

# Generate with melody
output = model.generate_with_chroma(
    ["jazz piano version"],
    melody,
    sr
)

torchaudio.save("jazz_version.wav", output[0].cpu(), sample_rate=32000)
```

### Continuation

Continue from existing audio:

```python
from audiocraft.models import MusicGen
import torchaudio

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=15)

# Load audio to continue
audio, sr = torchaudio.load("start.wav")
audio = audio.unsqueeze(0).cuda()

# Continue
output = model.generate_continuation(
    audio,
    prompt_sample_rate=sr,
    descriptions=["more energetic with drums"],
    progress=True
)

torchaudio.save("continued.wav", output[0].cpu(), sample_rate=32000)
```

## AudioGen: Sound Effects

```python
from audiocraft.models import AudioGen
import torchaudio

# Load model
model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)

# Generate sounds
prompts = [
    "dog barking in the distance",
    "rain on a window",
    "car engine starting",
    "crowd cheering at a concert"
]

outputs = model.generate(prompts)

for i, output in enumerate(outputs):
    torchaudio.save(f"sound_{i}.wav", output.cpu(), sample_rate=16000)
```

## Generation Parameters

```python
model.set_generation_params(
    duration=30,           # Length in seconds
    top_k=250,             # Top-k sampling
    top_p=0.0,             # Nucleus sampling (0 = disabled)
    temperature=1.0,       # Randomness
    cfg_coef=3.0,          # Classifier-free guidance
    two_step_cfg=False,    # Two-step CFG
)
```

### Parameter Effects

| Parameter   | Low Value            | High Value       |
| ----------- | -------------------- | ---------------- |
| temperature | Conservative         | Creative         |
| top\_k      | More focused         | More variety     |
| cfg\_coef   | Loose interpretation | Strict to prompt |
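To find settings you like, it can help to render the same prompt across a small parameter grid. A hypothetical sketch that just enumerates the combinations (feeding each dict into `model.set_generation_params` is left to the reader):

```python
from itertools import product

def param_grid(**options):
    """Yield one dict per combination of the given option lists."""
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))

grid = list(param_grid(temperature=[0.8, 1.0, 1.2], cfg_coef=[2.0, 3.0]))
print(len(grid))  # 6 combinations
```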

## Batch Processing

```python
from audiocraft.models import MusicGen
import torchaudio
import os

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=15)

prompts = [
    {"name": "intro", "prompt": "mysterious ambient intro, slow build"},
    {"name": "verse", "prompt": "chill lo-fi hip hop beat"},
    {"name": "chorus", "prompt": "energetic electronic pop chorus"},
    {"name": "outro", "prompt": "calm piano fade out"},
]

output_dir = "./music_parts"
os.makedirs(output_dir, exist_ok=True)

for item in prompts:
    output = model.generate([item["prompt"]])
    torchaudio.save(
        os.path.join(output_dir, f"{item['name']}.wav"),
        output[0].cpu(),
        sample_rate=32000
    )
    print(f"Generated: {item['name']}")
```
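If clip names ever come from user input rather than a fixed list, a small slug helper keeps the output filenames safe. This is a hypothetical addition, stdlib only:

```python
import re

def slugify(name: str) -> str:
    """Lowercase a name and replace anything non-alphanumeric with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

print(slugify("Epic Chorus #2!"))  # epic_chorus_2
```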

## Long-Form Generation

AudioCraft does not expose a public streaming API, but `generate_continuation` can extend a clip in chunks to push past the usual 30-second window. A sketch that keeps the last 5 seconds as the continuation prompt:

```python
from audiocraft.models import MusicGen
import torch

model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=10)
sr = model.sample_rate  # 32000 for MusicGen

output = model.generate(["upbeat pop music"])  # [1, channels, samples]

# Extend twice; each pass continues from the last 5 seconds
for _ in range(2):
    prompt = output[..., -5 * sr:]
    continued = model.generate_continuation(
        prompt,
        prompt_sample_rate=sr,
        descriptions=["upbeat pop music"],
    )
    # The returned audio includes the prompt; append only the new part
    output = torch.cat([output, continued[..., prompt.shape[-1]:]], dim=-1)
```

## Stereo Generation

```python
from audiocraft.models import MusicGen
import torchaudio

# Load stereo model
model = MusicGen.get_pretrained('facebook/musicgen-stereo-medium')
model.set_generation_params(duration=15)

output = model.generate(["cinematic orchestral score"])

# Output shape: [batch, 2, samples] for stereo

torchaudio.save("stereo_music.wav", output[0].cpu(), sample_rate=32000)
```

## API Server

```python
from fastapi import FastAPI
from fastapi.responses import FileResponse
from audiocraft.models import MusicGen
import torchaudio
import tempfile

app = FastAPI()
model = MusicGen.get_pretrained('facebook/musicgen-medium')

@app.post("/generate")
async def generate_music(prompt: str, duration: int = 10):
    model.set_generation_params(duration=duration)
    output = model.generate([prompt])

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        torchaudio.save(f.name, output[0].cpu(), sample_rate=32000)
        return FileResponse(f.name, media_type="audio/wav")

@app.post("/generate_with_melody")
async def generate_with_melody(prompt: str, melody_path: str, duration: int = 15):
    melody, sr = torchaudio.load(melody_path)

    # Loading the model per request is slow; in production, load it once at startup
    model_melody = MusicGen.get_pretrained('facebook/musicgen-melody')
    model_melody.set_generation_params(duration=duration)

    output = model_melody.generate_with_chroma([prompt], melody.unsqueeze(0).cuda(), sr)

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        torchaudio.save(f.name, output[0].cpu(), sample_rate=32000)
        return FileResponse(f.name, media_type="audio/wav")

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
```
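A minimal client for the `/generate` endpoint above, using only the standard library. The host and port are assumptions taken from the `uvicorn` command in the comment:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_generate_url(base: str, prompt: str, duration: int = 10) -> str:
    """Build the query-parameter URL the FastAPI endpoint expects."""
    return f"{base}/generate?" + urlencode({"prompt": prompt, "duration": duration})

url = build_generate_url("http://localhost:8000", "lo-fi hip hop beat", 15)

# Uncomment to actually fetch the WAV (POST with no body; server must be running):
# with urlopen(Request(url, method="POST")) as resp, open("out.wav", "wb") as f:
#     f.write(resp.read())
```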

## Prompt Engineering

### Effective Prompts

```python
# Genre + instruments + mood
"upbeat jazz with saxophone and piano, happy and energetic"

# Style reference
"lo-fi hip hop beat, chill study music, vinyl crackle"

# Cinematic
"epic orchestral trailer music, building tension, dramatic"

# Specific elements
"acoustic guitar strumming pattern, folk song, campfire vibes"
```
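Good prompts tend to combine genre, instruments, and mood, as above. A small helper (hypothetical, just string composition) that keeps prompts in that shape:

```python
def build_prompt(genre: str, instruments=(), mood: str = "") -> str:
    """Compose a MusicGen prompt from genre, instruments, and mood."""
    parts = [genre]
    if instruments:
        parts.append("with " + " and ".join(instruments))
    if mood:
        parts.append(mood)
    return ", ".join(parts)

print(build_prompt("upbeat jazz", ["saxophone", "piano"], "happy and energetic"))
# upbeat jazz, with saxophone and piano, happy and energetic
```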

### Bad Prompts

```python
# Too vague
"nice music"  # Not specific enough

# Song lyrics
"Happy birthday to you..."  # MusicGen is instrumental and won't sing lyrics

# Artist names
"like Beatles"  # Artist names aren't reliably mapped to styles
```

## Post-Processing

### Combine Clips

```python
from pydub import AudioSegment

intro = AudioSegment.from_wav("intro.wav")
verse = AudioSegment.from_wav("verse.wav")
chorus = AudioSegment.from_wav("chorus.wav")

# Crossfade
song = intro.append(verse, crossfade=1000)
song = song.append(chorus, crossfade=1000)

song.export("full_song.mp3", format="mp3")
```
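Each `append` with `crossfade=1000` overlaps the clips by one second, so the result is shorter than the sum of the parts. A quick sanity-check of the arithmetic (pure Python, no pydub needed):

```python
def combined_length_ms(clip_lengths_ms, crossfade_ms=1000):
    """Total length after chaining clips with a fixed crossfade between each pair."""
    if not clip_lengths_ms:
        return 0
    return sum(clip_lengths_ms) - crossfade_ms * (len(clip_lengths_ms) - 1)

# Three 15-second clips with 1 s crossfades -> 43 s total
print(combined_length_ms([15000, 15000, 15000]))  # 43000
```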

### Add Effects

```python
from pydub import AudioSegment
from pydub.effects import normalize, compress_dynamic_range

audio = AudioSegment.from_wav("generated.wav")

# Normalize volume
audio = normalize(audio)

# Add compression
audio = compress_dynamic_range(audio)

# Fade in/out
audio = audio.fade_in(2000).fade_out(3000)

audio.export("processed.wav", format="wav")
```

## Memory Optimization

```python
import torch
from audiocraft.models import MusicGen

# Use a smaller model and load it directly on the target device
model = MusicGen.get_pretrained('facebook/musicgen-small', device='cuda')
model.set_generation_params(duration=15)

output = model.generate(["prompt"])

# Move results off the GPU and release cached memory immediately
output = output.cpu()
torch.cuda.empty_cache()
```

## Performance

| Model  | GPU      | 30s Generation |
| ------ | -------- | -------------- |
| small  | RTX 3090 | \~10s          |
| medium | RTX 3090 | \~25s          |
| large  | RTX 4090 | \~45s          |
| melody | RTX 3090 | \~30s          |

## Comparison

| Feature      | MusicGen | Stable Audio | Riffusion |
| ------------ | -------- | ------------ | --------- |
| Quality      | Great    | Great        | Good      |
| Length       | 30s      | 90s          | Loop      |
| Melody Input | Yes      | No           | No        |
| Open Source  | Yes      | No           | Yes       |

## Troubleshooting

### Out of Memory

* Use smaller model (small instead of large)
* Reduce duration
* Clear cache: `torch.cuda.empty_cache()`

### Poor Quality

* Use more specific prompts
* Try medium or large model
* Adjust temperature (0.8-1.2)

### Repetitive Output

* Increase top\_k
* Lower cfg\_coef
* Try different prompts

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
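Combining the rate table with the performance numbers above gives a rough cost per clip. A back-of-the-envelope sketch; the rate and generation time are the approximate 2024 figures from the tables, and real prices vary:

```python
def cost_per_clip(hourly_rate: float, gen_seconds: float) -> float:
    """Approximate cost of one generation at a given hourly GPU rate."""
    return hourly_rate * gen_seconds / 3600

# medium on RTX 3090: ~$0.06/hour, ~25 s per 30 s clip
print(f"${cost_per_clip(0.06, 25):.6f}")  # well under a tenth of a cent
```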

**Save money:**

* Use **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers

## Next Steps

* [Bark TTS](https://docs.clore.ai/guides/audio-and-voice/bark-tts) - Voice generation
* [RVC Voice Clone](https://docs.clore.ai/guides/audio-and-voice/rvc-voice-clone) - Voice conversion
* [Demucs Separation](https://docs.clore.ai/guides/audio-and-voice/demucs-separation) - Audio separation
