# AudioCraft Music

Generate music and sound effects with Meta's AudioCraft (MusicGen, AudioGen).

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create order and wait for deployment

### Access Your Server

* Find connection details in **My Orders**
* Web interfaces: Use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is AudioCraft?

AudioCraft includes:

* **MusicGen** - Text-to-music generation
* **AudioGen** - Sound effects generation
* **EnCodec** - Audio compression
* **MAGNeT** - Non-autoregressive generation (faster)

## Model Sizes

| Model  | VRAM | Quality        | Speed  |
| ------ | ---- | -------------- | ------ |
| small  | 4GB  | Good           | Fast   |
| medium | 8GB  | Great          | Medium |
| large  | 16GB | Best           | Slow   |
| melody | 8GB  | Great + melody | Medium |
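The table above can be turned into a simple checkpoint picker. This is a minimal sketch; the helper name and thresholds are illustrative, and the commented lines show how you might detect VRAM with `torch.cuda` on a rented server:

```python
def pick_model(vram_gb: float) -> str:
    """Map available VRAM (GB) to a MusicGen checkpoint, per the table above."""
    if vram_gb >= 16:
        return "facebook/musicgen-large"
    if vram_gb >= 8:
        return "facebook/musicgen-medium"
    return "facebook/musicgen-small"

# With torch installed, detect VRAM automatically:
# import torch
# vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
# model_name = pick_model(vram_gb)
print(pick_model(24))  # facebook/musicgen-large
```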

## Quick Deploy

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install audiocraft gradio scipy && \
python -c "
import gradio as gr
from audiocraft.models import MusicGen
import scipy.io.wavfile as wav
import tempfile

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=10)

def generate(prompt, duration):
    model.set_generation_params(duration=duration)
    output = model.generate([prompt])
    audio = output[0].cpu().numpy().T
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as f:
        wav.write(f.name, 32000, audio)
        return f.name

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label='Prompt'), gr.Slider(5, 30, value=10, label='Duration (s)')],
    outputs=gr.Audio(label='Generated Music'),
    title='MusicGen'
)
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

## Installation

```bash
pip install audiocraft
pip install scipy torchaudio
```

## MusicGen: Text-to-Music

### Basic Generation

```python
from audiocraft.models import MusicGen
import torchaudio

# Load model
model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=15)  # seconds

# Generate
prompt = "upbeat electronic dance music with heavy bass"
output = model.generate([prompt])

# Save
audio = output[0].cpu()
torchaudio.save("music.wav", audio, sample_rate=32000)
```

### Multiple Prompts

```python
prompts = [
    "relaxing piano jazz",
    "epic orchestral cinematic",
    "acoustic guitar folk song",
    "aggressive heavy metal"
]

outputs = model.generate(prompts)

for i, output in enumerate(outputs):
    torchaudio.save(f"music_{i}.wav", output.cpu(), sample_rate=32000)
```

### Melody Conditioning

Use a melody as reference:

```python
from audiocraft.models import MusicGen
import torchaudio

# Load melody model
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=15)

# Load reference melody
melody, sr = torchaudio.load("reference.wav")
melody = melody.unsqueeze(0).cuda()

# Generate with melody
output = model.generate_with_chroma(
    ["jazz piano version"],
    melody,
    sr
)

torchaudio.save("jazz_version.wav", output[0].cpu(), sample_rate=32000)
```

### Continuation

Continue from existing audio:

```python
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=15)

# Load audio to continue
audio, sr = torchaudio.load("start.wav")
audio = audio.unsqueeze(0).cuda()

# Continue
output = model.generate_continuation(
    audio,
    prompt_sample_rate=sr,
    descriptions=["more energetic with drums"],
    progress=True
)

torchaudio.save("continued.wav", output[0].cpu(), sample_rate=32000)
```

## AudioGen: Sound Effects

```python
from audiocraft.models import AudioGen
import torchaudio

# Load model
model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)

# Generate sounds
prompts = [
    "dog barking in the distance",
    "rain on a window",
    "car engine starting",
    "crowd cheering at a concert"
]

outputs = model.generate(prompts)

for i, output in enumerate(outputs):
    torchaudio.save(f"sound_{i}.wav", output.cpu(), sample_rate=16000)
```

## Generation Parameters

```python
model.set_generation_params(
    duration=30,           # Length in seconds
    top_k=250,             # Top-k sampling
    top_p=0.0,             # Nucleus sampling (0 = disabled)
    temperature=1.0,       # Randomness
    cfg_coef=3.0,          # Classifier-free guidance
    two_step_cfg=False,    # Two-step CFG
)
```

### Parameter Effects

| Parameter   | Low Value            | High Value       |
| ----------- | -------------------- | ---------------- |
| temperature | Conservative         | Creative         |
| top\_k      | More focused         | More variety     |
| cfg\_coef   | Loose interpretation | Strict to prompt |
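One way to make these trade-offs concrete is a small preset table. The preset names and values below are illustrative, not official AudioCraft presets; only the `balanced` row mirrors the defaults shown above:

```python
# Illustrative presets; only "balanced" matches the defaults shown above
PRESETS = {
    "faithful": dict(temperature=0.9, top_k=150, cfg_coef=5.0),  # sticks close to the prompt
    "balanced": dict(temperature=1.0, top_k=250, cfg_coef=3.0),  # library defaults
    "creative": dict(temperature=1.2, top_k=500, cfg_coef=2.0),  # looser, more varied
}

def apply_preset(model, name: str) -> None:
    """Apply a named preset via set_generation_params."""
    model.set_generation_params(**PRESETS[name])
```

Usage: `apply_preset(model, "creative")` before calling `model.generate(...)`.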

## Batch Processing

```python
from audiocraft.models import MusicGen
import torchaudio
import os

model = MusicGen.get_pretrained('facebook/musicgen-medium')
model.set_generation_params(duration=15)

prompts = [
    {"name": "intro", "prompt": "mysterious ambient intro, slow build"},
    {"name": "verse", "prompt": "chill lo-fi hip hop beat"},
    {"name": "chorus", "prompt": "energetic electronic pop chorus"},
    {"name": "outro", "prompt": "calm piano fade out"},
]

output_dir = "./music_parts"
os.makedirs(output_dir, exist_ok=True)

for item in prompts:
    output = model.generate([item["prompt"]])
    torchaudio.save(
        os.path.join(output_dir, f"{item['name']}.wav"),
        output[0].cpu(),
        sample_rate=32000
    )
    print(f"Generated: {item['name']}")
```

## Progress Tracking

MusicGen generates tokens autoregressively and decodes the full clip at the end; the library does not expose a public chunked streaming generator. You can, however, monitor generation token-by-token with a custom progress callback:

```python
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=10)

# Called periodically with (tokens_generated, total_tokens)
def on_progress(generated, total):
    print(f"\r{generated}/{total} tokens", end="", flush=True)

model.set_custom_progress_callback(on_progress)
output = model.generate(["upbeat pop music"])
```

## Stereo Generation

```python
from audiocraft.models import MusicGen
import torchaudio

# Load stereo model
model = MusicGen.get_pretrained('facebook/musicgen-stereo-medium')
model.set_generation_params(duration=15)

output = model.generate(["cinematic orchestral score"])

# Output shape: [batch, 2, samples] for stereo

torchaudio.save("stereo_music.wav", output[0].cpu(), sample_rate=32000)
```

## API Server

```python
from fastapi import FastAPI
from fastapi.responses import FileResponse
from audiocraft.models import MusicGen
import torchaudio
import tempfile

app = FastAPI()
model = MusicGen.get_pretrained('facebook/musicgen-medium')

@app.post("/generate")
async def generate_music(prompt: str, duration: int = 10):
    model.set_generation_params(duration=duration)
    output = model.generate([prompt])

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        torchaudio.save(f.name, output[0].cpu(), sample_rate=32000)
        return FileResponse(f.name, media_type="audio/wav")

@app.post("/generate_with_melody")
async def generate_with_melody(prompt: str, melody_path: str, duration: int = 15):
    melody, sr = torchaudio.load(melody_path)

    model_melody = MusicGen.get_pretrained('facebook/musicgen-melody')
    model_melody.set_generation_params(duration=duration)

    output = model_melody.generate_with_chroma([prompt], melody.unsqueeze(0).cuda(), sr)

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        torchaudio.save(f.name, output[0].cpu(), sample_rate=32000)
        return FileResponse(f.name, media_type="audio/wav")

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
```
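To call the `/generate` endpoint from a client, note that FastAPI reads scalar parameters like `prompt` and `duration` from the query string by default. A minimal sketch (`build_generate_url` is a hypothetical helper, and the host is a placeholder for your `http_pub` URL):

```python
from urllib.parse import urlencode

def build_generate_url(base_url: str, prompt: str, duration: int = 10) -> str:
    """Build the POST URL for the /generate endpoint defined above."""
    query = urlencode({"prompt": prompt, "duration": duration})
    return f"{base_url}/generate?{query}"

url = build_generate_url("https://abc123.clorecloud.net", "lo-fi hip hop beat", 15)
# POST this URL (e.g. with requests.post) and save the response body as a .wav file
print(url)
```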

## Prompt Engineering

### Effective Prompts

```python

# Genre + instruments + mood
"upbeat jazz with saxophone and piano, happy and energetic"

# Style reference
"lo-fi hip hop beat, chill study music, vinyl crackle"

# Cinematic
"epic orchestral trailer music, building tension, dramatic"

# Specific elements
"acoustic guitar strumming pattern, folk song, campfire vibes"
```

### Bad Prompts

```python

# Too vague
"nice music"  # Not specific enough

# Song lyrics
"Happy birthday to you..."  # Won't work

# Artist names
"like Beatles"  # Doesn't understand artists
```
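The pattern behind the effective prompts above (genre + instruments + mood + texture) can be captured in a small helper. `build_prompt` is a hypothetical name, not part of AudioCraft:

```python
def build_prompt(genre: str, instruments=(), mood: str = "", extras=()) -> str:
    """Compose a comma-separated MusicGen prompt from structured parts."""
    parts = [genre]
    if instruments:
        parts.append("with " + " and ".join(instruments))
    if mood:
        parts.append(mood)
    parts.extend(extras)
    return ", ".join(parts)

print(build_prompt("upbeat jazz", ["saxophone", "piano"], "happy and energetic"))
# upbeat jazz, with saxophone and piano, happy and energetic
```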

## Post-Processing

### Combine Clips

```python
from pydub import AudioSegment

intro = AudioSegment.from_wav("intro.wav")
verse = AudioSegment.from_wav("verse.wav")
chorus = AudioSegment.from_wav("chorus.wav")

# Crossfade
song = intro.append(verse, crossfade=1000)
song = song.append(chorus, crossfade=1000)

song.export("full_song.mp3", format="mp3")
```

### Add Effects

```python
from pydub import AudioSegment
from pydub.effects import normalize, compress_dynamic_range

audio = AudioSegment.from_wav("generated.wav")

# Normalize volume
audio = normalize(audio)

# Add compression
audio = compress_dynamic_range(audio)

# Fade in/out
audio = audio.fade_in(2000).fade_out(3000)

audio.export("processed.wav", format="wav")
```

## Memory Optimization

```python
import torch
from audiocraft.models import MusicGen

# Use a smaller model to reduce VRAM usage
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=10)

output = model.generate(["upbeat pop music"])

# Move the result off the GPU and release cached memory
output = output.cpu()
torch.cuda.empty_cache()

# Free the model itself before loading a different checkpoint
del model
torch.cuda.empty_cache()
```

## Performance

| Model  | GPU      | 30s Generation |
| ------ | -------- | -------------- |
| small  | RTX 3090 | \~10s          |
| medium | RTX 3090 | \~25s          |
| large  | RTX 4090 | \~45s          |
| melody | RTX 3090 | \~30s          |

## Comparison

| Feature      | MusicGen | Stable Audio | Riffusion |
| ------------ | -------- | ------------ | --------- |
| Quality      | Great    | Great        | Good      |
| Length       | 30s      | 90s          | Loop      |
| Melody Input | Yes      | No           | No        |
| Open Source  | Yes      | No           | Yes       |

## Troubleshooting

### Out of Memory

* Use smaller model (small instead of large)
* Reduce duration
* Clear cache: `torch.cuda.empty_cache()`

### Poor Quality

* Use more specific prompts
* Try medium or large model
* Adjust temperature (0.8-1.2)

### Repetitive Output

* Increase top\_k
* Lower cfg\_coef
* Try different prompts

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers
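As a quick sanity check on the rate table, a session's cost is simply rate × hours, optionally reduced by a spot discount (the 30-50% figure from the bullets above). The helper and example rates are illustrative:

```python
def session_cost(hourly_rate: float, hours: float, spot_discount: float = 0.0) -> float:
    """Estimated session cost in USD, with an optional spot-market discount."""
    return round(hourly_rate * hours * (1 - spot_discount), 2)

print(session_cost(0.06, 4))       # e.g. RTX 3090, 4 hours on-demand
print(session_cost(0.06, 4, 0.4))  # same session on spot, assuming a 40% discount
```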

## Next Steps

* [Bark TTS](/guides/audio-and-voice/bark-tts.md) - Voice generation
* [RVC Voice Clone](/guides/audio-and-voice/rvc-voice-clone.md) - Voice conversion
* [Demucs Separation](/guides/audio-and-voice/demucs-separation.md) - Audio separation


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/audio-and-voice/audiocraft-music.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
