ACE-Step Music Generation

Generate full songs with vocals using ACE-Step — open-source Suno alternative on <4GB VRAM

ACE-Step 1.5 is the open-source music generation breakthrough everyone has been waiting for. It generates complete songs, vocals and instruments alike, from text prompts, rivaling commercial services like Suno, yet it runs locally on your GPU under an MIT license. The killer feature? It needs less than 4GB VRAM, making it one of the most accessible AI music tools available. A full track generates in roughly 2–8 seconds on a modern GPU.

Key Features

  • Full song generation: Vocals + instruments + effects in one pass

  • < 4GB VRAM: Runs on even the cheapest GPUs (RTX 3060, even GTX 1060!)

  • 2–8 seconds per track: Near-instant generation on modern GPUs

  • MIT license: Full commercial use, no restrictions

  • Lyrics support: Write your own lyrics with verse/chorus structure

  • Style control: Genre tags, mood, tempo, instrumentation

  • ComfyUI integration: Node-based workflow for complex music pipelines

Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| GPU | Any with 4GB VRAM | RTX 3060 or better |
| VRAM | 4GB | 6GB+ |
| RAM | 8GB | 16GB |
| Disk | 10GB | 15GB |
| Python | 3.10+ | 3.11 |

Recommended Clore.ai GPU: RTX 3060 6GB (~$0.15–0.30/day) — yes, the cheapest GPU works!

Speed Reference

| GPU | Generation Time (30s track) |
|-----|-----------------------------|
| GTX 1060 6GB | ~15–20 sec |
| RTX 3060 12GB | ~6–10 sec |
| RTX 3080 10GB | ~4–6 sec |
| RTX 4090 24GB | ~2–3 sec |

Installation


ACE-Step is a Gradio web app — not a pip package. Install from Git:
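A minimal install sketch, assuming the official repository at github.com/ace-step/ACE-Step (check its README for the current instructions):

```shell
# Clone the repository and install into an isolated virtual environment
git clone https://github.com/ace-step/ACE-Step.git
cd ACE-Step
python -m venv venv
source venv/bin/activate
pip install -e .   # installs dependencies and the launcher
```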

Launch Web UI
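From the repository root, start the Gradio app. The `acestep` entry point and `--port` flag are assumptions based on the project README; verify with `acestep --help` if the command is not found.

```shell
# Start the ACE-Step Gradio web UI on the default port
acestep --port 7860
```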

Open http://localhost:7860 in your browser. The UI has:

  1. Prompt field — describe the style: "upbeat electronic pop, 120 BPM"

  2. Lyrics field — write verses with [Verse], [Chorus] tags

  3. Duration slider — 15–120 seconds

  4. Generate button — click and wait 2–8 seconds

Generate with Lyrics (Web UI)

Enter in the lyrics field:
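For example (illustrative lyrics; the `[Verse]` / `[Chorus]` tags mark song sections, as described in the web UI steps above):

```
[Verse]
City lights are fading out
I'm driving through the rain
[Chorus]
Hold on, hold on
We'll find our way back home
```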

Set prompt to: indie rock ballad, acoustic guitar, emotional, male vocal

CLI / Pipeline Usage
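One hedged way to script generation is through Gradio's client library against the running web UI. The snippet below only builds a prompt batch and notes where the call would go, since the app's endpoint names and argument order vary by version; inspect them with `Client.view_api()` from the real `gradio_client` package before wiring up the call.

```python
# Sketch of batch generation against the ACE-Step Gradio UI.
# Assumptions: the web UI from the previous section is running on
# localhost:7860; endpoint details must be confirmed via view_api().

def build_prompts(genres, moods, bpm=120):
    """Cross genre and mood tags into ACE-Step style prompts."""
    return [f"{genre}, {mood}, {bpm} BPM" for genre in genres for mood in moods]

prompts = build_prompts(["lo-fi hip hop", "synthwave"], ["chill", "upbeat"])
print(len(prompts))  # 4 prompts: every genre x mood combination

# To actually submit each prompt (requires `pip install gradio_client`):
# from gradio_client import Client
# client = Client("http://localhost:7860")
# client.view_api()  # lists the real endpoint names and parameters
```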

ComfyUI Integration (Batch Workflow)

ComfyUI nodes let you batch-generate multiple tracks with different prompts in a visual workflow.

Style Tags

Control generation with style tags:
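Illustrative tag examples, grouped by the style controls listed under Key Features (genre, mood, tempo, instrumentation); the exact vocabulary the model responds to is best found by experimentation:

```
genre:           electronic, pop, rock, jazz, lo-fi, hip hop
mood:            upbeat, melancholic, chill, energetic
tempo:           90 BPM, 120 BPM, fast, slow
instrumentation: acoustic guitar, piano, synth, strings
vocal:           male vocal, female vocal, instrumental
```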

Web UI

The web UI provides:

  • Text prompt input with style presets

  • Lyrics editor with verse/chorus formatting

  • Duration and quality sliders

  • Real-time waveform preview

  • Download as WAV or MP3

Use Cases on Clore.ai

| Use Case | Setup | Cost |
|----------|-------|------|
| Background music for videos | RTX 3060, batch generate | ~$0.15/day |
| Song prototyping / demos | RTX 3080, real-time | ~$0.30/day |
| Music production pipeline | RTX 4090 + ComfyUI | ~$1/day |
| Podcast intros/outros | Any GPU, one-shot | ~$0.15/day |

Tips for Clore.ai Users

  • Cheapest AI workload possible: At $0.15/day for RTX 3060, generate hundreds of tracks for pennies

  • Batch overnight: Rent a GPU for 8 hours ($0.05–0.10), generate 500+ tracks

  • ComfyUI for pipelines: Chain with image generation for album art workflows

  • Export quality: Generate at highest quality, then process in a DAW if needed

  • Style mixing: Combine genres in prompts: "lo-fi jazz hip hop with vinyl crackle" works surprisingly well

Troubleshooting

| Issue | Solution |
|-------|----------|
| CUDA not found | Ensure PyTorch is installed with CUDA support: `pip install torch --index-url https://download.pytorch.org/whl/cu121` |
| Model download slow | Set `HF_HUB_ENABLE_HF_TRANSFER=1` (requires `pip install hf_transfer`) for faster downloads |
| Audio sounds distorted | Try a lower temperature (0.7) or fewer inference steps |
| Out of memory on 4GB | Reduce duration to 15 seconds, or upgrade to a 6GB GPU |
| ComfyUI nodes missing | Restart ComfyUI after installing the custom nodes |

ACE-Step vs Suno vs AudioCraft

| Feature | ACE-Step 1.5 | Suno v4 | AudioCraft |
|---------|--------------|---------|------------|
| Full songs | ✅ | ✅ | ❌ (music only) |
| Vocals | ✅ | ✅ | ❌ |
| Local/self-hosted | ✅ | ❌ (cloud) | ✅ |
| License | MIT | Proprietary | MIT |
| Min VRAM | 4GB | N/A | 16GB |
| Speed (30s) | 2–8 sec | ~30 sec | ~60 sec |
| Cost | $0.15/day GPU | $10/mo sub | $0.30/day GPU |

