AnimateDiff

AnimateDiff is a plug-and-play module that animates your existing Stable Diffusion models without any additional training. With over 10,000 GitHub stars, it is the go-to framework for turning still-image SD checkpoints into smooth, temporally consistent video clips. This guide runs it on a Clore.ai GPU instance with ComfyUI as the front end for maximum flexibility.


What is AnimateDiff?

AnimateDiff inserts a motion module into a frozen Stable Diffusion U-Net. The motion module is trained once on video data and can be combined with any fine-tuned SD 1.5 checkpoint — DreamBooth models, LoRAs, ControlNet adapters — without re-training. The result is short animated clips (typically 16–32 frames at 8 fps) that preserve the style of the base model.

Key highlights:

  • Works with any SD 1.5 checkpoint out of the box

  • Compatible with ControlNet, IP-Adapter, LoRAs, and other extensions

  • ComfyUI node ecosystem provides full pipeline control

  • SDXL motion modules available for higher-resolution output

  • Community-maintained model zoo with domain-specific motion modules


Prerequisites

Requirement    Minimum     Recommended
GPU VRAM       8 GB        16–24 GB
GPU            RTX 3080    RTX 4090 / A100
RAM            16 GB       32 GB
Storage        20 GB       50+ GB

Note: AnimateDiff with a standard 16-frame sequence at 512×512 consumes approximately 8–10 GB of VRAM. For 768×768 or longer sequences, 16+ GB is recommended.


Step 1 — Rent a GPU on Clore.ai

  1. Go to clore.ai and sign in.

  2. Click Marketplace and filter by VRAM (≥ 16 GB for best results).

  3. Select a server — RTX 4090 or A6000 offers the best price/performance.

  4. Under Docker image, enter your custom image (see Step 2 below).

  5. Configure open ports: 22 (SSH) and 8188 (ComfyUI web UI).

  6. Click Rent and wait for the instance to start (~1–2 minutes).

Note: Use the Advanced port configuration to map port 8188 to a public port, and record the assigned public port; you will use it to access the ComfyUI web interface.


Step 2 — Docker Image

There is no single official AnimateDiff Docker image. The recommended approach is to use a ComfyUI-based image with AnimateDiff nodes pre-installed.

Recommended public image:

Or build your own:
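The original snippet was not preserved on this page. As a hedged sketch, a minimal image can be assembled from the public ComfyUI, ComfyUI-AnimateDiff-Evolved, and ComfyUI-VideoHelperSuite repositories; the base image and CUDA version below are assumptions you should match to your host driver:

```shell
# Write a minimal Dockerfile (base image / CUDA version are assumptions).
cat > Dockerfile <<'EOF'
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y git python3 python3-pip openssh-server \
    && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/comfyanonymous/ComfyUI /ComfyUI
WORKDIR /ComfyUI
RUN pip3 install --no-cache-dir -r requirements.txt
# AnimateDiff nodes plus the video-export helpers used later in this guide
RUN git clone https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved \
      custom_nodes/ComfyUI-AnimateDiff-Evolved \
 && git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite \
      custom_nodes/ComfyUI-VideoHelperSuite
EXPOSE 8188
CMD ["python3", "main.py", "--listen", "0.0.0.0", "--port", "8188"]
EOF
```

Build with `docker build -t <your-registry>/comfyui-animatediff .` and push it to a registry that Clore.ai can pull from.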


Step 3 — Connect via SSH

Once the instance is running, connect via SSH to download models:

Replace <clore-host> and <assigned-ssh-port> with the values shown in your Clore.ai dashboard.
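A typical invocation looks like the following (the root username is an assumption; check your image's documentation):

```shell
ssh -p <assigned-ssh-port> root@<clore-host>
```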


Step 4 — Download Models

AnimateDiff requires at minimum a base SD 1.5 checkpoint and a motion module.

Download Motion Module
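The download command is missing from this page. A hedged sketch using the public guoyww/animatediff Hugging Face repository; the install path assumes ComfyUI lives at /root/ComfyUI with the AnimateDiff-Evolved custom node, so adjust it to your image:

```shell
COMFY=/root/ComfyUI   # assumption: ComfyUI install location in your image
MM_DIR="$COMFY/custom_nodes/ComfyUI-AnimateDiff-Evolved/models"
mkdir -p "$MM_DIR"
wget -c -O "$MM_DIR/v3_sd15_mm.ckpt" \
  "https://huggingface.co/guoyww/animatediff/resolve/main/v3_sd15_mm.ckpt"
```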

Download a Base SD 1.5 Checkpoint
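Again, the original command is missing. A sketch with a placeholder Civitai URL; the model-version ID must be copied from the model's download link, and the filename is illustrative:

```shell
CKPT_DIR=/root/ComfyUI/models/checkpoints   # assumption: default ComfyUI layout
mkdir -p "$CKPT_DIR"
# <model-version-id> is a placeholder: copy the real download link from the
# model's Civitai page (some models require an API token).
wget -c -O "$CKPT_DIR/dreamshaper_8.safetensors" \
  "https://civitai.com/api/download/models/<model-version-id>"
```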

Note: You can use any SD 1.5 fine-tune. Popular choices include DreamShaper, Deliberate, and epiCPhotoGasm. Download from CivitAI or Hugging Face.

(Optional) Download SDXL Motion Module
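A hedged sketch pulling the SDXL beta motion module from the same guoyww/animatediff repository (filename as published at the time of writing; check the repo for newer releases, and note SDXL checkpoints go in the regular checkpoints folder):

```shell
MM_DIR=/root/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models  # assumption
wget -c -O "$MM_DIR/mm_sdxl_v10_beta.ckpt" \
  "https://huggingface.co/guoyww/animatediff/resolve/main/mm_sdxl_v10_beta.ckpt"
```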


Step 5 — Access ComfyUI

Open your browser and navigate to http://<clore-host>:<assigned-public-port> (the public port you mapped to 8188 in Step 1).

You should see the ComfyUI node editor interface.

Note: Bookmark this URL. ComfyUI autosaves your workflow as you work, so there is no need to save manually unless you are exporting JSON.


Step 6 — Load an AnimateDiff Workflow

Basic AnimateDiff Workflow (JSON)

In ComfyUI, press Load to import a saved workflow JSON, or build the graph manually from these nodes:

Core node chain:

  1. Load Checkpoint → your SD 1.5 checkpoint

  2. CLIP Text Encode (Prompt) → positive and negative prompts

  3. AnimateDiff Loader → select your motion module

  4. KSampler (Efficient) → sampling settings

  5. VAE Decode → decode latents

  6. Video Combine (VideoHelperSuite) → export as GIF/MP4

Parameter        Value
Steps            20–25
CFG Scale        7–8
Sampler          DPM++ 2M Karras
Width × Height   512 × 512
Frames           16
Context Length   16


Step 7 — Run Your First Animation

  1. In the CLIP Text Encode node, enter your prompt:

  2. In the negative prompt node:

  3. In AnimateDiff Loader, select v3_sd15_mm.ckpt

  4. Click Queue Prompt
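The example prompts were not preserved on this page; any SD 1.5-style prompt works. One purely illustrative pair:

```
Positive: masterpiece, best quality, a golden retriever running through a
          sunlit meadow, shallow depth of field, film grain
Negative: worst quality, low quality, blurry, watermark, text, extra limbs
```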

Note: Generating 16 frames at 512×512 with 20 steps takes approximately 30–60 seconds on an RTX 4090. Generation time scales roughly linearly with frame count and resolution.


Advanced Techniques

Using ControlNet with AnimateDiff

AnimateDiff works with ControlNet for guided video generation:

Add an Apply ControlNet node that takes the conditioning from CLIP Text Encode and the model from Load ControlNet Model, then feed its output into the KSampler. Use an OpenPose skeleton image as the conditioning input.

Prompt Travel (Keyframe Animation)

The AnimateDiff-Evolved node pack supports prompt travel — different text prompts at different frames:

This creates smooth transitions between scenes without manual keyframing.
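The schedule example is missing from this page. With the commonly paired FizzNodes BatchPromptSchedule node (an assumption; the exact node depends on your setup), the keyframed prompt text looks like:

```
"0"  : "a lush forest in spring, sunbeams, green leaves",
"8"  : "the same forest in autumn, orange foliage, falling leaves",
"15" : "the forest in deep winter, snow, moonlight"
```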

Using LoRA with AnimateDiff

Add an AnimateDiff LoRA Loader node (attached to the AnimateDiff Loader) to apply the official Motion LoRA camera effects: PanLeft, PanRight, ZoomIn, ZoomOut, RollingAnticlockwise. Note that these Motion LoRAs were trained against the v2 motion module.


Output Formats

AnimateDiff via VideoHelperSuite supports:

Format       Node            Notes
GIF          Video Combine   Best for sharing
MP4 (h264)   Video Combine   Smallest file size
WebP         Video Combine   Good quality/size
PNG frames   Save Image      For post-processing


Troubleshooting

Out of Memory (CUDA OOM)

Solutions:

  • Reduce frame count (try 8 instead of 16)

  • Reduce resolution (512×512 is the sweet spot for SD 1.5)

  • Add the --lowvram flag to the ComfyUI startup command

  • Use fp16 precision in Load Checkpoint node
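If you launch ComfyUI yourself over SSH, the flag goes on the startup command (the install path below assumes the Dockerfile layout from Step 2):

```shell
cd /ComfyUI   # assumption: install path from the Step 2 build example
python3 main.py --listen 0.0.0.0 --port 8188 --lowvram
```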

Motion Module Not Found

Solution: Verify the .ckpt file is in the motion-module directory of your AnimateDiff nodes (for ComfyUI-AnimateDiff-Evolved, typically ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models/).

Refresh the ComfyUI page to reload available models.

Flickering / Inconsistent Frames

Solutions:

  • Match context_length to the total frame count for clips of up to 16 frames (longer clips are rendered in overlapping context windows)

  • Use v3_sd15_mm.ckpt instead of v2 (better temporal consistency)

  • Lower CFG scale (try 7 instead of 9)

  • Try a more stable sampler such as DPM++ 2M Karras (deterministic samplers often flicker less than ancestral ones like Euler a)

SSH Connection Refused

Solution: Wait 1–2 minutes for the SSH daemon to start, or check if the container has fully initialized via the Clore.ai dashboard logs.


Clore.ai GPU Recommendations

AnimateDiff uses the SD 1.5 backbone, so VRAM requirements are modest compared with modern video models, making it budget-friendly.

GPU             VRAM    Clore.ai Price   16-frame @ 512px   Notes
RTX 3090        24 GB   ~$0.12/hr        ~50s               Best value — run multiple queued batches
RTX 4090        24 GB   ~$0.70/hr        ~30s               Fastest consumer GPU
A100 40GB       40 GB   ~$1.20/hr        ~18s               Overkill for SD 1.5, but good for SDXL+AnimateDiff
RTX 3080 10GB   10 GB   ~$0.07/hr        ~90s               Budget minimum — limited to 512px, shorter clips

Note: The RTX 3090 is the AnimateDiff sweet spot at ~$0.12/hr. A 16-frame animation takes ~50 seconds, so you can generate roughly 70 clips per hour, or several hundred clips per dollar spent. For high-volume content creation, batch-queue prompts in ComfyUI and run overnight.

SDXL AnimateDiff users: the SDXL motion modules require 12 GB+ VRAM at 768px. The RTX 3090/4090 handle this well; the RTX 3080 (10 GB) is too limited for SDXL workflows.

