Mochi-1 Video

Mochi-1 is Genmo's open-source 10-billion parameter video generation model producing 848×480 @ 30fps output with physically realistic motion. It uses an asymmetric diffusion transformer (AsymmDiT) architecture and ranks among the highest-quality open-source video models for motion fidelity. Deploy it on Clore.ai's GPU cloud to generate professional-grade videos at a fraction of commercial API costs.


What is Mochi-1?

Mochi-1 is a 10-billion parameter video diffusion model trained to produce videos with:

  • Smooth, physically plausible motion

  • High temporal consistency

  • Strong prompt adherence

  • 848×480 resolution at 30 fps

It uses an asymmetric diffusion transformer (AsymmDiT) architecture, which allocates far more parameters to the visual stream than to the text stream, enabling efficient inference at scale. The weights are released under the Apache 2.0 license, free for research and commercial use.

Model highlights:

  • 10B parameters

  • Native 848×480 @ 30 fps output

  • High-motion fidelity (ranked top in community benchmarks)

  • Available on Hugging Face with diffusers integration

  • Gradio demo UI for easy interaction


Prerequisites

| Requirement | Minimum | Recommended |
| --- | --- | --- |
| GPU VRAM | 24 GB | 40–80 GB |
| GPU | RTX 4090 | A100 / H100 |
| RAM | 32 GB | 64 GB |
| Storage | 60 GB | 100 GB |
| CUDA | 11.8+ | 12.1+ |


Step 1 — Rent a GPU on Clore.ai

  1. Go to clore.ai and sign in.

  2. Click Marketplace and filter:

    • VRAM: ≥ 24 GB (RTX 4090 minimum, A100 recommended)

    • For multi-GPU: filter by GPU count ≥ 2

  3. Select your server and click Configure.

  4. Set Docker image to pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel (base image — we install Mochi inside).

  5. Set open ports: 22 (SSH) and 7860 (Gradio UI).

  6. Click Rent.


Clore.ai lists A100 40 GB instances starting from ~$0.60–$0.90/hr. For Mochi-1 at full quality, this is the most cost-effective choice.


Step 2 — Custom Dockerfile

Build your own image or use this Dockerfile to create a ready-to-use Mochi-1 environment:
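A minimal Dockerfile sketch is below. The repository URL and install steps are assumptions; check Genmo's GitHub for the current repo layout before building:

```dockerfile
# Base: the same PyTorch/CUDA image suggested in Step 1.
FROM pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel

RUN apt-get update && apt-get install -y --no-install-recommends \
        git ffmpeg \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
# Clone the official Mochi repo (URL is an assumption, verify on GitHub)
# and install it together with the Gradio and Hugging Face tooling.
RUN git clone https://github.com/genmoai/mochi.git . \
    && pip install --no-cache-dir -e . gradio "huggingface_hub[cli]"

EXPOSE 7860
# Keep the container alive; launch the demo manually over SSH.
CMD ["sleep", "infinity"]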

Build and Push to Docker Hub

Build the image locally and push it to your own Docker Hub account (replace YOUR_DOCKERHUB_USERNAME with your actual username):
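For example, using the standard Docker workflow:

```shell
# Build from the Dockerfile in the current directory.
docker build -t YOUR_DOCKERHUB_USERNAME/mochi-1:latest .

# Authenticate, then push so Clore.ai can pull the image.
docker login
docker push YOUR_DOCKERHUB_USERNAME/mochi-1:latest
```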

Then use YOUR_DOCKERHUB_USERNAME/mochi-1:latest as your Docker image in Clore.ai.


There is no official pre-built Docker image for Mochi-1 on Docker Hub. You need to build from the Dockerfile above. Alternatively, use pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel as the base image directly and run the setup commands manually via SSH.


Step 3 — Connect via SSH

Once your instance is running:
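Connect with the IP address and SSH port shown on your Clore.ai instance page (the placeholders below are yours to fill in):

```shell
# Both values appear on the instance details page in the Clore.ai dashboard.
ssh root@<SERVER_IP> -p <SSH_PORT>
```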


Step 4 — Download Mochi-1 Weights

The model weights are hosted on Hugging Face. Download them via the huggingface_hub CLI:
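For example:

```shell
# Install the Hugging Face CLI, then pull the full model repository.
pip install -U "huggingface_hub[cli]"

# The full bf16 weights are ~80 GB; make sure your storage from Step 1 fits.
huggingface-cli download genmo/mochi-1-preview --local-dir weights/
```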


The full bf16 model is approximately 80 GB. The fp8 quantized version is ~40 GB and runs on RTX 4090 (24 GB) with CPU offloading. Specify --include "*fp8*" to download only quantized weights.

Alternative: Download Only fp8 Quantized Weights
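Using the --include filter noted above:

```shell
# Download only the ~40 GB fp8 quantized files.
huggingface-cli download genmo/mochi-1-preview \
  --include "*fp8*" --local-dir weights/
```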


Step 5 — Launch the Gradio Demo

Mochi-1 ships with a Gradio web UI for easy text-to-video generation:
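A launch sketch assuming the official repo's demo layout (verify the script path in the repository you cloned):

```shell
# From the cloned Mochi repo, point the demo at the downloaded weights.
cd mochi
python3 ./demos/gradio_ui.py --model_dir weights/
```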

For low-VRAM mode (RTX 4090, 24 GB):
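Add the --cpu_offload flag to the same launch command:

```shell
# Offloads idle layers to CPU RAM; ~18-20 GB peak VRAM, ~2x slower.
python3 ./demos/gradio_ui.py --model_dir weights/ --cpu_offload
```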


The --cpu_offload flag moves model layers to CPU RAM when not in use, reducing peak VRAM to ~18–20 GB at the cost of ~2× slower generation.


Step 6 — Access the Web UI

Open your browser and navigate to:
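The address is your instance's public IP plus the Gradio port (Clore.ai may remap 7860 to a different external port; check the instance page):

```
http://<SERVER_IP>:7860
```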

You will see the Mochi-1 Gradio interface with:

  • A text prompt input

  • Generation settings (steps, guidance scale, seed)

  • Video output player


Step 7 — Generate Your First Video

Example Prompts

Try prompts along these lines (illustrative examples, adapt freely):

Nature scene: "A slow aerial pan over a misty pine forest at sunrise, sunbeams cutting through fog, photorealistic."

Action scene: "A mountain biker speeding down a muddy forest trail, mud splashing off the tires, dynamic tracking shot."

Abstract/artistic: "Blue and gold ink diffusing through water in slow motion, macro lens, dark background."

Recommended settings:

| Parameter | Value |
| --- | --- |
| Steps | 64 |
| Guidance Scale | 4.5 |
| Duration | 5.1 seconds (default) |
| Resolution | 848×480 (native) |


Generation time varies significantly by GPU. On an A100 80 GB, a 5-second video takes approximately 2–4 minutes. On RTX 4090 with CPU offload, expect 8–15 minutes.


Python API Usage

For programmatic generation, use the diffusers pipeline:
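A sketch based on the diffusers MochiPipeline integration (the prompt and output file name are illustrative):

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

# Load in bfloat16; the "bf16" variant roughly halves the download size.
pipe = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
)
# Memory savers: offload idle submodules to CPU, decode the VAE in tiles.
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

prompt = "A corgi puppy running on a beach at golden hour, slow motion."
frames = pipe(
    prompt,
    num_frames=84,
    num_inference_steps=64,
    guidance_scale=4.5,
).frames[0]

# Write the frames out at Mochi's native 30 fps.
export_to_video(frames, "corgi.mp4", fps=30)
```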

Batch Generation Script
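A minimal batch sketch using the same diffusers pipeline; the prompt list and the slugify helper are illustrative:

```python
import re
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

def slugify(text: str, max_len: int = 40) -> str:
    """Turn a prompt into a filesystem-safe file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len] or "video"

prompts = [
    "A hummingbird hovering over a red flower, macro, slow motion",
    "Ocean waves crashing on black volcanic rocks at dusk",
]

pipe = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

# Generate one clip per prompt, named after the prompt itself.
for prompt in prompts:
    frames = pipe(prompt, num_frames=84, num_inference_steps=64,
                  guidance_scale=4.5).frames[0]
    export_to_video(frames, f"{slugify(prompt)}.mp4", fps=30)
```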


Multi-GPU Inference

For faster generation with multiple GPUs:
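The exact tensor-parallel entry point depends on the Mochi release, so check the repo README. A simpler data-parallel sketch that needs no special support is to run one process per GPU, each with its own prompt list (generate.py here is a hypothetical single-GPU generation script):

```shell
# Confirm both GPUs are visible.
nvidia-smi --list-gpus

# One process per GPU; doubles throughput rather than per-clip speed.
CUDA_VISIBLE_DEVICES=0 python3 generate.py --prompts prompts_a.txt &
CUDA_VISIBLE_DEVICES=1 python3 generate.py --prompts prompts_b.txt &
wait
```

Note that this improves batch throughput; cutting a single clip's latency below one minute requires true tensor parallelism across the GPUs.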


Clore.ai offers multi-GPU servers (2×, 4× RTX 4090 or A100). With 2× A100 80 GB, generation time drops to roughly 60–90 seconds for a 5-second clip.


Troubleshooting

CUDA Out of Memory

Solutions:

  1. Add --cpu_offload to the gradio command

  2. Enable VAE slicing: pipe.enable_vae_slicing()

  3. Reduce num_frames (try 24 instead of 84)

  4. Use fp8 quantized weights instead of bf16

Model Loading Slow

Solution: Ensure weights are on a fast NVMe drive, not HDD. Check storage speed:
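For example, a quick write-throughput check with dd (the file path is arbitrary):

```shell
# Write 64 MB and force it to disk; dd prints MB/s on the last line.
# NVMe should report well above 500 MB/s; low hundreds suggests HDD
# or network-backed storage.
dd if=/dev/zero of=/tmp/disktest bs=1M count=64 conv=fdatasync
rm /tmp/disktest
```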

Video Artifacts / Temporal Flickering

Solutions:

  • Increase inference steps (try 80–100)

  • Adjust guidance scale (3.5–5.0 range is usually best)

  • Use a specific seed for reproducibility and iteration

Port 7860 Not Accessible

Check that the port was correctly opened in Clore.ai and the Gradio server is binding to 0.0.0.0:
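A quick check on the server (the demo script path is an assumption, as in Step 5; GRADIO_SERVER_NAME is Gradio's standard bind-address variable):

```shell
# Is anything listening on 7860, and on which interface?
ss -tln | grep 7860 || echo "nothing listening on 7860"

# If it shows 127.0.0.1:7860, relaunch bound to all interfaces:
GRADIO_SERVER_NAME=0.0.0.0 python3 ./demos/gradio_ui.py --model_dir weights/
```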


Cost Estimation

| GPU | VRAM | Est. Price | 5s Video Time |
| --- | --- | --- | --- |
| RTX 4090 | 24 GB | ~$0.35/hr | ~10–15 min |
| A100 40GB | 40 GB | ~$0.70/hr | ~3–5 min |
| A100 80GB | 80 GB | ~$1.20/hr | ~2–3 min |
| 2× A100 80GB | 160 GB | ~$2.20/hr | ~60–90 sec |


Clore.ai GPU Recommendations

Mochi-1 is VRAM-hungry — the 10B parameter model requires careful GPU selection.

| GPU | VRAM | Clore.ai Price | Mode | 5s Video Generation Time |
| --- | --- | --- | --- | --- |
| RTX 4090 | 24 GB | ~$0.70/hr | fp8 quantized only | ~10–15 min |
| A100 40GB | 40 GB | ~$1.20/hr | bf16 recommended | ~3–5 min |
| A100 80GB | 80 GB | ~$2.00/hr | full bf16, fast | ~2–3 min |
| 2× A100 80GB | 160 GB | ~$4.00/hr | tensor parallel, fastest | ~60–90 sec |


Best value for quality: A100 40GB at ~$1.20/hr generates a 5-second clip in 3–5 minutes. That's ~$0.08–0.10 per video clip — significantly cheaper than Runway ML ($0.25–0.50/clip) or Pika Labs subscriptions.


Useful Resources
