# Mochi-1 Video

**Mochi-1** is Genmo's open-source 10-billion parameter video generation model producing 848×480 output at 30 fps with physically realistic motion. It uses an asymmetric diffusion transformer (AsymmDiT) architecture and ranks among the highest-quality open-source video models for motion fidelity. Deploy it on Clore.ai's GPU cloud to generate professional-grade videos at a fraction of commercial API costs.

***

## What is Mochi-1?

Mochi-1 is a **10-billion parameter** video diffusion model trained to produce videos with:

* Smooth, physically plausible motion
* High temporal consistency
* Strong prompt adherence
* 848×480 resolution at 30 fps

It uses an **asymmetric diffusion transformer** (AsymmDiT) architecture — separate text and visual streams, with most of the model's capacity allocated to the visual stream — enabling efficient inference at scale. The weights are released under the Apache 2.0 license, free for research and commercial use.

**Model highlights:**

* 10B parameters
* Native 848×480 @ 30 fps output
* High motion fidelity (ranked among the top open models in community benchmarks)
* Available on Hugging Face with diffusers integration
* Gradio demo UI for easy interaction

***

## Prerequisites

| Requirement | Minimum  | Recommended |
| ----------- | -------- | ----------- |
| GPU VRAM    | 24 GB    | 40–80 GB    |
| GPU         | RTX 4090 | A100 / H100 |
| RAM         | 32 GB    | 64 GB       |
| Storage     | 60 GB    | 100 GB      |
| CUDA        | 11.8+    | 12.1+       |

{% hint style="warning" %}
Mochi-1 is a large model (≈40 GB in fp8 / ≈80 GB in bf16). A single RTX 4090 (24 GB) can run it with quantization. For full quality, use an A100 40 GB or larger. Multi-GPU setups are supported.
{% endhint %}

***

## Step 1 — Rent a GPU on Clore.ai

1. Go to [clore.ai](https://clore.ai) and sign in.
2. Click **Marketplace** and filter:
   * VRAM: **≥ 24 GB** (RTX 4090 minimum, A100 recommended)
   * For multi-GPU: filter by GPU count ≥ 2
3. Select your server and click **Configure**.
4. Set Docker image to `pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel` (base image — we install Mochi inside).
5. Set open ports: `22` (SSH) and `7860` (Gradio UI).
6. Click **Rent**.

{% hint style="info" %}
Clore.ai marketplace prices fluctuate; A100 40 GB instances are typically listed at \~$0.60–$1.20/hr. For Mochi-1 at full quality, this is the most cost-effective choice.
{% endhint %}

***

## Step 2 — Custom Dockerfile

Build your own image or use this `Dockerfile` to create a ready-to-use Mochi-1 environment:

```dockerfile
FROM pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    git wget curl ffmpeg \
    libgl1 libglib2.0-0 \
    openssh-server \
    && rm -rf /var/lib/apt/lists/*

# Configure SSH (note: change this default root password before exposing the server)
RUN mkdir /var/run/sshd && \
    echo 'root:clore123' | chpasswd && \
    sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
    sed -i 's/UsePAM yes/UsePAM no/' /etc/ssh/sshd_config

WORKDIR /workspace

# Clone Mochi-1 repository
RUN git clone https://github.com/genmoai/mochi /workspace/mochi

# Install Python dependencies
RUN cd /workspace/mochi && \
    pip install --upgrade pip && \
    pip install -e . && \
    pip install gradio huggingface_hub

EXPOSE 22 7860

CMD service ssh start && \
    echo "Mochi-1 environment ready. Run download script then launch demo." && \
    tail -f /dev/null
```

### Build and Push to Docker Hub

Build the image locally and push it to your own Docker Hub account (replace `YOUR_DOCKERHUB_USERNAME` with your actual username):

```bash
docker build -t YOUR_DOCKERHUB_USERNAME/mochi-1:latest .
docker push YOUR_DOCKERHUB_USERNAME/mochi-1:latest
```

Then use `YOUR_DOCKERHUB_USERNAME/mochi-1:latest` as your Docker image in Clore.ai.

{% hint style="info" %}
There is no official pre-built Docker image for Mochi-1 on Docker Hub. You need to build from the Dockerfile above. Alternatively, use `pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel` as the base image directly and run the setup commands manually via SSH.
{% endhint %}

***

## Step 3 — Connect via SSH

Once your instance is running:

```bash
ssh root@<clore-host> -p <assigned-ssh-port>
```
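
Before pulling tens of gigabytes of weights, it is worth confirming that the container actually sees the GPU you rented. A minimal check with PyTorch (already included in the base image from Step 2):

```python
import torch

# Sanity check: confirm the container sees the rented GPU and supports bf16.
assert torch.cuda.is_available(), "No CUDA device visible - check drivers/runtime"

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")

print("bf16 supported:", torch.cuda.is_bf16_supported())
```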

***

## Step 4 — Download Mochi-1 Weights

The model weights are hosted on Hugging Face. Download them via the `huggingface_hub` CLI:

```bash
cd /workspace

# Install huggingface-cli if not present
pip install -U huggingface_hub

# Download Mochi-1 weights (see the hint below for approximate sizes)
huggingface-cli download genmo/mochi-1-preview \
    --local-dir /workspace/mochi-weights \
    --include "*.safetensors" "*.json" "*.txt"
```

{% hint style="info" %}
The full bf16 model is approximately 80 GB. The `fp8` quantized version is \~40 GB and runs on RTX 4090 (24 GB) with CPU offloading. Specify `--include "*fp8*"` to download only quantized weights.
{% endhint %}

### Alternative: Download Only fp8 Quantized Weights

```bash
huggingface-cli download genmo/mochi-1-preview \
    --local-dir /workspace/mochi-weights \
    --include "*fp8*" "*.json" "*.txt"
```
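
The same download can also be scripted from Python via `huggingface_hub`, which is handy inside a provisioning script. This sketch mirrors the fp8 CLI call above:

```python
from huggingface_hub import snapshot_download

# Mirrors the fp8 CLI download above; swap the patterns for the full-precision
# weights (e.g. ["*.safetensors", "*.json", "*.txt"]).
snapshot_download(
    repo_id="genmo/mochi-1-preview",
    local_dir="/workspace/mochi-weights",
    allow_patterns=["*fp8*", "*.json", "*.txt"],
)
```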

***

## Step 5 — Launch the Gradio Demo

Mochi-1 ships with a Gradio web UI for easy text-to-video generation:

```bash
cd /workspace/mochi

python demos/gradio_ui.py \
    --model_dir /workspace/mochi-weights \
    --share False \
    --host 0.0.0.0 \
    --port 7860
```

**For low-VRAM mode (RTX 4090, 24 GB):**

```bash
python demos/gradio_ui.py \
    --model_dir /workspace/mochi-weights \
    --cpu_offload \
    --share False \
    --host 0.0.0.0 \
    --port 7860
```

{% hint style="info" %}
The `--cpu_offload` flag moves model layers to CPU RAM when not in use, reducing peak VRAM to \~18–20 GB at the cost of \~2× slower generation.
{% endhint %}

***

## Step 6 — Access the Web UI

Open your browser and navigate to:

```
http://<clore-host>:<public-port-7860>
```

You will see the Mochi-1 Gradio interface with:

* A text prompt input
* Generation settings (steps, guidance scale, seed)
* Video output player

***

## Step 7 — Generate Your First Video

### Example Prompts

**Nature scene:**

```
A majestic waterfall cascading down mossy rocks in a lush rainforest, 
golden hour sunlight filtering through the canopy, slow cinematic pan
```

**Action scene:**

```
A cheetah sprinting across an open savanna at full speed, 
dust kicking up behind it, dramatic wide shot, 4K wildlife documentary
```

**Abstract/artistic:**

```
Colorful paint swirling in water in extreme slow motion, 
vivid blue and orange pigments mixing, macro lens, studio lighting
```

### Recommended Settings

| Parameter      | Value                 |
| -------------- | --------------------- |
| Steps          | 64                    |
| Guidance Scale | 4.5                   |
| Duration       | 5.1 seconds (default) |
| Resolution     | 848×480 (native)      |

{% hint style="info" %}
Generation time varies significantly by GPU. On an A100 80 GB, a 5-second video takes approximately **2–4 minutes**. On RTX 4090 with CPU offload, expect **8–15 minutes**.
{% endhint %}
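
Generation speed varies widely between instances, so it can help to benchmark your own rental once. A small sketch (using the diffusers pipeline described in the next section and the weights directory from Step 4) that reports wall-clock time and peak VRAM:

```python
import time
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("/workspace/mochi-weights", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

torch.cuda.reset_peak_memory_stats()
start = time.time()
frames = pipe(
    prompt="A majestic waterfall cascading down mossy rocks in a lush rainforest",
    num_frames=84,
    num_inference_steps=64,
    guidance_scale=4.5,
).frames[0]
print(f"Generation: {time.time() - start:.0f} s, "
      f"peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GB")
export_to_video(frames, "benchmark.mp4", fps=30)
```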

***

## Python API Usage

For programmatic generation, use the diffusers pipeline:

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

# Load pipeline
pipe = MochiPipeline.from_pretrained(
    "/workspace/mochi-weights",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

# Generate video
with torch.autocast("cuda", torch.bfloat16, cache_enabled=False):
    frames = pipe(
        prompt="A golden retriever playing fetch on a sunny beach, cinematic",
        num_frames=84,
        guidance_scale=4.5,
        num_inference_steps=64,
        generator=torch.Generator("cuda").manual_seed(42)
    ).frames[0]

# Export
export_to_video(frames, "output.mp4", fps=30)
print("Video saved to output.mp4")
```

### Batch Generation Script

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video
import os

pipe = MochiPipeline.from_pretrained(
    "/workspace/mochi-weights",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

prompts = [
    "A butterfly landing on a flower in slow motion, macro photography",
    "Ocean waves crashing against rocky cliffs at sunset, drone shot",
    "Northern lights dancing across a starry sky over a frozen lake",
]

os.makedirs("/workspace/outputs", exist_ok=True)

for i, prompt in enumerate(prompts):
    frames = pipe(
        prompt=prompt,
        num_frames=84,
        guidance_scale=4.5,
        num_inference_steps=64,
    ).frames[0]
    
    output_path = f"/workspace/outputs/video_{i:03d}.mp4"
    export_to_video(frames, output_path, fps=30)
    print(f"Saved: {output_path}")
```

***

## Multi-GPU Inference

For faster generation with multiple GPUs:

```python
import torch
from diffusers import MochiPipeline

# Use device_map for automatic multi-GPU distribution
pipe = MochiPipeline.from_pretrained(
    "/workspace/mochi-weights",
    torch_dtype=torch.bfloat16,
    device_map="balanced"
)

# No need for cpu_offload with multiple GPUs
frames = pipe(
    prompt="Your prompt here",
    num_frames=84,
    guidance_scale=4.5,
    num_inference_steps=64,
).frames[0]
```

{% hint style="info" %}
Clore.ai offers multi-GPU servers (2×, 4× RTX 4090 or A100). With 2× A100 80 GB, generation time drops to under 60 seconds for a 5-second clip.
{% endhint %}

***

## Troubleshooting

### CUDA Out of Memory

```
torch.cuda.OutOfMemoryError: CUDA out of memory
```

**Solutions** (the first three are combined in the sketch below):

1. Add `--cpu_offload` to the gradio command
2. Enable VAE slicing: `pipe.enable_vae_slicing()`
3. Reduce `num_frames` (try 24 instead of 84)
4. Use fp8 quantized weights instead of bf16
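
A minimal sketch combining the first three mitigations, using the same diffusers pipeline as the Python API section above:

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

# Low-VRAM configuration combining mitigations 1-3 above.
pipe = MochiPipeline.from_pretrained("/workspace/mochi-weights", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()   # 1. keep idle submodules in system RAM
pipe.enable_vae_slicing()         # 2. decode the video latents in slices

frames = pipe(
    prompt="A short test clip of clouds drifting over mountains",
    num_frames=24,                # 3. fewer frames = smaller activations
    num_inference_steps=64,
    guidance_scale=4.5,
).frames[0]
export_to_video(frames, "low_vram_test.mp4", fps=30)
```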

### Model Loading Slow

**Solution:** Ensure weights are on a fast NVMe drive, not HDD. Check storage speed:

```bash
dd if=/dev/zero of=/workspace/test bs=1M count=1000 conv=fdatasync
```

### Video Artifacts / Temporal Flickering

**Solutions:**

* Increase inference steps (try 80–100)
* Adjust guidance scale (3.5–5.0 range is usually best)
* Use a specific seed for reproducibility while iterating (see the sweep sketch below)
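
One practical workflow is to fix the seed and sweep guidance scale across the recommended range, so that differences between outputs come only from the parameter under test. A sketch using the same pipeline setup as above:

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained("/workspace/mochi-weights", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

# Fixed seed, varying guidance scale over the recommended 3.5-5.0 range.
seed = 42
for cfg in (3.5, 4.0, 4.5, 5.0):
    frames = pipe(
        prompt="Ocean waves crashing against rocky cliffs at sunset, drone shot",
        guidance_scale=cfg,
        num_inference_steps=80,
        num_frames=84,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).frames[0]
    export_to_video(frames, f"cfg_{cfg}.mp4", fps=30)
```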

### Port 7860 Not Accessible

Check that the port was correctly opened in Clore.ai and the Gradio server is binding to `0.0.0.0`:

```bash
ss -tlnp | grep 7860
```

***

## Cost Estimation

Marketplace prices on Clore.ai fluctuate with supply and demand; treat the figures below as rough low-end estimates (the GPU recommendations table further down uses more conservative rates).

| GPU          | VRAM   | Est. Price | 5s video time |
| ------------ | ------ | ---------- | ------------- |
| RTX 4090     | 24 GB  | \~$0.35/hr | \~10–15 min   |
| A100 40GB    | 40 GB  | \~$0.70/hr | \~3–5 min     |
| A100 80GB    | 80 GB  | \~$1.20/hr | \~2–3 min     |
| 2× A100 80GB | 160 GB | \~$2.20/hr | \~60–90 sec   |

***

## Clore.ai GPU Recommendations

Mochi-1 is VRAM-hungry — the 10B parameter model requires careful GPU selection.

| GPU          | VRAM   | Clore.ai Price | Mode                     | 5s Video Generation Time |
| ------------ | ------ | -------------- | ------------------------ | ------------------------ |
| RTX 4090     | 24 GB  | \~$0.70/hr     | fp8 quantized only       | \~10–15 min              |
| A100 40GB    | 40 GB  | \~$1.20/hr     | bf16 recommended         | \~3–5 min                |
| A100 80GB    | 80 GB  | \~$2.00/hr     | full bf16, fast          | \~2–3 min                |
| 2× A100 80GB | 160 GB | \~$4.00/hr     | tensor parallel, fastest | \~60–90 sec              |

{% hint style="warning" %}
**RTX 3090 (24GB) is not recommended** — Mochi-1 in fp8 mode needs close to the full 24 GB, leaving almost no headroom, and the 3090 is noticeably slower than the 4090 for this workload. The RTX 4090 (24GB) works in fp8 but can still OOM on longer sequences. Start with A100 40GB for reliable results.
{% endhint %}

**Best value for quality:** A100 40GB at \~$1.20/hr generates a 5-second clip in 3–5 minutes. That's \~$0.06–0.10 per video clip — significantly cheaper than Runway ML ($0.25–0.50/clip) or Pika Labs subscriptions.
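
As a sanity check on the per-clip numbers, a quick back-of-envelope calculation using midpoints of the estimates in the table above:

```python
# Rough cost per clip = ($/hr) * (minutes per clip) / 60, using midpoint estimates.
gpus = {
    "RTX 4090":  (0.70, 12.5),  # ($/hr on Clore.ai, minutes per 5 s clip)
    "A100 40GB": (1.20, 4.0),
    "A100 80GB": (2.00, 2.5),
}
for name, (price_per_hour, minutes_per_clip) in gpus.items():
    cost = price_per_hour * minutes_per_clip / 60
    print(f"{name}: ~${cost:.2f} per clip")
```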

***

## Useful Resources

* [Mochi-1 GitHub](https://github.com/genmoai/mochi)
* [Mochi-1 on Hugging Face](https://huggingface.co/genmo/mochi-1-preview)
* [Genmo Blog — Mochi-1 Release](https://www.genmo.ai/blog/mochi-1)
* [Diffusers Mochi Documentation](https://huggingface.co/docs/diffusers/api/pipelines/mochi)
* [Mochi Prompt Guide (Community)](https://github.com/genmoai/mochi/blob/main/README.md)
