# Wan 2.2 VBVR (Motion Control)

**Wan 2.2 VBVR** (Video-Based Video Reference) is Alibaba's April 2026 extension to the Wan 2.2 image-to-video foundation model. It adds a powerful new capability: you provide a **reference video clip** to control motion patterns in your generated video, not just a starting image. The result is consistent, controllable animation — the same character, product, or scene follows the motion path from your reference footage.

This guide covers deploying Wan 2.2 VBVR via ComfyUI on a Clore.ai GPU rental.

***

## What Is VBVR (Video-Based Video Reference)?

Traditional image-to-video models take a static image and generate motion from scratch. The motion is guided by your text prompt, but it can be unpredictable — especially for specific gestures, camera moves, or character actions.

**VBVR changes the equation.** You supply three inputs:

1. A **starting image** — your subject (character, product, scene)
2. A **reference motion video** — a short clip demonstrating the motion you want
3. A **text prompt** — describing the content and style

The model extracts the motion pattern from the reference video and applies it to your starting image, generating a new video where your subject performs that motion naturally.

### Example Applications

| Input Image            | Reference Video Motion        | Output                    |
| ---------------------- | ----------------------------- | ------------------------- |
| Product photo          | Hand picking up similar item  | Product pick-up animation |
| Character illustration | Actor walking cycle           | Character walking         |
| Fashion model          | Runway walk footage           | Clothing in motion        |
| Building exterior      | Camera pan from drone footage | Cinematic B-roll reveal   |

***

## Model Overview

* **Full name:** Wan 2.2 I2V-A14B with VBVR (Video-Based Video Reference)
* **Released:** April 2026 by Alibaba / Wan-AI team
* **Built on:** Wan 2.2 I2V-A14B (Image-to-Video, 14B params, up to 480p resolution)
* **HuggingFace:** `Wan-AI/Wan2.2-I2V-A14B`
* **VBVR workflow:** distributed via ComfyUI Manager community nodes
* **License:** Apache 2.0

### Variants

| Variant  | VRAM Required | Quality | Speed    |
| -------- | ------------- | ------- | -------- |
| **FP8**  | 16–24 GB      | High    | Fast     |
| **BF16** | 24–40 GB      | Highest | Moderate |

The **FP8 variant** runs on an RTX 3090 (24 GB) and can squeeze into 16 GB cards with reduced batch size. The **BF16 variant** delivers the best quality; it fits on an RTX 4090 (24 GB) at the low end of its range and runs comfortably on an A6000 (48 GB).

***

## Hardware Requirements

| GPU        | VRAM  | Variant        | Price on Clore.ai |
| ---------- | ----- | -------------- | ----------------- |
| RTX 3090   | 24 GB | FP8 ✅          | \~$0.30/day       |
| RTX 4090   | 24 GB | FP8 ✅ / BF16 ✅ | \~$0.50/day       |
| A6000 48GB | 48 GB | BF16 ✅         | \~$1.20/day       |
| A100 80GB  | 80 GB | BF16 ✅         | \~$2.50/day       |

For most users, an **RTX 4090 at \~$0.50/day** is the best balance of price and quality, running BF16 at full 480p resolution.

***

## Step-by-Step Setup on Clore.ai

### Step 1: Rent a GPU

Visit [clore.ai/marketplace](https://clore.ai/marketplace):

* **Budget**: RTX 3090 (\~$0.30/day) — FP8 only
* **Recommended**: RTX 4090 (\~$0.50/day) — BF16 quality
* **Premium**: A6000 (\~$1.20/day) — batch processing, high throughput

Choose a **ComfyUI Docker image** if one is offered, or a base CUDA image; the steps below assume the base image and install ComfyUI manually.

### Step 2: Install ComfyUI

```bash
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git /workspace/ComfyUI
cd /workspace/ComfyUI

# Install Python dependencies
pip install -r requirements.txt

# Install ComfyUI Manager (for easy node installation)
cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
cd ..
```

### Step 3: Install VBVR Custom Nodes via ComfyUI Manager

Start ComfyUI:

```bash
cd /workspace/ComfyUI
python main.py --listen 0.0.0.0 --port 8188
```
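
If your rental exposes SSH, you can optionally tunnel the UI instead of opening port 8188 to the internet. A minimal sketch, assuming standard SSH access to the instance:

```bash
# Forward the remote ComfyUI port to your local machine,
# then browse to http://localhost:8188
ssh -L 8188:localhost:8188 user@YOUR_CLORE_IP
```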

Open `http://YOUR_CLORE_IP:8188` in your browser. Then:

1. Click **Manager** button (top menu)
2. Search for **"Wan 2.2 VBVR"** or **"WanVideo"**
3. Install the **ComfyUI-WanVideo** node pack
4. Restart ComfyUI after installation

Alternatively, install the nodes directly:

```bash
cd /workspace/ComfyUI/custom_nodes
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper.git
pip install -r ComfyUI-WanVideoWrapper/requirements.txt
```

### Step 4: Download Model Checkpoints

```bash
mkdir -p /workspace/ComfyUI/models/wan

# Download Wan 2.2 I2V base model (~28GB)
huggingface-cli download \
  Wan-AI/Wan2.2-I2V-A14B \
  --local-dir /workspace/ComfyUI/models/wan/Wan2.2-I2V-A14B

# Download VBVR-specific motion encoder weights (~2GB)
# Note: VBVR weights are distributed as a ComfyUI-WanVideoWrapper community release.
# Check https://github.com/kijai/ComfyUI-WanVideoWrapper for the current download path.
huggingface-cli download \
  kijai/WanVideo-motion-encoder \
  --local-dir /workspace/ComfyUI/models/wan/vbvr-motion-encoder
```

> **Tip:** Use `huggingface-cli download --include "*.safetensors"` to skip non-essential files and save disk space.
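
Applied to the base-model download above, the filter looks like this (the pattern assumes the weights ship as `.safetensors`; check the repo's file list first):

```bash
# Download only the safetensors weights, skipping auxiliary files
huggingface-cli download \
  Wan-AI/Wan2.2-I2V-A14B \
  --include "*.safetensors" \
  --local-dir /workspace/ComfyUI/models/wan/Wan2.2-I2V-A14B
```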

### Step 5: Download VAE and Text Encoder

```bash
# CLIP text encoder (shared with base Wan 2.2)
huggingface-cli download \
  Wan-AI/Wan2.2-I2V-A14B \
  --include "xlabs_clip*" \
  --local-dir /workspace/ComfyUI/models/clip

# T5 XXL text encoder
huggingface-cli download \
  Wan-AI/Wan2.2-I2V-A14B \
  --include "t5*" \
  --local-dir /workspace/ComfyUI/models/t5

# VAE
huggingface-cli download \
  Wan-AI/Wan2.2-I2V-A14B \
  --include "Wan2.2_VAE.safetensors" \
  --local-dir /workspace/ComfyUI/models/vae
```
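
Once the downloads finish, a quick disk check confirms everything landed where ComfyUI expects it:

```bash
# Verify checkpoint locations and sizes (base model ~28 GB, motion encoder ~2 GB)
du -sh /workspace/ComfyUI/models/wan/* \
       /workspace/ComfyUI/models/clip \
       /workspace/ComfyUI/models/t5 \
       /workspace/ComfyUI/models/vae
```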

***

## Building the VBVR Workflow in ComfyUI

### Workflow Overview

The VBVR workflow connects these node groups:

```
[Load Image] ──────────────────────────────────┐
                                               ↓
[Load Reference Video] → [VBVR Motion Encoder] → [Wan I2V Sampler] → [VAE Decode] → [Save Video]
                                               ↑
[CLIP Text Encode] ────────────────────────────┘
```

### Loading the Workflow

1. Download the pre-built VBVR workflow JSON from the ComfyUI-WanVideoWrapper repository:

   ```
   custom_nodes/ComfyUI-WanVideoWrapper/workflows/wan22_vbvr.json
   ```
2. In ComfyUI: **Load** → select `wan22_vbvr.json`

### Configuring Key Nodes

**WanVideoModelLoader**

* `model_path`: point to `Wan2.2-I2V-A14B`
* `precision`: `fp8_e4m3fn` for RTX 3090, `bf16` for RTX 4090+

**VBVRMotionEncoderLoader**

* `encoder_path`: point to `vbvr-motion-encoder`

**WanVideoSampler**

* `steps`: 25–30 (quality), 15–20 (speed)
* `cfg`: 6.0–7.5 (higher = more prompt-adherent)
* `motion_strength`: 0.6–0.9 (how closely to follow reference motion)
* `frames`: 25 (approx. 2 seconds at 12fps) or 49 (4 seconds)
* `resolution`: 832×480 (default 480p)

**LoadVideo (Reference)**

* Load your reference motion clip (MP4, GIF, or image sequence)
* Recommended: 2–5 seconds, roughly the same duration as your target output (a quick duration check is sketched below)
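
If ffmpeg is installed on the instance, `ffprobe` reports the clip duration directly (`reference.mp4` is a placeholder filename):

```bash
# Print the reference clip's duration in seconds
ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 reference.mp4
```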

***

## Running Your First Generation

### Prepare Your Inputs

1. **Starting image**: 832×480px or close to it. PNG or JPG. This is your subject.
2. **Reference motion video**: ideally 2–5 seconds, showing the motion you want (a trim sketch follows this list). Resolution doesn't need to match: the model extracts motion vectors, not pixel content.
3. **Text prompt**: describe your subject and what it's doing (e.g., `"a product bottle rotating smoothly on a white surface, cinematic lighting, 4K, professional photography"`)
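
If the reference clip needs trimming first, ffmpeg handles it. A minimal sketch, with placeholder filenames:

```bash
# Keep the first 4 seconds and drop the audio track (not used for motion extraction)
ffmpeg -i raw_reference.mp4 -t 4 -an reference.mp4
```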

### Recommended Settings for First Run

```yaml
steps: 25
cfg: 7.0
motion_strength: 0.75
frames: 25
seed: 42  # fixed, for reproducibility
```
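
Runs can also be queued over ComfyUI's HTTP API rather than from the browser. The sketch below assumes you have exported your configured graph with **Save (API Format)** in ComfyUI as `prompt.json`:

```bash
# Queue a generation via the ComfyUI API
curl -s -X POST "http://YOUR_CLORE_IP:8188/prompt" \
  -H "Content-Type: application/json" \
  -d "{\"prompt\": $(cat prompt.json)}"
```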

### Generation Time Estimates

| GPU       | Variant | Frames    | Time          |
| --------- | ------- | --------- | ------------- |
| RTX 3090  | FP8     | 25 frames | \~3–5 min     |
| RTX 4090  | BF16    | 25 frames | \~2–4 min     |
| RTX 4090  | FP8     | 25 frames | \~1.5–2.5 min |
| A100 80GB | BF16    | 49 frames | \~3–5 min     |

***

## Practical Workflows

### Character Animation

1. **Image**: character illustration or photo
2. **Reference**: footage of an actor performing the desired action (walk, wave, run)
3. **Prompt**: `"cartoon character walking through a forest, smooth animation, consistent style"`
4. **motion\_strength**: 0.85 (high fidelity to reference motion)

### Product Demo

1. **Image**: clean product shot on white background
2. **Reference**: hand unboxing or rotating a similar product
3. **Prompt**: `"premium product reveal, 360 rotation, soft studio lighting, commercial quality"`
4. **motion\_strength**: 0.70 (some creative freedom for lighting/environment)

### Cinematic B-Roll

1. **Image**: landscape photo or building exterior
2. **Reference**: drone footage or camera pan from a stock clip
3. **Prompt**: `"aerial cinematic B-roll, golden hour, smooth drone movement, 4K quality"`
4. **motion\_strength**: 0.65 (let the model add naturalistic motion)

***

## Troubleshooting

**Out of memory on RTX 3090 with BF16**

* Switch to FP8 quantization in WanVideoModelLoader
* Reduce frames from 25 to 17
* Enable VAE tiling if the node pack supports it; tiled decoding lowers peak VRAM (see the launch sketch below)
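
If those steps aren't enough, ComfyUI itself ships a low-VRAM launch mode:

```bash
# Restart ComfyUI in low-VRAM mode (offloads model parts to system RAM)
python main.py --listen 0.0.0.0 --port 8188 --lowvram
```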

**Motion doesn't match reference video**

* Increase `motion_strength` to 0.85–0.95
* Ensure reference video is trimmed to match your target duration
* Use reference videos with clear, unambiguous motion (avoid camera shake)

**Generated video flickers or has artifacts**

* Increase steps to 30
* Reduce CFG to 6.0
* Use a reference video with consistent lighting

**Slow download / HuggingFace timeout**

* Set the environment variable `HF_ENDPOINT=https://hf-mirror.com` for faster downloads from China
* Or download stalled files via `aria2c` with multiple connections, as sketched below
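
Both approaches sketched; `FILE.safetensors` is a placeholder for whichever file is stalling:

```bash
# Route huggingface-cli through the mirror
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download \
  Wan-AI/Wan2.2-I2V-A14B \
  --local-dir /workspace/ComfyUI/models/wan/Wan2.2-I2V-A14B

# Or fetch a single stalled file with up to 16 parallel connections
aria2c -x 16 -s 16 \
  "https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B/resolve/main/FILE.safetensors"
```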

***

## What's Next: Wan 2.7

Alibaba's **Wan 2.7** is the next generation of the Wan video model family, featuring:

* **First + last frame generation**: specify both the opening and closing frames
* **Video-to-video editing**: modify existing video with text instructions
* **Subject referencing**: maintain consistent appearance of specific objects/characters across scenes

Wan 2.7 is currently available via Together AI's API. **Open-source weights are expected mid-Q2 2026.** A full self-hosting guide will be added to this repository when the weights are released.

***

## Summary

Wan 2.2 VBVR brings reference-driven motion control to open-source video generation. Supply a starting image and a reference motion clip, and the model generates a consistent video where your subject follows that motion naturally. FP8 runs on a 24 GB RTX 3090 for \~$0.30/day; BF16 on an RTX 4090 for \~$0.50/day — both on Clore.ai.

**→** [**Rent a GPU on Clore.ai**](https://clore.ai/marketplace) and start generating motion-controlled videos today.


***

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/video-generation/wan22-vbvr.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
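
For example, with `curl` (the question text is illustrative):

```bash
# -G sends a GET request with the url-encoded question appended as ?ask=...
curl -G "https://docs.clore.ai/guides/video-generation/wan22-vbvr.md" \
  --data-urlencode "ask=Which GPU should I rent to run the BF16 variant?"
```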
