Wan 2.2 VBVR (Motion Control)
Wan 2.2 VBVR (Video-Based Video Reference) is Alibaba's April 2026 extension to the Wan 2.2 image-to-video foundation model. It adds a powerful new capability: you provide a reference video clip to control motion patterns in your generated video, not just a starting image. The result is consistent, controllable animation — the same character, product, or scene follows the motion path from your reference footage.
This guide covers deploying Wan 2.2 VBVR via ComfyUI on a Clore.ai GPU rental.
What Is VBVR (Video-Based Video Reference)?
Traditional image-to-video models take a static image and generate motion from scratch. The motion is guided by your text prompt, but it can be unpredictable — especially for specific gestures, camera moves, or character actions.
VBVR changes the equation. You supply:
A starting image — your subject (character, product, scene)
A reference motion video — a short clip demonstrating the motion you want
A text prompt — describing the content and style
The model extracts the motion pattern from the reference video and applies it to your starting image, generating a new video where your subject performs that motion naturally.
Example Applications
| Starting image | Reference video | Result |
|---|---|---|
| Product photo | Hand picking up similar item | Product pick-up animation |
| Character illustration | Actor walking cycle | Character walking |
| Fashion model | Runway walk footage | Clothing in motion |
| Building exterior | Camera pan from drone footage | Cinematic B-roll reveal |
Model Overview
Full name: Wan 2.2 I2V-A14B with VBVR (Video-Based Video Reference)
Released: April 2026 by Alibaba / Wan-AI team
Built on: Wan 2.2 I2V-A14B (Image-to-Video, 14B params, up to 480p resolution)
HuggingFace: Wan-AI/Wan2.2-I2V-A14B
VBVR workflow: distributed via ComfyUI Manager community nodes
License: Apache 2.0
Variants
| Variant | VRAM | Quality | Speed |
|---|---|---|---|
| FP8 | 16–24 GB | High | Fast |
| BF16 | 24–40 GB | Highest | Moderate |
The FP8 variant runs on an RTX 3090 (24 GB) and can squeeze into 16 GB cards with reduced batch size. The BF16 variant delivers the best quality; it fits on an RTX 4090 (24 GB), at the low end of its VRAM range, and has more headroom on an A6000 (48 GB).
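The VRAM ranges above can be turned into a quick check before you pick a checkpoint. This is a minimal sketch: `pick_variant`, `mib_to_gb`, and `detect_variant` are hypothetical helper names, and the thresholds are simply the 16 GB, 24 GB, and 40 GB boundaries from the variants listed above.

```shell
# Hypothetical helper: map VRAM (whole GB) to a variant, using the ranges
# from the Variants section (FP8: 16-24 GB, BF16: 24-40 GB).
pick_variant() {
  vram_gb=$1
  if [ "$vram_gb" -lt 16 ]; then
    echo "unsupported"
  elif [ "$vram_gb" -lt 24 ]; then
    echo "fp8"
  elif [ "$vram_gb" -lt 40 ]; then
    echo "fp8 or bf16"   # bf16 sits at the low end of its range here
  else
    echo "bf16"
  fi
}

# nvidia-smi reports MiB (for example 24564 on a 24 GB card), so round
# to the nearest GB instead of truncating.
mib_to_gb() {
  echo $(( ($1 + 512) / 1024 ))
}

# Read total VRAM of the first GPU and suggest a variant.
detect_variant() {
  vram_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
  pick_variant "$(mib_to_gb "$vram_mib")"
}
```

On a rented machine, `detect_variant` prints the suggestion for the first GPU. VRAM alone does not settle the choice on 24 GB cards, so treat "fp8 or bf16" as a prompt to test both.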
Hardware Requirements
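| GPU | VRAM | Supported variants | Approx. price |
|---|---|---|---|
| RTX 3090 | 24 GB | FP8 ✅ | ~$0.30/day |
| RTX 4090 | 24 GB | FP8 ✅ / BF16 ✅ | ~$0.50/day |
| A6000 48GB | 48 GB | BF16 ✅ | ~$1.20/day |
| A100 80GB | 80 GB | BF16 ✅ | ~$2.50/day |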
For most users, an RTX 4090 at ~$0.50/day is the best balance of price and quality, running BF16 at full 480p resolution.
Step-by-Step Setup on Clore.ai
Step 1: Rent a GPU
Visit clore.ai/marketplace:
Budget: RTX 3090 (~$0.30/day) — FP8 only
Recommended: RTX 4090 (~$0.50/day) — BF16 quality
Premium: A6000 (~$1.20/day) — batch processing, high throughput
Use a ComfyUI Docker image, or the base CUDA image if you want to install ComfyUI manually (covered in Step 2).
Step 2: Install ComfyUI
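A manual install on the base CUDA image might look like the sketch below. It assumes `git` and `python3` with `pip` are already present on the image; `install_comfyui` and the `COMFY_DIR` variable are hypothetical names used only for this example.

```shell
# Sketch: install ComfyUI from source.
# COMFY_DIR (hypothetical) selects the install location.
install_comfyui() {
  comfy_dir="${COMFY_DIR:-$HOME/ComfyUI}"
  git clone https://github.com/comfyanonymous/ComfyUI.git "$comfy_dir" || return 1
  # Install the Python dependencies that ComfyUI lists, including PyTorch.
  python3 -m pip install -r "$comfy_dir/requirements.txt"
}
```

Run `install_comfyui` once on a fresh instance. If the image already ships a specific PyTorch/CUDA build, check that the requirements install does not replace it.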
Step 3: Install VBVR Custom Nodes via ComfyUI Manager
Start ComfyUI:
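A minimal launch sketch, assuming ComfyUI lives in `$HOME/ComfyUI` (override with the hypothetical `COMFY_DIR` variable). `--listen 0.0.0.0` makes the server reachable from outside the container, and `--port 8188` matches the URL used in this guide.

```shell
# Sketch: start ComfyUI bound to all interfaces on port 8188.
start_comfyui() {
  comfy_dir="${COMFY_DIR:-$HOME/ComfyUI}"
  cd "$comfy_dir" || return 1
  python3 main.py --listen 0.0.0.0 --port 8188
}
```

Call `start_comfyui` in a terminal multiplexer such as `tmux` so the server keeps running after you disconnect. Make sure port 8188 is exposed in the Clore.ai order.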
Open http://YOUR_CLORE_IP:8188 in your browser. Then:
Click Manager button (top menu)
Search for "Wan 2.2 VBVR" or "WanVideo"
Install the ComfyUI-WanVideo node pack
Restart ComfyUI after installation
Alternatively, install the nodes directly:
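A direct install could be sketched as follows. The repository URL is an assumption based on the ComfyUI-WanVideoWrapper name used in this guide; verify that it is the node pack that ships the VBVR nodes before relying on it. `install_wan_nodes` is a hypothetical helper name.

```shell
# Sketch: clone the wrapper nodes into ComfyUI's custom_nodes folder.
# ASSUMPTION: the repository URL below; confirm it provides the VBVR nodes.
install_wan_nodes() {
  comfy_dir="${1:-$HOME/ComfyUI}"
  nodes_dir="$comfy_dir/custom_nodes"
  if [ ! -d "$nodes_dir" ]; then
    echo "custom_nodes not found in $comfy_dir" >&2
    return 1
  fi
  git clone https://github.com/kijai/ComfyUI-WanVideoWrapper.git \
    "$nodes_dir/ComfyUI-WanVideoWrapper" || return 1
  # Install the node pack's own Python dependencies.
  python3 -m pip install -r "$nodes_dir/ComfyUI-WanVideoWrapper/requirements.txt"
}
```

Restart ComfyUI after running `install_wan_nodes` so the new nodes are registered.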
Step 4: Download Model Checkpoints
Tip: Use huggingface-cli download --include "*.safetensors" to skip non-essential files and save disk space.
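The tip above can be applied to the base model repository like this. It is a sketch: the target directory is an assumption (adjust it to wherever your loader node looks for models), and `download_wan_checkpoints` is a hypothetical helper name.

```shell
# Sketch: download only the .safetensors files of the base model.
# ASSUMPTION: the default target directory; pass your own as the first argument.
download_wan_checkpoints() {
  target_dir="${1:-$HOME/ComfyUI/models/diffusion_models/Wan2.2-I2V-A14B}"
  if ! command -v huggingface-cli >/dev/null 2>&1; then
    echo "huggingface-cli not found; install it with: pip install -U 'huggingface_hub[cli]'" >&2
    return 1
  fi
  huggingface-cli download Wan-AI/Wan2.2-I2V-A14B \
    --include "*.safetensors" \
    --local-dir "$target_dir"
}
```

Check free disk space with `df -h` before starting, since the checkpoints are large.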
Step 5: Download VAE and Text Encoder
Building the VBVR Workflow in ComfyUI
Workflow Overview
The VBVR workflow connects these node groups:
Loading the Workflow
Download the pre-built VBVR workflow JSON from the ComfyUI-WanVideoWrapper repository:
In ComfyUI: Load → select wan22_vbvr.json
Configuring Key Nodes
WanVideoModelLoader
model_path: point to Wan2.2-I2V-A14B
precision: fp8_e4m3fn for RTX 3090, bf16 for RTX 4090+
VBVRMotionEncoderLoader
encoder_path: point to vbvr-motion-encoder
WanVideoSampler
steps: 25–30 (quality), 15–20 (speed)
cfg: 6.0–7.5 (higher = more prompt-adherent)
motion_strength: 0.6–0.9 (how closely to follow reference motion)
frames: 25 (approx. 2 seconds at 12 fps) or 49 (4 seconds)
resolution: 832×480 (default 480p)
LoadVideo (Reference)
Load your reference motion clip (MP4, GIF, or image sequence)
Recommended: 2–5 seconds, same approximate duration as your target output
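The relationship between the frames setting and clip length is simple division, which helps when matching the reference clip's duration to the output. A small sketch, with `clip_seconds` as a hypothetical helper and 12 fps taken from the sampler settings above:

```shell
# Clip length in seconds for a given frame count and frame rate (default 12 fps).
clip_seconds() {
  awk -v frames="$1" -v fps="${2:-12}" 'BEGIN { printf "%.2f\n", frames / fps }'
}

clip_seconds 25   # prints 2.08
clip_seconds 49   # prints 4.08
clip_seconds 17   # prints 1.42
```

So a 25-frame generation pairs best with a reference clip of about 2 seconds, and a 49-frame generation with about 4 seconds.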
Running Your First Generation
Prepare Your Inputs
Starting image: 832×480px or close to it. PNG or JPG. This is your subject.
Reference motion video: ideally 2–5 seconds, shows the motion you want. Resolution doesn't need to match — the model extracts motion vectors, not pixel content.
Text prompt: describe your subject and what it's doing (e.g., "a product bottle rotating smoothly on a white surface, cinematic lighting, 4K, professional photography")
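If your inputs are not already in shape, `ffmpeg` can prepare both. This is a sketch assuming `ffmpeg` is installed on the instance; `prepare_image` and `trim_reference` are hypothetical helper names, and the file names in the usage note are placeholders.

```shell
# Scale the starting image to cover 832x480, then center-crop to exactly that size.
prepare_image() {
  ffmpeg -y -i "$1" \
    -vf "scale=832:480:force_original_aspect_ratio=increase,crop=832:480" \
    -frames:v 1 "$2"
}

# Keep the first N seconds of the reference clip (default 2) and drop the audio.
trim_reference() {
  ffmpeg -y -i "$1" -t "${3:-2}" -an "$2"
}
```

For example, `prepare_image subject.jpg subject_832x480.png` and `trim_reference motion.mp4 motion_2s.mp4 2`. Cropping changes the framing, so check that the subject is still fully visible.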
Recommended Settings for First Run
Generation Time Estimates
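| GPU | Variant | Frames | Approx. time |
|---|---|---|---|
| RTX 3090 | FP8 | 25 frames | ~3–5 min |
| RTX 4090 | BF16 | 25 frames | ~2–4 min |
| RTX 4090 | FP8 | 25 frames | ~1.5–2.5 min |
| A100 80GB | BF16 | 49 frames | ~3–5 min |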
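Combining these times with the daily rental prices gives a rough cost per clip. The sketch below assumes the GPU generates continuously and ignores model loading, downloads, and idle time, so real costs will be higher; `cost_per_clip` is a hypothetical helper.

```shell
# Rough cost per clip in USD: daily price * minutes per clip / minutes per day.
cost_per_clip() {
  awk -v price="$1" -v minutes="$2" 'BEGIN { printf "%.4f\n", price * minutes / 1440 }'
}

cost_per_clip 0.50 4   # RTX 4090, BF16, slow end of the range: prints 0.0014
cost_per_clip 0.30 5   # RTX 3090, FP8, slow end of the range: prints 0.0010
```

At these rates the rental cost per clip is a fraction of a cent, so iteration time is usually the better reason to choose a faster card.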
Practical Workflows
Character Animation
Image: character illustration or photo
Reference: footage of an actor performing the desired action (walk, wave, run)
Prompt: "cartoon character walking through a forest, smooth animation, consistent style"
motion_strength: 0.85 (high fidelity to reference motion)
Product Demo
Image: clean product shot on white background
Reference: hand unboxing or rotating a similar product
Prompt: "premium product reveal, 360 rotation, soft studio lighting, commercial quality"
motion_strength: 0.70 (some creative freedom for lighting/environment)
Cinematic B-Roll
Image: landscape photo or building exterior
Reference: drone footage or camera pan from a stock clip
Prompt: "aerial cinematic B-roll, golden hour, smooth drone movement, 4K quality"
motion_strength: 0.65 (let the model add naturalistic motion)
Troubleshooting
Out of memory on RTX 3090 with BF16
Switch to FP8 quantization in WanVideoModelLoader
Reduce frames from 25 to 17
Enable VAE tiling if it is disabled (tiled decoding lowers peak VRAM use)
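To confirm that memory is the bottleneck, watch GPU usage while a generation runs. A minimal sketch using standard `nvidia-smi` query options; `gpu_memory` is a hypothetical wrapper name.

```shell
# Print name, used memory, and total memory for each GPU as CSV.
gpu_memory() {
  nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv,noheader
}
```

Run `gpu_memory` in a second terminal during sampling. If used memory sits at the card's total just before the crash, the fixes above apply.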
Motion doesn't match reference video
Increase motion_strength to 0.85–0.95
Ensure reference video is trimmed to match your target duration
Use reference videos with clear, unambiguous motion (avoid camera shake)
Generated video flickers or has artifacts
Increase steps to 30
Reduce CFG to 6.0
Use a reference video with consistent lighting
Slow download / HuggingFace timeout
Use the HF_ENDPOINT=https://hf-mirror.com environment variable for faster downloads from China
Or download via aria2c with multiple connections
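Both options can be sketched in a few lines. The connection count of 16 is an arbitrary choice for illustration, `fetch_with_aria2c` is a hypothetical helper, and you supply the file URL yourself.

```shell
# Route huggingface-cli / huggingface_hub downloads through the mirror
# for the rest of this shell session.
export HF_ENDPOINT=https://hf-mirror.com

# Download one file with aria2c: 16 connections, 16 splits, resume if interrupted.
# Usage: fetch_with_aria2c <url> [output_dir]
fetch_with_aria2c() {
  aria2c -x 16 -s 16 -c -d "${2:-.}" "$1"
}
```

Unset the variable with `unset HF_ENDPOINT` to go back to the default Hugging Face endpoint.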
What's Next: Wan 2.7
Alibaba's Wan 2.7 is the next generation of the Wan video model family, featuring:
First + last frame generation: specify both the opening and closing frames
Video-to-video editing: modify existing video with text instructions
Subject referencing: maintain consistent appearance of specific objects/characters across scenes
Wan 2.7 is currently available via Together AI's API. Open-source weights are expected mid-Q2 2026. A full self-hosting guide will be added to this repository when the weights are released.
Summary
Wan 2.2 VBVR brings reference-driven motion control to open-source video generation. Supply a starting image and a reference motion clip, and the model generates a consistent video where your subject follows that motion naturally. FP8 runs on a 24 GB RTX 3090 for ~$0.30/day; BF16 on an RTX 4090 for ~$0.50/day — both on Clore.ai.
→ Rent a GPU on Clore.ai and start generating motion-controlled videos today.