
CubeComposer 4K 360° Video

CubeComposer (CVPR 2026) is a spatio-temporal autoregressive diffusion model that generates native 4K 360° panoramic video from standard perspective video input. It is built on the Wan video foundation model and was trained on 11,832 high-resolution clips. It is the first open model capable of native 4K 360° generation, which makes VR content creation, virtual tours, and immersive media possible on consumer GPU hardware.

Why This Matters

360° video has traditionally required specialized capture rigs (multiple cameras, stitching software, expensive post-processing). CubeComposer changes this:

  • Input: any standard camera video (single-lens, phone camera, dashcam)

  • Output: native 4K 360° equirectangular video

  • Method: decomposes each panorama into cubemap faces and generates the faces autoregressively, keeping them spatially consistent with one another

  • Quality: significantly outperforms previous stitching and outpainting approaches
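To make the cubemap decomposition concrete, here is a minimal sketch of the standard geometry that relates a point on a cube face to a pixel in an equirectangular frame. It is not taken from the CubeComposer code: the face names, axis convention, and function names are assumptions chosen for illustration, and the model's own layout may differ.

```python
import math

# For each face: (forward, right, up) axes. This axis convention is an
# assumption for the sketch, not CubeComposer's documented layout.
FACES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "up": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "down": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
}


def face_to_direction(face, u, v):
    """Map face coordinates u, v in [-1, 1] to an (unnormalized) 3D direction."""
    forward, right, up = FACES[face]
    return tuple(f + u * r + v * w for f, r, w in zip(forward, right, up))


def direction_to_equirect(direction, width, height):
    """Map a 3D direction to (column, row) in an equirectangular image."""
    x, y, z = direction
    lon = math.atan2(x, z)                 # longitude, 0 at the front face
    lat = math.atan2(y, math.hypot(x, z))  # latitude, positive is up
    col = (lon / (2 * math.pi) + 0.5) * width
    row = (0.5 - lat / math.pi) * height
    return col, row


# The center of the front face lands in the middle of a 4096x2048 frame.
print(direction_to_equirect(face_to_direction("front", 0, 0), 4096, 2048))
```

Sampling every face pixel through these two functions is how a set of six generated faces is typically reassembled into one equirectangular frame.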

Hardware Requirements

| Config | VRAM | Resolution | Speed |
|---|---|---|---|
| RTX 4090 24GB | 24GB | 4K 360° (30 frames) | ~8 min/clip |
| RTX 5090 32GB | 32GB | 4K 360° (60 frames) | ~6 min/clip |
| 2× RTX 4090 | 48GB | 4K 360° (120 frames) | ~9 min/clip |
| A100 80GB | 80GB | 4K 360° (240 frames) | ~12 min/clip |

Minimum: RTX 4090 24GB (or equivalent 24GB+ VRAM GPU)
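As a quick way to read the hardware table, the sketch below picks the smallest listed configuration for a given clip length. It assumes the frame counts in the table are the practical maximum for each configuration, which this page does not state explicitly; `pick_config` is a hypothetical helper, not part of CubeComposer.

```python
# Rows copied from the hardware table: (config, VRAM in GB, frames, minutes per clip).
CONFIGS = [
    ("RTX 4090 24GB", 24, 30, 8),
    ("RTX 5090 32GB", 32, 60, 6),
    ("2× RTX 4090", 48, 120, 9),
    ("A100 80GB", 80, 240, 12),
]


def pick_config(frames):
    """Return (name, vram_gb, minutes) for the smallest config covering `frames`.

    Assumption: each row's frame count is treated as that config's upper limit.
    """
    for name, vram_gb, max_frames, minutes in CONFIGS:
        if frames <= max_frames:
            return name, vram_gb, minutes
    raise ValueError(f"{frames} frames exceeds the largest listed config (240 frames)")


print(pick_config(60))
```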

On Clore.ai: RTX 4090 from ~$1.20/hr spot — a 2-minute clip costs ~$0.40.

Installation

Quick Start

CLI: Perspective Video → 4K 360°

Python API

Gradio WebUI

Deploy on Clore.ai: Step-by-Step

1. Rent an RTX 4090

  1. Filter: GPU with 24GB+ VRAM (RTX 4090 recommended)

  2. Spot price: ~$1.20–2.50/hr depending on availability

  3. Select Custom Docker or Ubuntu image

2. Setup via SSH

3. Access the UI

Open http://<server-ip>:7860 in your browser to use the Gradio interface.
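If the page does not load, it can help to confirm that port 7860 is reachable before debugging anything else. The following is a small standard-library sketch; `is_port_open` is a hypothetical helper, and the host you pass in is your own server address.

```python
import socket


def is_port_open(host, port=7860, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

A `False` result usually means the UI is not running yet or the port is not exposed on the rented machine.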

Workflow: Phone Video → VR-Ready 4K 360°

Spectrum Integration: 4.79× Speedup on Wan2.1

The Spectrum accelerator (CVPR 2026), a training-free spectral diffusion feature forecaster based on Chebyshev polynomials, can be applied to CubeComposer's underlying Wan2.1 base, where it is reported to give a 4.79× speedup.
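To give a feel for what forecasting features with Chebyshev polynomials can mean, here is a toy sketch. It is not Spectrum's algorithm, which this page does not describe: it fits a Chebyshev interpolant through a few cached scalar values from earlier denoising steps and evaluates it at the next step, so that step's expensive computation could in principle be skipped. All function names are hypothetical.

```python
def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] using T_{k+1} = 2x*T_k - T_{k-1}."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def forecast_feature(history, total_steps):
    """Forecast the value at the step after the last one in `history`.

    history: list of (step_index, value) pairs from already-computed steps.
    total_steps: number of denoising steps, used to rescale steps to [-1, 1].
    """
    def to_unit(step):
        return 2.0 * step / (total_steps - 1) - 1.0

    degree = len(history) - 1
    rows = [chebyshev_basis(to_unit(step), degree) for step, _ in history]
    coeffs = solve_linear(rows, [value for _, value in history])
    next_basis = chebyshev_basis(to_unit(history[-1][0] + 1), degree)
    return sum(c * t for c, t in zip(coeffs, next_basis))
```

A real accelerator works on high-dimensional feature tensors and has to decide when a forecast is trustworthy; this sketch only shows the polynomial extrapolation idea on a single number.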

Quality Tips

  1. Input video quality matters: higher-resolution input gives better 360° output

  2. Stable footage: handheld shake reduces consistency across cubemap faces

  3. Good lighting: avoid extreme contrast (overexposed sky + dark interior)

  4. Longer clips: 30+ frames give better temporal consistency

  5. Face resolution: --cubemap_size 1024 is the sweet spot (use 2048 for critical work, at about 4× the VRAM)
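The 4× figure in the last tip follows from simple pixel arithmetic, sketched below. Two assumptions are baked in: memory is treated as roughly proportional to pixels per face, and the equirectangular width is estimated with the common rule of thumb of four face widths around the horizon. Neither is confirmed here as CubeComposer's exact behavior.

```python
def face_pixels(cubemap_size):
    """Pixels in one square cubemap face."""
    return cubemap_size * cubemap_size


def equirect_width(cubemap_size):
    """Rule-of-thumb equirectangular width: four faces span the 360° horizon."""
    return 4 * cubemap_size


# Doubling the face size quadruples the pixels per face.
print(face_pixels(2048) / face_pixels(1024))  # 4.0
# A 1024-pixel face corresponds to a frame about 4096 pixels wide (4K).
print(equirect_width(1024))  # 4096
```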

Use Cases

  • VR content creation — convert any footage for Meta Quest, Apple Vision Pro

  • Virtual property tours — turn walkthrough videos into 360° tours

  • Travel content — share immersive travel experiences

  • Architecture visualization — 360° interior/exterior walkthroughs

  • Event documentation — convert event recordings to immersive replays

  • Gaming assets — generate 360° environment references

Cost Estimate for Production Workflow

| Task | Clore.ai Cost |
|---|---|
| 5-second clip (30 frames, 4K) | ~$0.30 (RTX 4090 spot) |
| 10-second clip (60 frames, 4K) | ~$0.50 |
| 30-second clip (180 frames, 4K) | ~$1.20 |
| Batch: 100 clips (5s each) | ~$30 |
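For budgeting your own jobs, the arithmetic behind these estimates can be sketched as follows. `gpu_time_cost` prices only the raw generation time at an hourly rate, so treat it as a lower bound: the per-clip figures in the table are higher, and the gap may reflect setup and model loading time, which this page does not break down. Both helper names are hypothetical.

```python
def gpu_time_cost(minutes, hourly_rate_usd):
    """Cost of raw GPU time only, excluding setup, model loading, and idle time."""
    return hourly_rate_usd * minutes / 60.0


def batch_cost(per_clip_usd, clips):
    """Total for a batch when every clip costs the same."""
    return per_clip_usd * clips


# ~8 min/clip on an RTX 4090 at the quoted ~$1.20/hr spot rate.
print(round(gpu_time_cost(8, 1.20), 2))
# 100 five-second clips at the table's ~$0.30 each.
print(round(batch_cost(0.30, 100), 2))
```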


Last updated: March 16, 2026 | Paper: arXiv:2603.04291 (CVPR 2026) | Based on Wan2.1 foundation model
