# CubeComposer 4K 360° Video

> **CubeComposer** (CVPR 2026) is a spatio-temporal autoregressive diffusion model that generates **native 4K 360° panoramic video** from standard perspective video input. It is built on the Wan video foundation model and was trained on 11,832 high-resolution clips. It is presented as the first open model capable of native 4K 360° generation, which makes VR content creation, virtual tours, and immersive media feasible on consumer GPU hardware.

## Why This Matters

360° video has traditionally required specialized capture rigs (multiple cameras, stitching software, expensive post-processing). CubeComposer changes this:

* **Input**: any standard camera video (single-lens, phone camera, dashcam)
* **Output**: native 4K 360° equirectangular video
* **Method**: decomposes the panorama into six cubemap faces and generates them autoregressively, keeping neighbouring faces spatially consistent
* **Quality**: reported to significantly outperform previous stitching and outpainting approaches
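To make the cubemap decomposition concrete, the sketch below maps a pixel of an equirectangular frame to the cube face it falls on. It is a generic geometric illustration, not CubeComposer's code: the face names, the axis orientation (front = +z, up = +y), and the function name `equirect_to_cube_face` are all assumptions chosen for this example.

```python
import math


def equirect_to_cube_face(u, v, width, height):
    """Map equirectangular pixel (u, v) to (face, s, t).

    s and t are face-local coordinates in [-1, 1]. The orientation
    convention here is an assumption for illustration only.
    """
    # Pixel centre -> longitude in [-pi, pi) and latitude in (-pi/2, pi/2)
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi

    # Unit direction vector on the sphere
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    ax, ay, az = abs(x), abs(y), abs(z)

    # The dominant axis decides which face the ray hits
    if az >= ax and az >= ay:
        face = "front" if z > 0 else "back"
        s, t = (x / az if z > 0 else -x / az), y / az
    elif ax >= ay:
        face = "right" if x > 0 else "left"
        s, t = (-z / ax if x > 0 else z / ax), y / ax
    else:
        face = "up" if y > 0 else "down"
        s, t = x / ay, (-z / ay if y > 0 else z / ay)
    return face, s, t


if __name__ == "__main__":
    # Centre of a 4096x2048 frame looks straight ahead
    print(equirect_to_cube_face(2048, 1024, 4096, 2048)[0])
```

Each face covers a 90° field of view, which is why a perspective input video resembles a single face and the remaining faces have to be generated.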

## Hardware Requirements

| Config        | VRAM | Output (max frames)  | Time per clip |
| ------------- | ---- | -------------------- | ------------- |
| RTX 4090 24GB | 24GB | 4K 360° (30 frames)  | \~8 min/clip  |
| RTX 5090 32GB | 32GB | 4K 360° (60 frames)  | \~6 min/clip  |
| 2× RTX 4090   | 48GB | 4K 360° (120 frames) | \~9 min/clip  |
| A100 80GB     | 80GB | 4K 360° (240 frames) | \~12 min/clip |

**Minimum**: RTX 4090 24GB (or equivalent 24GB+ VRAM GPU)
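As a quick planning aid, the table can be turned into a lookup that returns the longest clip a given amount of VRAM is listed for. This is only a sketch: the step-function behaviour between rows and the name `max_frames_for_vram` are assumptions, and the 48GB row refers to two GPUs combined.

```python
# (total VRAM in GB, max frames at 4K), copied from the table above
VRAM_TO_MAX_FRAMES = [(24, 30), (32, 60), (48, 120), (80, 240)]


def max_frames_for_vram(vram_gb):
    """Return the largest listed frame count that fits in vram_gb, or 0."""
    best = 0
    for required_gb, frames in VRAM_TO_MAX_FRAMES:
        if vram_gb >= required_gb:
            best = frames
    return best


if __name__ == "__main__":
    for gb in (16, 24, 40, 80):
        print(gb, "GB ->", max_frames_for_vram(gb), "frames")
```

A result of 0 means the card is below the 24GB minimum stated above.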

> On Clore.ai: RTX 4090 from **\~$1.20/hr spot** — a 2-minute clip costs \~$0.40.

## Installation

```bash
# Clone repository
git clone https://github.com/TencentARC/CubeComposer
cd CubeComposer

# Install dependencies (Python 3.10+, CUDA 12.1+)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Download model weights (~18GB)
python scripts/download_weights.py --model cubecomposer-4k-v1
```
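Before starting the roughly 18GB weight download, it can help to confirm that the machine meets the stated prerequisites. The following preflight sketch uses only the Python standard library; the helper names and the idea of checking free disk space against the quoted download size are assumptions of this example, not part of the CubeComposer repository.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)   # minimum Python version stated above
WEIGHTS_GB = 18        # approximate download size quoted above


def python_ok(version=None):
    """True if the interpreter is at least MIN_PYTHON."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def disk_ok(path=".", needed_gb=WEIGHTS_GB, free_bytes=None):
    """True if the filesystem holding `path` has room for the weights."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 1024 ** 3


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Disk space for weights:", disk_ok())
    # nvidia-smi on PATH is a cheap hint that an NVIDIA driver is installed
    print("nvidia-smi found:", shutil.which("nvidia-smi") is not None)
```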

### Docker (Recommended for Clore.ai)

```bash
# No official Docker image is available, so install from source inside the container:
git clone https://github.com/TencentARC/CubeComposer /workspace/CubeComposer
cd /workspace/CubeComposer
pip install -r requirements.txt
python app.py --share --listen 0.0.0.0 --port 7860
```

## Quick Start

### CLI: Perspective Video → 4K 360°

```bash
# Basic usage: input perspective video, output 4K equirectangular
python generate_360.py \
  --input /workspace/input_video.mp4 \
  --output /workspace/output_360.mp4 \
  --resolution 4096x2048 \
  --frames 30 \
  --fps 30

# Higher quality: more steps, longer clip
python generate_360.py \
  --input /workspace/walk_through_park.mp4 \
  --output /workspace/park_360_4k.mp4 \
  --resolution 4096x2048 \
  --frames 60 \
  --num_inference_steps 50 \
  --guidance_scale 7.5 \
  --fps 30
```
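For more than a handful of clips, the same command can be driven from a small script. The sketch below builds the argument list from the flags shown above and runs it for every `.mp4` in a folder. The helper names and the `_360.mp4` output naming are assumptions made for this example, and it should be run from the repository directory so that `generate_360.py` resolves.

```python
import subprocess
from pathlib import Path


def build_command(input_path, output_dir, frames=30, fps=30):
    """Return the argv list for one generate_360.py run."""
    input_path = Path(input_path)
    output_path = Path(output_dir) / f"{input_path.stem}_360.mp4"
    return [
        "python", "generate_360.py",
        "--input", str(input_path),
        "--output", str(output_path),
        "--resolution", "4096x2048",
        "--frames", str(frames),
        "--fps", str(fps),
    ]


def run_batch(input_dir, output_dir, **kwargs):
    """Process every .mp4 in input_dir sequentially, stopping on the first failure."""
    for clip in sorted(Path(input_dir).glob("*.mp4")):
        subprocess.run(build_command(clip, output_dir, **kwargs), check=True)


if __name__ == "__main__":
    print(" ".join(build_command("/workspace/input_video.mp4", "/workspace/out")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.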

### Python API

```python
from cubecomposer import CubeComposerPipeline
import torch

# Load pipeline
pipe = CubeComposerPipeline.from_pretrained(
    "cubecomposer/cubecomposer-4k-v1",
    torch_dtype=torch.bfloat16
).to("cuda")

# Generate 360° video from perspective input
output = pipe(
    input_video_path="input.mp4",
    num_frames=30,
    resolution=(4096, 2048),  # 4K equirectangular
    num_inference_steps=50,
    guidance_scale=7.5,
    cubemap_size=1024  # size of each cubemap face
)

# Save as standard equirectangular MP4
output.save("output_360.mp4", fps=30)
print("Generated 4K 360° video: output_360.mp4")
```

### Gradio WebUI

```python
import tempfile

import gradio as gr
import torch
from cubecomposer import CubeComposerPipeline

pipe = CubeComposerPipeline.from_pretrained(
    "cubecomposer/cubecomposer-4k-v1",
    torch_dtype=torch.bfloat16
).to("cuda")

def generate_360(video, frames, steps):
    output = pipe(
        input_video_path=video,
        num_frames=int(frames),
        resolution=(4096, 2048),
        num_inference_steps=int(steps)
    )
    # Write each result to its own file so requests do not overwrite each other
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        out_path = tmp.name
    output.save(out_path, fps=30)
    return out_path

demo = gr.Interface(
    fn=generate_360,
    inputs=[
        gr.Video(label="Input Perspective Video"),
        gr.Slider(10, 120, value=30, label="Number of Frames"),
        gr.Slider(20, 80, value=50, label="Inference Steps (quality)")
    ],
    outputs=gr.Video(label="4K 360° Output"),
    title="CubeComposer — 4K 360° Video Generation",
    description="Upload any perspective video → get native 4K 360° panoramic video"
)

demo.launch(server_name="0.0.0.0", server_port=7860, share=True)
```

## Deploy on Clore.ai: Step-by-Step

### 1. Rent an RTX 4090

1. Go to [clore.ai/marketplace](https://clore.ai/marketplace)
2. Filter: GPU with **24GB+ VRAM** (RTX 4090 recommended)
3. Spot price: \~$1.20–2.50/hr depending on availability
4. Select **Custom Docker** or **Ubuntu** image

### 2. Setup via SSH

```bash
# Connect to your Clore server
ssh root@<server-ip>

# One-liner setup
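# (this guide shows both --listen and --host for binding; confirm with `python app.py --help`)
git clone https://github.com/TencentARC/CubeComposer && \
  cd CubeComposer && \
  pip install -r requirements.txt && \
  python scripts/download_weights.py --model cubecomposer-4k-v1 && \
  python app.py --port 7860 --host 0.0.0.0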
```

### 3. Access the UI

Open `http://<server-ip>:7860` in your browser to use the Gradio interface.
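If the page does not load, a first step is to check whether the port is reachable at all from your machine. This minimal sketch uses only the standard library; `is_port_open` is a name chosen for this example. If your rental maps port 7860 to a different external port, substitute that one.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # Replace with your server's address
    print(is_port_open("127.0.0.1", 7860))
```

A `False` result usually points to the app not running, a bind address other than `0.0.0.0`, or a port that is not forwarded.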

## Workflow: Phone Video → VR-Ready 4K 360°

```bash
# Step 1: Upload phone video to server
scp ~/my_video.mp4 root@<server-ip>:/workspace/

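# Step 2: Generate 4K 360° version
ssh root@<server-ip> "cd CubeComposer && python generate_360.py \
  --input /workspace/my_video.mp4 \
  --output /workspace/my_video_360_4k.mp4 \
  --resolution 4096x2048 --frames 60"

# Step 3: Add 360° metadata for YouTube/VR headsets (run on the server, where the output lives).
# If a player still shows the video flat, Google's Spatial Media Metadata Injector
# is a commonly used alternative for tagging the file.
ssh root@<server-ip> "ffmpeg -i /workspace/my_video_360_4k.mp4 \
  -c copy \
  -metadata:s:v:0 spherical=equirectangular \
  /workspace/my_video_360_4k_vr.mp4"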

# Step 4: Download result
scp root@<server-ip>:/workspace/my_video_360_4k_vr.mp4 ~/
```
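Before uploading, it is worth confirming that the result really has the 2:1 aspect ratio that equirectangular players expect. The sketch below reads the dimensions with `ffprobe` (shipped with FFmpeg) and checks the ratio. The function names are assumptions of this example.

```python
import json
import subprocess


def parse_size(ffprobe_json):
    """Extract (width, height) of the first stream from ffprobe's JSON output."""
    stream = json.loads(ffprobe_json)["streams"][0]
    return stream["width"], stream["height"]


def probe_video_size(path):
    """Run ffprobe on `path` and return (width, height) of the first video stream."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return parse_size(result.stdout)


def is_2to1(width, height):
    """Equirectangular frames span 360° x 180°, so width should be twice the height."""
    return width == 2 * height


if __name__ == "__main__":
    sample = '{"streams": [{"width": 4096, "height": 2048}]}'
    print(is_2to1(*parse_size(sample)))
```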

## Spectrum Integration: 4.79× Speedup on Wan2.1

The **Spectrum accelerator** (CVPR 2026) is a training-free spectral diffusion feature forecaster based on Chebyshev polynomials. It can be applied to CubeComposer's underlying Wan2.1 base model, where it is reported to deliver a 4.79× speedup:

```python
from cubecomposer import CubeComposerPipeline
from spectrum_accelerator import SpectrumAccelerator
import torch

pipe = CubeComposerPipeline.from_pretrained(
    "cubecomposer/cubecomposer-4k-v1",
    torch_dtype=torch.bfloat16
).to("cuda")

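# Wrap the denoiser with Spectrum (reported 4.79× speedup on Wan2.1).
# Wan2.1 is a diffusion transformer, so the attribute may be exposed under a
# different name (for example `pipe.transformer`); check the pipeline first.
accelerator = SpectrumAccelerator(pipe.unet, order=8)  # Chebyshev order
pipe.unet = accelerator

# Generation should now run up to ~4.79× faster
output = pipe(
    input_video_path="input.mp4",
    num_frames=30,
    resolution=(4096, 2048),
    num_inference_steps=50  # still 50 sampling steps; the speedup comes from forecasting features
)
output.save("output_fast_360.mp4", fps=30)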
```
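For readers unfamiliar with the Chebyshev basis mentioned above, the sketch below evaluates a Chebyshev series with the standard three-term recurrence. It is not Spectrum's implementation. It only illustrates the kind of basis such a forecaster could fit to a feature's trajectory over sampling steps, and the step-to-interval mapping is an assumption of this example.

```python
def step_to_unit_interval(step, num_steps):
    """Map a step index 0..num_steps-1 onto [-1, 1], the domain of T_k."""
    return 2.0 * step / (num_steps - 1) - 1.0


def chebyshev_series(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) using T_{k+1} = 2x*T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x  # T_0(x) and T_1(x)
    total = 0.0
    for k, c in enumerate(coeffs):
        if k == 0:
            total += c * t_prev
        elif k == 1:
            total += c * t_curr
        else:
            # Advance the recurrence by one order, then accumulate
            t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
            total += c * t_curr
    return total


if __name__ == "__main__":
    x = step_to_unit_interval(25, 50)
    print(chebyshev_series([0.5, -0.2, 0.1], x))
```

With `order=8` in the snippet above, such a series would have coefficients up to T_8, assuming the parameter means polynomial degree.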

## Quality Tips

1. **Input video quality matters**: higher-resolution input produces better 360° output
2. **Stable footage**: handheld shake reduces consistency across cubemap faces
3. **Good lighting**: avoid extreme contrast (overexposed sky + dark interior)
4. **Longer clips**: 30+ frames give better temporal consistency
5. **Face resolution**: `--cubemap_size 1024` is the sweet spot; 2048 quadruples the pixels per face and costs roughly 4× the VRAM, so reserve it for critical work
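The face-size numbers follow from simple geometry: each cube face spans 90°, a quarter of the 360° width, so a 4096-pixel-wide equirectangular frame corresponds to 1024-pixel faces. The helper names below are assumptions for illustration.

```python
def face_size_for_equirect(width):
    """Cube face edge length matching an equirectangular width (90 of 360 degrees)."""
    return width // 4


def face_pixel_ratio(new_size, old_size):
    """How many times more pixels a face has after changing its edge length."""
    return (new_size / old_size) ** 2


if __name__ == "__main__":
    print(face_size_for_equirect(4096))   # edge length for a 4096x2048 frame
    print(face_pixel_ratio(2048, 1024))   # pixel growth when doubling the edge
```

Doubling the edge length doubles both dimensions of every face, which is where the fourfold cost comes from.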

## Use Cases

* **VR content creation** — convert any footage for Meta Quest, Apple Vision Pro
* **Virtual property tours** — turn walkthrough videos into 360° tours
* **Travel content** — share immersive travel experiences
* **Architecture visualization** — 360° interior/exterior walkthroughs
* **Event documentation** — convert event recordings to immersive replays
* **Gaming assets** — generate 360° environment references

## Cost Estimate for Production Workflow

| Task                            | Clore.ai Cost           |
| ------------------------------- | ----------------------- |
| 5-second clip (30 frames, 4K)   | \~$0.30 (RTX 4090 spot) |
| 10-second clip (60 frames, 4K)  | \~$0.50                 |
| 30-second clip (180 frames, 4K) | \~$1.20                 |
| Batch: 100 clips (5s each)      | \~$30                   |
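A rough way to sanity-check figures like these is to multiply generation time by the hourly rate. The sketch below does only that. It ignores setup, weight download, and idle time, which the table's estimates may include (an assumption), and the function names are chosen for this example.

```python
def generation_cost(minutes, hourly_rate):
    """Cost in dollars of `minutes` of GPU time at `hourly_rate` dollars per hour."""
    return round(minutes / 60.0 * hourly_rate, 2)


def playback_seconds(frames, fps):
    """Playback duration of a clip with the given frame count and frame rate."""
    return frames / fps


if __name__ == "__main__":
    # ~8 minutes on an RTX 4090 at ~$1.20/hr, using the figures quoted in this guide
    print(generation_cost(8, 1.20))
    # 30 frames played back at the 30 fps used in the commands above
    print(playback_seconds(30, 30))
```

Comparing `playback_seconds` with the clip lengths in the table shows whether your chosen `--fps` matches the duration you are budgeting for.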

## Related Guides

* [Wan2.1 Video Generation](/guides/video-generation/wan-video.md) — the foundation model under CubeComposer
* [FramePack](/guides/video-generation/framepack.md) — efficient long video generation (6GB VRAM!)
* [LTX-2 Video](/guides/video-generation/ltx-video-2.md) — fast latent video generation
* [ComfyUI](/guides/image-generation/comfyui.md) — node-based workflow for video + image
* [RIFE Video Interpolation](/guides/video-processing/rife-interpolation.md) — smooth out generated video

***

*Last updated: March 16, 2026 | Paper: arXiv:2603.04291 (CVPR 2026) | Based on Wan2.1 foundation model*

