
Hunyuan World 2.0 (3D World Model)

Deploy Tencent HY-World 2.0 — the first open-source SOTA 3D world model — on Clore.ai GPUs for text/image/video-to-3D scene generation

Released April 15, 2026 — Tencent Hunyuan dropped HY-World 2.0, the first fully open-source SOTA 3D world model. This guide covers WorldMirror 2.0 (the shipped ~1.2B parameter reconstruction component). Sister models HY-Pano 2.0 and WorldStereo 2.0 are flagged "coming soon" in the official repo — see the Roadmap below.

HY-World 2.0 is Tencent's multi-modal world-model framework for reconstructing, generating, and simulating full 3D scenes. Unlike single-object mesh generators, HY-World ingests text, single or multi-view images, or video and emits editable world representations — meshes, 3D Gaussian Splats, point clouds, depth maps, surface normals, and recovered camera parameters — ready to drop into Unity, Unreal, or Blender.

The first public weights cover WorldMirror 2.0 (~1.2B params, BF16) — the reconstruction half of the stack. It runs in ~12–24 GB of VRAM on a single GPU and supports flexible resolution from 50K to 500K pixels, plus FSDP multi-GPU sharding for larger workloads. A Python API (diffusers-style), CLI via torchrun, and a Gradio demo ship out of the box. A ComfyUI node is not official yet — community ports only.
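As a rough illustration of the 50K to 500K pixel range, the sketch below rescales an image size so its total pixel count lands inside that budget while keeping the aspect ratio. The function name `fit_to_pixel_budget` is hypothetical and is not part of the HY-World repo; whether WorldMirror 2.0 resizes inputs itself, and how, is not stated here, so treat this as a pre-processing aid only.

```python
import math

MIN_PIXELS = 50_000    # lower bound quoted in this guide
MAX_PIXELS = 500_000   # upper bound quoted in this guide


def fit_to_pixel_budget(width: int, height: int,
                        min_pixels: int = MIN_PIXELS,
                        max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Return a (width, height) whose area lies within [min_pixels, max_pixels].

    Hypothetical helper: the aspect ratio is preserved up to integer rounding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    pixels = width * height
    if pixels > max_pixels:
        scale = math.sqrt(max_pixels / pixels)
        # Round down so the result never exceeds the upper bound.
        return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
    if pixels < min_pixels:
        scale = math.sqrt(min_pixels / pixels)
        # Round up so the result never falls below the lower bound.
        return math.ceil(width * scale), math.ceil(height * scale)
    return width, height


if __name__ == "__main__":
    # A 1080p frame is about 2.07M pixels, so it is scaled down.
    print(fit_to_pixel_budget(1920, 1080))
```

Scaling both sides by the square root of the area ratio is what keeps the aspect ratio intact; rounding direction is chosen so the budget is never violated.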

Key Specs

| Property | Value |
| --- | --- |
| Component | WorldMirror 2.0 (shipped); HY-Pano 2.0 + WorldStereo 2.0 coming soon |
| Parameters | ~1.2B (BF16) |
| Input modalities | Text · single-view image · multi-view images · video |
| Output | Mesh · 3D Gaussian Splat · point cloud · depth · normals · camera params |
| VRAM | ~12–24 GB single GPU; FSDP for multi-GPU |
| Resolution range | 50K – 500K pixels (flex-res) |
| License | tencent-hy-world-2.0-community (custom; see below) |
| Release | 2026-04-15 |

Why HY-World 2.0?

  • First open-source SOTA world model — no closed competitors in this category

  • Full scene output, not just meshes — Gaussian Splats + geometry + camera in one pass

  • Multi-modal inputs — same pipeline handles text, images, and video

  • FSDP-ready — scale across 2–8 GPUs for high-res or batched inference

  • Game-engine ready — outputs drop straight into Unity, Unreal, and Blender


Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| GPU VRAM | 16 GB (e.g. RTX 4080) | 24–80 GB (RTX 3090 / RTX 4090 / A100 / H100) |
| System RAM | 32 GB | 64–128 GB |
| Disk | 80 GB | 200 GB |
| CUDA | 12.1 | 12.4+ |
| Python | 3.10 | 3.10 |
| PyTorch | 2.4.0 | 2.4.0+ |
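Before installing, it can help to compare the running environment with the minimums above. The following is a minimal sketch, assuming only the version numbers listed in this guide; the helper names (`version_tuple`, `meets_minimum`, `check_environment`) are illustrative and do not come from the HY-World repo. PyTorch is imported lazily so the script still runs on a machine where it is not installed yet.

```python
import re
import sys

# Minimum versions as listed in the Requirements table of this guide.
# Note: the table lists Python 3.10 in both columns, so newer interpreters
# may not be supported even though they pass a ">=" comparison.
MINIMUMS = {"python": "3.10", "torch": "2.4.0", "cuda": "12.1"}


def version_tuple(text: str) -> tuple[int, ...]:
    """Parse the leading dotted-number part, e.g. '2.4.0+cu121' -> (2, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """True if `found` is at least `minimum`, padding short versions with zeros."""
    a, b = version_tuple(found), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> dict[str, str]:
    """Collect the versions that can be detected in the current interpreter."""
    found = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        import torch  # optional: only reported when already installed
    except ImportError:
        return found
    found["torch"] = torch.__version__
    if torch.version.cuda:  # None on CPU-only builds
        found["cuda"] = torch.version.cuda
    return found


if __name__ == "__main__":
    for name, version in check_environment().items():
        status = "ok" if meets_minimum(version, MINIMUMS[name]) else "too old"
        print(f"{name}: {version} (minimum {MINIMUMS[name]}): {status}")
```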

Multi-GPU mode requires at least one input image per GPU. For a single reference image, use one GPU and reserve FSDP for batched or high-resolution jobs.
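The one-image-per-GPU rule can be enforced before launching. The sketch below caps the process count at the number of input images and builds only the `torchrun --nproc_per_node=N` prefix; `pick_nproc` and `torchrun_prefix` are hypothetical helper names, and the entry script and its flags are deliberately left out because they must come from the official repo.

```python
def pick_nproc(num_images: int, num_gpus: int) -> int:
    """Largest process count that still gives every GPU at least one image."""
    if num_images < 1:
        raise ValueError("need at least one input image")
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return min(num_images, num_gpus)


def torchrun_prefix(num_images: int, num_gpus: int) -> list[str]:
    """Build the launcher part of the command; the script path is appended by the caller."""
    nproc = pick_nproc(num_images, num_gpus)
    return ["torchrun", f"--nproc_per_node={nproc}"]


if __name__ == "__main__":
    # Three images on an 8-GPU machine: only three processes are started.
    print(torchrun_prefix(num_images=3, num_gpus=8))
```

Append the repo's inference script and its arguments to the returned list before passing it to `subprocess.run`.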


Option A — Quickstart with Docker + torchrun

A minimal docker-compose.yml for a Clore.ai container (official Tencent image is not yet published — this uses the PyTorch base and runs the repo install inside):

Run a multi-GPU reconstruction job with FSDP and BF16:


Option B — Manual Python API

Launch the Gradio demo on port 7860:

For multi-GPU Gradio with FSDP:


Clore.ai GPU Recommendations

| Workload | GPU | VRAM | Why | Clore.ai cost (approx.) |
| --- | --- | --- | --- | --- |
| Single image → scene, dev/preview | RTX 4090 | 24 GB | BF16 fits comfortably, fast iteration | ~$0.5–2/day |
| Multi-view video reconstruction | A100 40 GB | 40 GB | Handles 200K+ px frames without OOM | ~$3–5/day |
| High-res batched (production) | A100 80 GB | 80 GB | Full 500K px flex-res, big batches | ~$5–8/day |
| FSDP multi-GPU / research | 2–4× H100 | 160–320 GB | Sharded training-scale workloads | ~$15–40/day |
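To turn the table into a quick budget estimate, the sketch below picks the smallest single-GPU tier that satisfies a VRAM requirement and prorates the daily price range to a number of hours. The figures are copied from the table above and are approximate; the `Tier` class and function names are illustrative, and the multi-GPU H100 row is omitted because its price depends on the GPU count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    gpu: str
    vram_gb: int
    usd_per_day: tuple[float, float]  # (low, high) estimate from the table


# Single-GPU rows of the table, ordered by VRAM (and by price).
TIERS = [
    Tier("RTX 4090", 24, (0.5, 2.0)),
    Tier("A100 40 GB", 40, (3.0, 5.0)),
    Tier("A100 80 GB", 80, (5.0, 8.0)),
]


def cheapest_tier(required_vram_gb: float) -> Optional[Tier]:
    """First tier with enough VRAM, or None if no single GPU in the list fits."""
    for tier in TIERS:
        if tier.vram_gb >= required_vram_gb:
            return tier
    return None


def cost_range(tier: Tier, hours: float) -> tuple[float, float]:
    """Prorate the daily price range linearly to the given number of hours."""
    low, high = tier.usd_per_day
    return low * hours / 24, high * hours / 24


if __name__ == "__main__":
    tier = cheapest_tier(30)
    if tier is not None:
        print(tier.gpu, cost_range(tier, hours=48))
```

Linear proration is an assumption; actual marketplace pricing may vary by host and rental mode.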


Use Cases

  • Game development — turn concept art into rough 3D environments for blockout and greybox

  • AR/VR content — generate Gaussian Splat scenes playable in Unity/Unreal with near-photographic fidelity

  • Film and animation previs — reconstruct sets from on-location photos for virtual cinematography

  • Architectural visualization — convert reference shots or text briefs into editable 3D walkthroughs

  • Robotics + simulation — synthesize 3D training environments from sparse real-world footage


Roadmap

Tencent has listed the following as "coming soon" in the official repo:

  • HY-Pano 2.0 — 360° panorama generation (interim: HunyuanWorld 1.0)

  • WorldStereo 2.0 — world expansion / novel-view synthesis (interim: original WorldStereo)

  • WorldNav — trajectory planning for scene traversal

  • Full world-generation pipeline code — the text/image → full world entry point

WorldMirror 2.0 (reconstruction) is the only component with public weights today. Watch the Hugging Face model page for new releases.


Troubleshooting

| Problem | Solution |
| --- | --- |
| CUDA out of memory on 16 GB GPU | Lower the input resolution toward 50K px, or switch to an RTX 4090 (24 GB). Enable --enable_bf16 |
| FSDP hangs on launch | Ensure the number of input images is ≥ --nproc_per_node. FSDP also needs NCCL and a matching CUDA version across GPUs |
| flash-attn install fails | Try the prebuilt wheel (pip install flash-attn --no-build-isolation) on CUDA 12.4; if it still fails, the pipeline runs (slower) without it |
| Gradio UI not reachable on Clore.ai | Forward port 7860 in the Clore container config, or launch with --share |
| License questions for commercial use | Read License.txt in the repo; the license is tencent-hy-world-2.0-community, not a standard open-source license |
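Two of the issues above can be checked from Python before digging further: whether the flash-attn package is importable, and whether anything is listening on the Gradio port. This is a minimal sketch using only the standard library; the function names are illustrative, and the host you probe (localhost inside the container, or the forwarded address from outside) depends on your Clore.ai setup.

```python
import importlib.util
import socket


def has_flash_attn() -> bool:
    """True if the `flash_attn` module can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


if __name__ == "__main__":
    print("flash-attn installed:", has_flash_attn())
    print("Gradio port 7860 reachable:", port_is_open("127.0.0.1", 7860))
```

If the port check succeeds inside the container but fails from your own machine, the port forward in the container config is the likely culprit.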

