Hunyuan World 2.0 (3D World Model)
Deploy Tencent HY-World 2.0 — the first open-source SOTA 3D world model — on Clore.ai GPUs for text/image/video-to-3D scene generation
Released April 15, 2026 — Tencent Hunyuan dropped HY-World 2.0, the first fully open-source SOTA 3D world model. This guide covers WorldMirror 2.0 (the shipped ~1.2B parameter reconstruction component). Sister models HY-Pano 2.0 and WorldStereo 2.0 are flagged "coming soon" in the official repo — see the Roadmap below.
HY-World 2.0 is Tencent's multi-modal world-model framework for reconstructing, generating, and simulating full 3D scenes. Unlike single-object mesh generators, HY-World ingests text, single or multi-view images, or video and emits editable world representations — meshes, 3D Gaussian Splats, point clouds, depth maps, surface normals, and recovered camera parameters — ready to drop into Unity, Unreal, or Blender.
The first public weights cover WorldMirror 2.0 (~1.2B params, BF16) — the reconstruction half of the stack. It runs in ~12–24 GB of VRAM on a single GPU and supports flexible resolution from 50K to 500K pixels, plus FSDP multi-GPU sharding for larger workloads. A Python API (diffusers-style), CLI via torchrun, and a Gradio demo ship out of the box. A ComfyUI node is not official yet — community ports only.
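To make the 50K to 500K pixel range concrete, here is a minimal sketch of how an input frame could be resized to a chosen pixel budget while keeping its aspect ratio. The function name and the `multiple` alignment parameter are hypothetical; this guide does not document how WorldMirror 2.0 rounds dimensions internally, so treat the rounding as an assumption.

```python
import math

MIN_PIXELS = 50_000    # lower bound quoted in this guide
MAX_PIXELS = 500_000   # upper bound quoted in this guide


def fit_to_pixel_budget(width, height, target_pixels, multiple=1):
    """Return (new_width, new_height) at or just under the pixel budget.

    The budget is clamped to the 50K-500K range first. `multiple` is a
    hypothetical alignment constraint (for example a patch size); the real
    pipeline may round differently.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    budget = min(max(target_pixels, MIN_PIXELS), MAX_PIXELS)
    # Scaling both sides by sqrt(budget / area) preserves the aspect ratio.
    scale = math.sqrt(budget / (width * height))
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h


if __name__ == "__main__":
    # A 1080p frame reduced to roughly 200K pixels.
    print(fit_to_pixel_budget(1920, 1080, 200_000))
```

Because both sides are rounded down, the result never exceeds the budget, which is the safer direction when VRAM is tight.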
All examples in this guide run on GPU servers rented through the CLORE.AI Marketplace.
Key Specs
Component: WorldMirror 2.0 (shipped); HY-Pano 2.0 and WorldStereo 2.0 coming soon
Parameters: ~1.2B (BF16)
Input modalities: text, single-view image, multi-view images, video
Output: mesh, 3D Gaussian Splat, point cloud, depth, normals, camera params
VRAM: ~12–24 GB on a single GPU; FSDP for multi-GPU
Resolution range: 50K–500K pixels (flex-res)
License: tencent-hy-world-2.0-community (custom; see below)
Release: 2026-04-15
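As a rough sanity check on the parameter and VRAM figures, the arithmetic below estimates how much memory the weights alone occupy. BF16 uses 2 bytes per parameter; the reading that the rest of the quoted 12–24 GB goes to activations and buffers is an inference of this sketch, not a number published by Tencent.

```python
def weight_footprint_gib(num_params, bytes_per_param=2):
    """Memory taken by the raw weights, in GiB (BF16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    # ~1.2B parameters in BF16
    print(round(weight_footprint_gib(1.2e9), 2))  # prints 2.24
```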
License caveat: HY-World 2.0 ships under a custom community license (License.txt at repo root), not Apache 2.0 or MIT. Commercial use terms differ from Tencent's Hunyuan3D 2.1. Read the full license before shipping anything built on it.
Why HY-World 2.0?
First open-source SOTA world model — no closed competitors in this category
Full scene output, not just meshes — Gaussian Splats + geometry + camera in one pass
Multi-modal inputs — same pipeline handles text, images, and video
FSDP-ready — scale across 2–8 GPUs for high-res or batched inference
Game-engine ready — outputs drop straight into Unity, Unreal, and Blender
Requirements
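GPU VRAM: 16 GB minimum (RTX 4080 / 3090); 24–80 GB recommended (RTX 4090 / A100 / H100)
System RAM: 32 GB minimum; 64–128 GB recommended
Disk: 80 GB minimum; 200 GB recommended
CUDA: 12.1 minimum; 12.4+ recommended
Python: 3.10
PyTorch: 2.4.0 minimum; 2.4.0+ recommended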
Multi-GPU mode requires ≥ 1 input image per GPU. For a single reference image, stick with one GPU and let FSDP kick in only for batched or high-resolution jobs.
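The one-image-per-GPU rule can be enforced before launching anything. The helper below is a small sketch with a hypothetical name; it only picks a process count that never exceeds the number of input images.

```python
def pick_nproc_per_node(num_images, num_gpus):
    """Largest process count that still gives every GPU at least one image."""
    if num_images < 1:
        raise ValueError("need at least one input image")
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return min(num_images, num_gpus)


if __name__ == "__main__":
    print(pick_nproc_per_node(num_images=1, num_gpus=4))  # prints 1
    print(pick_nproc_per_node(num_images=8, num_gpus=4))  # prints 4
```

The returned value is what you would pass to torchrun as `--nproc_per_node`.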
Option A — Quickstart with Docker + torchrun
A minimal docker-compose.yml for a Clore.ai container (official Tencent image is not yet published — this uses the PyTorch base and runs the repo install inside):
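A compose file along these lines can be rendered with a few lines of Python. This is a sketch under stated assumptions, not the official configuration: the service name, the mounted `./workspace` directory, the `shm_size` value, and the exact PyTorch image tag are all illustrative choices, so confirm the tag exists on Docker Hub before relying on it. The GPU reservation block and the 7860 port mapping follow standard Compose syntax.

```python
# All names and values in this template are assumptions for illustration.
COMPOSE_TEMPLATE = """\
services:
  hy-world:
    image: {image}
    working_dir: /workspace
    volumes:
      - ./workspace:/workspace
    ports:
      - "{port}:{port}"
    shm_size: "16gb"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: sleep infinity
"""


def render_compose(image="pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel", port=7860):
    """Return docker-compose.yml text for a PyTorch base container with GPU access."""
    return COMPOSE_TEMPLATE.format(image=image, port=port)


if __name__ == "__main__":
    print(render_compose())
```

Save the printed text as docker-compose.yml, start the container, and run the repository's install steps inside it.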
Run a multi-GPU reconstruction job with FSDP and BF16:
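One way to assemble such a launch from Python is sketched below. Only `torchrun`, `--standalone`, `--nproc_per_node`, and the `--enable_bf16` flag mentioned in this guide are used; the script path and any input or output arguments are caller-supplied placeholders because this guide does not reproduce them, so take the real names from the repository.

```python
import shlex


def build_torchrun_cmd(script, nproc_per_node, script_args=(), bf16=True):
    """Build a single-node torchrun command as an argument list.

    `script` and `script_args` are placeholders supplied by the caller;
    nothing here assumes the repository's actual entry point or flags.
    """
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    cmd = ["torchrun", "--standalone", f"--nproc_per_node={nproc_per_node}", script]
    if bf16:
        cmd.append("--enable_bf16")  # flag name as given in this guide
    cmd.extend(script_args)
    return cmd


if __name__ == "__main__":
    # "path/to/inference_script.py" is a placeholder, not a real file name.
    cmd = build_torchrun_cmd("path/to/inference_script.py", nproc_per_node=2)
    print(shlex.join(cmd))
```

Returning a list rather than a string lets you hand the result straight to `subprocess.run(cmd, check=True)` without shell quoting issues.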
Option B — Manual Python API
Launch the Gradio demo on port 7860:
For multi-GPU Gradio with FSDP:
Clore.ai GPU Recommendations
Single image → scene, dev/preview: RTX 4090 (24 GB), ~$0.5–2/day. BF16 fits comfortably, fast iteration.
Multi-view video reconstruction: A100 40 GB (40 GB), ~$3–5/day. Handles 200K+ px frames without OOM.
High-res batched (production): A100 80 GB (80 GB), ~$5–8/day. Full 500K px flex-res, big batches.
FSDP multi-GPU / research: 2–4× H100 (160–320 GB), ~$15–40/day. Sharded training-scale workloads.
Sweet spot on Clore.ai: a single RTX 4090 at ~$0.5–2/day handles everyday WorldMirror inference. Step up to an A100 only when you need >200K-pixel reconstructions or long video inputs.
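For budgeting, the per-day ranges above convert to a per-job estimate with simple proration. The sketch assumes cost scales linearly with hours; how Clore.ai actually rounds or bills partial days is not covered in this guide, so treat the result as an approximation.

```python
def rental_cost_range(hours, low_per_day, high_per_day):
    """Prorate a daily price range to a job lasting `hours` (linear assumption)."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * low_per_day / 24, hours * high_per_day / 24


if __name__ == "__main__":
    # Six hours on an RTX 4090 listed at ~$0.5-2/day
    print(rental_cost_range(6, 0.5, 2.0))  # prints (0.125, 0.5)
```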
Use Cases
Game development — turn concept art into rough 3D environments for blockout and greybox
AR/VR content — generate Gaussian Splat scenes playable in Unity/Unreal with near-photographic fidelity
Film and animation previs — reconstruct sets from on-location photos for virtual cinematography
Architectural visualization — convert reference shots or text briefs into editable 3D walkthroughs
Robotics + simulation — synthesize 3D training environments from sparse real-world footage
Roadmap
Tencent has listed the following as "coming soon" in the official repo:
HY-Pano 2.0 — 360° panorama generation (interim: HunyuanWorld 1.0)
WorldStereo 2.0 — world expansion / novel-view synthesis (interim: original WorldStereo)
WorldNav — trajectory planning for scene traversal
Full world-generation pipeline code — the text/image → full world entry point
WorldMirror 2.0 (reconstruction) is the only component with public weights today. Watch the Hugging Face model page for new releases.
Troubleshooting
CUDA out of memory on 16 GB GPU: lower the input resolution toward 50K px, or switch to an RTX 4090 (24 GB). Pass --enable_bf16.
FSDP hangs on launch: make sure the number of input images is ≥ --nproc_per_node. FSDP also needs NCCL and matching CUDA versions across GPUs.
flash-attn install fails: try pip install flash-attn --no-build-isolation on CUDA 12.4; if it still fails, the pipeline runs (slower) without it.
Gradio UI not reachable on Clore.ai: forward port 7860 in the Clore container config, or launch with --share.
License questions for commercial use: read License.txt in the repo. The license is tencent-hy-world-2.0-community, not a standard OSS license.
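For the unreachable-Gradio case, it helps to confirm whether port 7860 accepts TCP connections at all before changing any settings. The check below uses only the Python standard library; the function name is hypothetical and the host shown is a placeholder for your server's address.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the address of your rented server.
    print(port_is_open("127.0.0.1", 7860))
```

If this returns True from inside the container but False from your own machine, the port forwarding is the likely culprit rather than the demo itself.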
Next Steps
Hunyuan3D 2.1 — Tencent's single-object text/image-to-mesh generator (smaller, Apache-style pipeline, different use case)
TRELLIS 3D — Microsoft's structured 3D asset generator
Gaussian Splatting — render pipeline for the 3DGS outputs HY-World produces