# Hunyuan World 2.0 (3D World Model)

{% hint style="info" %}
**Released April 15, 2026** — Tencent Hunyuan dropped **HY-World 2.0**, the first fully open-source SOTA 3D world model. This guide covers **WorldMirror 2.0** (the shipped \~1.2B parameter reconstruction component). Sister models **HY-Pano 2.0** and **WorldStereo 2.0** are flagged "coming soon" in the official repo — see the [Roadmap](#roadmap) below.
{% endhint %}

HY-World 2.0 is Tencent's multi-modal world-model framework for **reconstructing, generating, and simulating full 3D scenes**. Unlike single-object mesh generators, HY-World ingests text, single or multi-view images, or video and emits editable world representations — meshes, 3D Gaussian Splats, point clouds, depth maps, surface normals, and recovered camera parameters — ready to drop into Unity, Unreal, or Blender.

The first public weights cover **WorldMirror 2.0** (\~1.2B params, BF16) — the reconstruction half of the stack. It runs in \~12–24 GB of VRAM on a single GPU and supports flexible resolution from 50K to 500K pixels, plus FSDP multi-GPU sharding for larger workloads. A Python API (`diffusers`-style), CLI via `torchrun`, and a Gradio demo ship out of the box. A ComfyUI node is **not** official yet — community ports only.

{% hint style="success" %}
All examples in this guide run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

### Key Specs

| Property         | Value                                                                    |
| ---------------- | ------------------------------------------------------------------------ |
| Component        | WorldMirror 2.0 (shipped); HY-Pano 2.0 + WorldStereo 2.0 coming soon     |
| Parameters       | \~1.2B (BF16)                                                            |
| Input modalities | Text · single-view image · multi-view images · video                     |
| Output           | Mesh · 3D Gaussian Splat · point cloud · depth · normals · camera params |
| VRAM             | \~12–24 GB single GPU; FSDP for multi-GPU                                |
| Resolution range | 50K – 500K pixels (flex-res)                                             |
| License          | `tencent-hy-world-2.0-community` (custom — see below)                    |
| Release          | 2026-04-15                                                               |
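
The 50K–500K pixel flex-res window means large frames should be downscaled before inference. A minimal sketch of that bookkeeping (hypothetical helper, not part of the official API — the pipeline may handle resizing internally):

```python
import math

# Hypothetical helper, not part of the HY-World package: compute a uniform
# downscale so an image's pixel count fits the 50K-500K flex-res window.
FLEX_RES_MAX = 500_000

def fit_flex_res(width: int, height: int, budget: int = FLEX_RES_MAX) -> tuple[int, int]:
    """Return (new_width, new_height) whose pixel count is <= budget."""
    pixels = width * height
    if pixels <= budget:
        return width, height
    scale = math.sqrt(budget / pixels)
    return max(1, int(width * scale)), max(1, int(height * scale))

# A 1080p frame (~2.07M px) shrinks to fit the 500K px ceiling:
w, h = fit_flex_res(1920, 1080)
print(w, h, w * h)  # pixel count now within budget
```

The same math tells you the VRAM cost of a given input before you rent the GPU: pixel count scales roughly with activation memory.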

{% hint style="warning" %}
**License caveat:** HY-World 2.0 ships under a custom community license (`License.txt` at repo root), **not** Apache 2.0 or MIT. Commercial use terms differ from Tencent's Hunyuan3D 2.1. Read the full license before shipping anything built on it.
{% endhint %}

### Why HY-World 2.0?

* **First open-source SOTA world model** — no comparable open-weights competitor in this category
* **Full scene output, not just meshes** — Gaussian Splats + geometry + camera in one pass
* **Multi-modal inputs** — same pipeline handles text, images, and video
* **FSDP-ready** — scale across 2–8 GPUs for high-res or batched inference
* **Game-engine ready** — outputs drop straight into Unity, Unreal, and Blender

***

## Requirements

| Component  | Minimum                 | Recommended                       |
| ---------- | ----------------------- | --------------------------------- |
| GPU VRAM   | 16 GB (RTX 4080 / 3090) | 24–80 GB (RTX 4090 / A100 / H100) |
| System RAM | 32 GB                   | 64–128 GB                         |
| Disk       | 80 GB                   | 200 GB                            |
| CUDA       | 12.1                    | 12.4+                             |
| Python     | 3.10                    | 3.10                              |
| PyTorch    | 2.4.0                   | 2.4.0+                            |

{% hint style="info" %}
Multi-GPU mode requires **≥ 1 input image per GPU**. With a single reference image, run on one GPU; reserve FSDP for batched or high-resolution multi-image jobs.
{% endhint %}
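
The one-image-per-GPU constraint above can be checked before launching `torchrun`. A small sanity-check sketch (`usable_procs` is a hypothetical helper, not part of the HY-World CLI):

```python
# Hypothetical pre-flight check: FSDP multi-GPU mode needs at least one
# input image per process, so cap --nproc_per_node at the image count.
def usable_procs(n_images: int, n_gpus: int) -> int:
    """Largest --nproc_per_node value this input set supports."""
    if n_images < 1:
        raise ValueError("need at least one input image")
    return min(n_gpus, n_images)

print(usable_procs(1, 4))   # single reference image -> run on one GPU
print(usable_procs(12, 4))  # enough views -> all four GPUs usable
```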

***

## Option A — Quickstart with Docker + torchrun

A minimal `docker-compose.yml` for a Clore.ai container (official Tencent image is not yet published — this uses the PyTorch base and runs the repo install inside):

```yaml
version: "3.8"
services:
  hyworld2:
    image: pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel
    ports:
      - "7860:7860"
    volumes:
      - ./workspace:/workspace
      - hf_cache:/root/.cache/huggingface
    working_dir: /workspace
    command: >
      bash -c "
        git clone https://github.com/Tencent-Hunyuan/HY-World-2.0 &&
        cd HY-World-2.0 &&
        pip install -r requirements.txt &&
        pip install flash-attn --no-build-isolation &&
        python -m hyworld2.worldrecon.gradio_app
      "
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    shm_size: "16gb"

volumes:
  hf_cache:
```

Run a multi-GPU reconstruction job with FSDP and BF16:

```bash
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.pipeline \
    --input_path /workspace/input_images \
    --use_fsdp --enable_bf16
```
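
If you drive reconstruction jobs from a script rather than the shell, the command above can be assembled programmatically. A sketch that mirrors only the flags shown in this guide (check the repo's `--help` for the full set):

```python
import shlex

def build_recon_cmd(input_path: str, n_gpus: int = 2,
                    fsdp: bool = True, bf16: bool = True) -> list[str]:
    """Assemble the documented torchrun invocation as an argv list,
    suitable for subprocess.run(). Flags mirror the example above only."""
    cmd = ["torchrun", f"--nproc_per_node={n_gpus}",
           "-m", "hyworld2.worldrecon.pipeline",
           "--input_path", input_path]
    if fsdp:
        cmd.append("--use_fsdp")
    if bf16:
        cmd.append("--enable_bf16")
    return cmd

print(shlex.join(build_recon_cmd("/workspace/input_images")))
```

Passing an argv list (rather than a shell string) to `subprocess.run` avoids quoting bugs when input paths contain spaces.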

***

## Option B — Manual Python API

```bash
# Clone and install
git clone https://github.com/Tencent-Hunyuan/HY-World-2.0
cd HY-World-2.0
conda create -n hyworld2 python=3.10 -y
conda activate hyworld2
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

# Loads ~1.2B BF16 weights from HF (tencent/HY-World-2.0)
pipeline = WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0')

# Reconstruct a 3D scene from a folder of multi-view images
result = pipeline('path/to/images')

# Optional: inject prior camera + depth for tighter reconstruction
result = pipeline(
    'path/to/images',
    prior_cam_path='path/to/prior_camera.json',
    prior_depth_path='path/to/prior_depth/',
)
```
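
Since the pipeline takes a directory of views, and frame order matters for video-derived multi-view sets, it is worth validating the folder before a long inference run. A hypothetical convenience helper (not part of the HY-World package):

```python
from pathlib import Path

# Hypothetical pre-flight helper: list the image files the pipeline will see,
# in deterministic sorted order, and fail fast on an empty folder.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def list_views(folder: str) -> list[Path]:
    paths = sorted(p for p in Path(folder).iterdir()
                   if p.suffix.lower() in IMAGE_EXTS)
    if not paths:
        raise FileNotFoundError(f"no images found in {folder}")
    return paths
```

Zero-padded frame names (`frame_0001.png`, …) keep lexicographic sort equal to temporal order.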

Launch the Gradio demo on port 7860:

```bash
python -m hyworld2.worldrecon.gradio_app
```

For multi-GPU Gradio with FSDP:

```bash
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.gradio_app \
    --use_fsdp --enable_bf16
```

***

## Clore.ai GPU Recommendations

| Workload                          | GPU        | VRAM       | Why                                   | Clore.ai Cost |
| --------------------------------- | ---------- | ---------- | ------------------------------------- | ------------- |
| Single image → scene, dev/preview | RTX 4090   | 24 GB      | BF16 fits comfortably, fast iteration | \~$0.5–2/day  |
| Multi-view video reconstruction   | A100 40 GB | 40 GB      | Handles 200K+ px frames without OOM   | \~$3–5/day    |
| High-res batched (production)     | A100 80 GB | 80 GB      | Full 500K px flex-res, big batches    | \~$5–8/day    |
| FSDP multi-GPU / research         | 2–4× H100  | 160–320 GB | Sharded training-scale workloads      | \~$15–40/day  |

{% hint style="success" %}
**Sweet spot on Clore.ai:** a single **RTX 4090 at \~$0.5–2/day** handles everyday WorldMirror inference. Step up to an A100 only when you need >200K-pixel reconstructions or long video inputs.
{% endhint %}

***

## Use Cases

* **Game development** — turn concept art into rough 3D environments for blockout and greybox
* **AR/VR content** — generate Gaussian Splat scenes playable in Unity/Unreal with near-photographic fidelity
* **Film and animation previs** — reconstruct sets from on-location photos for virtual cinematography
* **Architectural visualization** — convert reference shots or text briefs into editable 3D walkthroughs
* **Robotics + simulation** — synthesize 3D training environments from sparse real-world footage

***

## Roadmap

Tencent has listed the following as "coming soon" in the official repo:

* **HY-Pano 2.0** — 360° panorama generation (interim: HunyuanWorld 1.0)
* **WorldStereo 2.0** — world expansion / novel-view synthesis (interim: original WorldStereo)
* **WorldNav** — trajectory planning for scene traversal
* **Full world-generation pipeline code** — the text/image → full world entry point

WorldMirror 2.0 (reconstruction) is the only component with public weights today. Keep an eye on the [HF model page](https://huggingface.co/tencent/HY-World-2.0) for drops.

***

## Troubleshooting

| Problem                              | Solution                                                                                                                                |
| ------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------- |
| `CUDA out of memory` on 16 GB GPU    | Lower input resolution toward 50K px, pass `--enable_bf16`, or step up to a 24 GB GPU (RTX 4090)                                        |
| FSDP hangs on launch                 | Ensure number of input images is **≥** `--nproc_per_node`. FSDP also needs NCCL + matching CUDA across GPUs                             |
| `flash-attn` install fails           | Try prebuilt wheel `pip install flash-attn --no-build-isolation` on CUDA 12.4; if it still fails, the pipeline runs (slower) without it |
| Gradio UI not reachable on Clore.ai  | Forward port 7860 in the Clore container config, or launch with `--share`                                                               |
| License questions for commercial use | Read `License.txt` in the repo — it is `tencent-hy-world-2.0-community`, not standard OSS                                               |
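
For the `flash-attn` row above: since the pipeline runs (slower) without it, a quick probe tells you whether the wheel install actually took effect, without importing the heavy package:

```python
import importlib.util

# Probe for flash-attn without importing it; the pipeline falls back to
# standard attention kernels when the package is absent, just more slowly.
def has_flash_attn() -> bool:
    return importlib.util.find_spec("flash_attn") is not None

if not has_flash_attn():
    print("flash-attn not installed; expect slower attention kernels")
```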

***

## Next Steps

* [Hunyuan3D 2.1](https://docs.clore.ai/guides/3d-generation/hunyuan3d) — Tencent's single-object text/image-to-mesh generator (smaller, Apache-style pipeline, different use case)
* [TRELLIS 3D](https://docs.clore.ai/guides/3d-generation/trellis-3d) — Microsoft's structured 3D asset generator
* [Gaussian Splatting](https://docs.clore.ai/guides/3d-generation/gaussian-splatting) — render pipeline for the 3DGS outputs HY-World produces
* [HuggingFace model](https://huggingface.co/tencent/HY-World-2.0)
* [GitHub repo](https://github.com/Tencent-Hunyuan/HY-World-2.0)
* [CLORE.AI Marketplace](https://clore.ai/marketplace)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/3d-generation/hunyuan-world-2.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
