# HunyuanImage 3.0

HunyuanImage 3.0 by Tencent is the **world's largest open-source image generation model**, with 80B total parameters (13B active during inference). Released in September 2025, it unifies image generation, editing, and understanding in a single autoregressive model, so there is no longer a need for separate text-to-image and image-to-image pipelines. It generates photorealistic images, performs precise element-preserving edits, handles style transfers, and even does multi-image fusion, all from one model.

**HuggingFace:** [tencent/HunyuanImage-3.0-Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct) **GitHub:** [Tencent-Hunyuan/HunyuanImage-3.0](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0) **License:** Tencent Hunyuan Community License (free for research & commercial use under 100M MAU)

## Key Features

* **80B total / 13B active parameters** — largest open-source image MoE model; activates only 13B params per inference
* **Unified multimodal architecture** — text-to-image, image editing, style transfer, and multi-image composition in one model
* **Instruction-driven editing** — describe what you want changed in natural language, preserving untouched elements
* **Distilled checkpoint available** — `HunyuanImage-3.0-Instruct-Distil` runs in just 8 sampling steps for faster generation
* **vLLM acceleration** — native vLLM support for significantly faster inference in production
* **Autoregressive framework** — unlike DiT-based models (FLUX, SD3.5), uses a unified AR approach for both understanding and generation

## Model Variants

| Model                                | Use Case                              | Steps | HuggingFace                                |
| ------------------------------------ | ------------------------------------- | ----- | ------------------------------------------ |
| **HunyuanImage-3.0**                 | Text-to-image only                    | 30–50 | `tencent/HunyuanImage-3.0`                 |
| **HunyuanImage-3.0-Instruct**        | Text-to-image + editing + multi-image | 30–50 | `tencent/HunyuanImage-3.0-Instruct`        |
| **HunyuanImage-3.0-Instruct-Distil** | Fast inference (8 steps)              | 8     | `tencent/HunyuanImage-3.0-Instruct-Distil` |

## Requirements

| Configuration | Single GPU (offloading)   | Recommended  | Multi-GPU Production |
| ------------- | ------------------------- | ------------ | -------------------- |
| GPU           | 1× RTX 4090 24GB          | 1× A100 80GB | 2–3× A100 80GB       |
| VRAM          | 24GB (with layer offload) | 80GB         | 160–240GB            |
| RAM           | 128GB                     | 128GB        | 256GB                |
| Disk          | 200GB                     | 200GB        | 200GB                |
| CUDA          | 12.0+                     | 12.0+        | 12.0+                |
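
Before installing anything, it is worth confirming that the rented instance actually matches these specs. A quick check with standard tooling (nothing here is model-specific):

```bash
# GPU model, total VRAM, and driver version
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv

# Free disk space (the checkpoint alone is ~160GB)
df -h .

# Available system RAM
free -h
```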

**Recommended Clore.ai setup:**

* **Best value:** 1× A100 80GB (\~$2–4/day) — the most comfortable single-GPU option; the \~160GB bf16 checkpoint still needs partial CPU offloading, but far less than on smaller cards
* **Budget option:** 1× RTX 4090 (\~$0.5–2/day) — works with CPU offloading (slower, but functional)
* **Fast production:** 2× A100 80GB (\~$4–8/day) — for batch generation and the Instruct model

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
cd HunyuanImage-3.0

# Create and activate a virtual environment
python -m venv .venv && source .venv/bin/activate

# Install PyTorch (CUDA 12.1 wheels), then the repo dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Download model weights
huggingface-cli download tencent/HunyuanImage-3.0-Instruct --local-dir ./ckpts/HunyuanImage-3-Instruct
```
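
Once the download finishes, a quick sanity check catches a broken CUDA install or an incomplete download before you try to load an 80B model. This assumes the local checkpoint path used above:

```bash
# Verify PyTorch sees the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"

# Verify the checkpoint size and shard count look complete (~160GB of safetensors)
du -sh ./ckpts/HunyuanImage-3-Instruct
ls ./ckpts/HunyuanImage-3-Instruct/*.safetensors | wc -l
```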

### Text-to-Image with Transformers

```python
import torch
from transformers import AutoModelForCausalLM

# Load model (the bf16 checkpoint is ~160GB; device_map="auto" spreads it
# across available GPUs and offloads the remainder to CPU RAM)
model_path = "./ckpts/HunyuanImage-3-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # the generation code ships with the checkpoint
)
model.load_tokenizer(model_path)  # the custom model class loads its tokenizer separately

# Generate an image from text
prompt = (
    "A serene Japanese garden in autumn, koi fish swimming in a crystal-clear pond, "
    "golden maple leaves falling, watercolor painting style"
)
image = model.generate_image(prompt=prompt, stream=True)
image.save("japanese_garden.png")
```
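
On a 24GB card the \~160GB of bf16 weights cannot all live on the GPU, but `from_pretrained` accepts a `max_memory` budget that tells `device_map="auto"` how much to keep on each device and how much to offload to CPU RAM. A minimal sketch; the budget values below are illustrative for a 1× RTX 4090 + 128GB RAM box, not tuned numbers:

```python
import torch
from transformers import AutoModelForCausalLM

# Cap GPU usage below 24GB; the remaining layers are offloaded to CPU RAM
# and streamed to the GPU on demand, so expect much slower generation.
model = AutoModelForCausalLM.from_pretrained(
    "./ckpts/HunyuanImage-3-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "22GiB", "cpu": "120GiB"},  # illustrative budgets, adjust to your instance
    trust_remote_code=True,
)
```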

### Using the Gradio Web Interface

The easiest way to experiment with all features:

```bash
cd HunyuanImage-3.0

# Install Gradio
pip install gradio

# Launch the web interface
python gradio_demo.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --server-name 0.0.0.0 \
    --server-port 7860
```

Then access via SSH tunnel: `ssh -L 7860:localhost:7860 root@<clore-ip>`
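
If the SSH session drops, the demo dies with it. Running the same command under `nohup` (or inside `tmux`) keeps the server alive between connections:

```bash
# Keep the Gradio server running after the SSH session closes
nohup python gradio_demo.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --server-name 0.0.0.0 \
    --server-port 7860 > gradio.log 2>&1 &

# Follow the logs
tail -f gradio.log
```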

## Usage Examples

### 1. Text-to-Image Generation (CLI)

```bash
cd HunyuanImage-3.0

python inference.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --prompt "Cyberpunk cityscape at night, neon-lit skyscrapers reflected in rain-soaked streets, flying cars, volumetric fog, 8K" \
    --output-path output.png \
    --num-inference-steps 30 \
    --guidance-scale 5.0
```
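
For more than a handful of prompts, a plain shell loop over a prompt file works. It reuses the flags above; note that each invocation reloads the model, so for real batch throughput prefer the vLLM path mentioned in the tips below:

```bash
# prompts.txt holds one prompt per line
i=0
while IFS= read -r prompt; do
    i=$((i + 1))
    python inference.py \
        --model-path ./ckpts/HunyuanImage-3-Instruct \
        --prompt "$prompt" \
        --output-path "output_${i}.png" \
        --num-inference-steps 30 \
        --guidance-scale 5.0
done < prompts.txt
```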

### 2. Image Editing with Natural Language

One of HunyuanImage 3.0's standout features — edit existing images by describing changes:

```bash
python inference.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --prompt "Change the season to winter with snow covering the trees" \
    --image-path input_photo.jpg \
    --output-path edited_winter.png \
    --num-inference-steps 30
```
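
Style transfer goes through the same interface: pass a reference image and describe the target style in the prompt. Only the instruction changes:

```bash
python inference.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --prompt "Repaint this photo as a Van Gogh-style oil painting, keeping the composition unchanged" \
    --image-path input_photo.jpg \
    --output-path edited_vangogh.png \
    --num-inference-steps 30
```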

### 3. Fast Generation with Distilled Model (8 Steps)

```bash
# Download distilled checkpoint
huggingface-cli download tencent/HunyuanImage-3.0-Instruct-Distil \
    --local-dir ./ckpts/HunyuanImage-3-Instruct-Distil

# Generate with only 8 steps (roughly 4–6× faster)
python inference.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct-Distil \
    --prompt "Portrait of an astronaut riding a horse on Mars, photorealistic" \
    --output-path astronaut.png \
    --num-inference-steps 8
```
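
To measure the speedup on your own hardware, time both checkpoints on the same prompt; the exact ratio depends on GPU, resolution, and step count:

```bash
# Full model, 30 steps
time python inference.py --model-path ./ckpts/HunyuanImage-3-Instruct \
    --prompt "Portrait of an astronaut riding a horse on Mars, photorealistic" \
    --output-path astronaut_full.png --num-inference-steps 30

# Distilled model, 8 steps
time python inference.py --model-path ./ckpts/HunyuanImage-3-Instruct-Distil \
    --prompt "Portrait of an astronaut riding a horse on Mars, photorealistic" \
    --output-path astronaut_distil.png --num-inference-steps 8
```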

## Comparison with Other Image Models

| Feature            | HunyuanImage 3.0     | FLUX.2 Klein          | SD 3.5 Large          |
| ------------------ | -------------------- | --------------------- | --------------------- |
| Parameters         | 80B MoE (13B active) | 32B DiT               | 8B DiT                |
| Architecture       | Autoregressive MoE   | Diffusion Transformer | Diffusion Transformer |
| Image Editing      | ✅ Native             | ❌ Requires ControlNet | ❌ Requires img2img    |
| Multi-Image Fusion | ✅ Native             | ❌                     | ❌                     |
| Style Transfer     | ✅ Native             | ❌ Requires LoRA       | ❌ Requires LoRA       |
| Min VRAM           | \~24GB (offloaded)   | 16GB                  | 8GB                   |
| Speed (A100)       | \~15–30 sec          | \~0.3 sec             | \~5 sec               |
| License            | Tencent Community    | Apache 2.0            | Stability AI CL       |

## Tips for Clore.ai Users

1. **Use the Distilled model for speed** — `HunyuanImage-3.0-Instruct-Distil` generates in 8 steps instead of 30–50, cutting inference time by 4–6×. Quality remains surprisingly close to the full model.
2. **A100 80GB is the sweet spot** — A single A100 80GB (\~$2–4/day on Clore.ai) is the best single-GPU option for the Instruct model; part of the \~160GB checkpoint still offloads to CPU RAM, but generation is much faster than on an RTX 4090.
3. **Pre-download models** — The full Instruct checkpoint is \~160GB. Download it once to a persistent Clore.ai volume so you don't re-download it every time you spin up a new instance (see the snippet after this list).
4. **Use SSH tunneling for Gradio** — Don't expose port 7860 publicly. Use `ssh -L 7860:localhost:7860` to access the web interface securely from your browser.
5. **Try the vLLM backend for batch work** — If you're generating many images, the vLLM inference path (in the `vllm_infer/` folder) provides significantly better throughput.
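
A download recipe for tip 3 might look like the following; the `/workspace` mount point is an example, so substitute whatever persistent volume path your Clore.ai instance uses. The optional `hf_transfer` backend speeds up large downloads considerably:

```bash
# One-time download to a persistent volume (adjust /workspace to your mount)
pip install -U "huggingface_hub[hf_transfer]"
export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download tencent/HunyuanImage-3.0-Instruct \
    --local-dir /workspace/ckpts/HunyuanImage-3-Instruct

# Later instances just point --model-path at the persistent copy
```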

## Troubleshooting

| Issue                              | Solution                                                                                                                           |
| ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `CUDA out of memory` on RTX 4090   | Use `device_map="auto"` to enable CPU offloading, or switch to the Distil model                                                    |
| Download fails / very slow         | Set `HF_TOKEN` env variable; use `huggingface-cli download` with `--resume-download`                                               |
| Cannot load model via HF model ID  | Due to the dot in the name, clone locally first: `huggingface-cli download tencent/HunyuanImage-3.0-Instruct --local-dir ./ckpts/` |
| Blurry or low-quality outputs      | Increase `--num-inference-steps` to 40–50; increase `--guidance-scale` to 7.0                                                      |
| Image editing ignores instructions | Be specific about what to change and what to preserve; use short, clear prompts                                                    |
| Gradio interface won't start       | Ensure `gradio>=4.0` is installed; check that the model path points to the correct directory                                       |

## Further Reading

* [GitHub Repository](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0) — Official code, inference scripts, Gradio demo
* [HunyuanImage 3.0-Instruct (HuggingFace)](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct) — Full model weights
* [Distilled Checkpoint](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil) — 8-step fast inference
* [Technical Report (arXiv)](https://arxiv.org/pdf/2509.23951) — Architecture details and benchmarks
* [ComfyUI Integration](https://github.com/bgreene2/ComfyUI-Hunyuan-Image-3) — Community ComfyUI custom node

