HunyuanImage 3.0

Run HunyuanImage 3.0, Tencent's 80B MoE multimodal image generation and editing model, on Clore.ai GPUs

HunyuanImage 3.0 by Tencent is the world's largest open-source image generation model with 80B total parameters (13B active during inference). Released on January 26, 2026, it breaks the mold by unifying image generation, editing, and understanding into a single autoregressive model — no more separate pipelines for text-to-image and image-to-image. It generates photorealistic images, performs precise element-preserving edits, handles style transfers, and even does multi-image fusion, all from one model.

HuggingFace: [tencent/HunyuanImage-3.0-Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct) · GitHub: [Tencent-Hunyuan/HunyuanImage-3.0](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0) · License: Tencent Hunyuan Community License (free for research and commercial use under 100M MAU)

Key Features

  • 80B total / 13B active parameters — the largest open-source image MoE model; routing activates only 13B parameters per token during inference

  • Unified multimodal architecture — text-to-image, image editing, style transfer, and multi-image composition in one model

  • Instruction-driven editing — describe what you want changed in natural language, preserving untouched elements

  • Distilled checkpoint available: HunyuanImage-3.0-Instruct-Distil runs in just 8 sampling steps for much faster generation

  • vLLM acceleration — native vLLM support for significantly faster inference in production

  • Autoregressive framework — unlike DiT-based models (FLUX, SD3.5), uses a unified AR approach for both understanding and generation

Model Variants

| Model | Use Case | Steps | HuggingFace |
| --- | --- | --- | --- |
| HunyuanImage-3.0 | Text-to-image only | 30–50 | [tencent/HunyuanImage-3.0](https://huggingface.co/tencent/HunyuanImage-3.0) |
| HunyuanImage-3.0-Instruct | Text-to-image + editing + multi-image | 30–50 | [tencent/HunyuanImage-3.0-Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct) |
| HunyuanImage-3.0-Instruct-Distil | Fast inference (8 steps) | 8 | [tencent/HunyuanImage-3.0-Instruct-Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil) |

Requirements

| Configuration | Single GPU (offloading) | Recommended | Multi-GPU Production |
| --- | --- | --- | --- |
| GPU | 1× RTX 4090 24GB | 1× A100 80GB | 2–3× A100 80GB |
| VRAM | 24GB (with layer offload) | 80GB | 160–240GB |
| RAM | 128GB | 128GB | 256GB |
| Disk | 200GB | 200GB | 200GB |
| CUDA | 12.0+ | 12.0+ | 12.0+ |

Recommended Clore.ai setup:

  • Best value: 1× A100 80GB (~$2–4/day) — runs the full model comfortably without offloading

  • Budget option: 1× RTX 4090 (~$0.5–2/day) — works with CPU offloading (slower, but functional)

  • Fast production: 2× A100 80GB (~$4–8/day) — for batch generation and the Instruct model

Quick Start

Installation
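
A minimal setup sketch; the exact dependency list lives in the repo's requirements.txt, and the checkpoint directory below is a placeholder:

```bash
# Clone the official repo and install its dependencies
git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
cd HunyuanImage-3.0
pip install -r requirements.txt

# Fetch the Instruct checkpoint (~160GB) into a local directory;
# loading by HF model ID is unreliable here (see Troubleshooting)
huggingface-cli download tencent/HunyuanImage-3.0-Instruct \
    --local-dir ./ckpts/HunyuanImage-3.0-Instruct
```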

Text-to-Image with Transformers
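
A sketch following the loading pattern from the model card. The load_tokenizer and generate_image helpers come from Tencent's custom modeling code (hence trust_remote_code=True), so verify their exact names and arguments against the current model card:

```python
from transformers import AutoModelForCausalLM

model_path = "./ckpts/HunyuanImage-3.0-Instruct"  # local download from the step above

# device_map="auto" spreads layers across available GPUs and offloads the
# remainder to CPU RAM on a single 24GB card (slower, but functional)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,   # pulls in the custom autoregressive image code
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="sdpa",
)
model.load_tokenizer(model_path)

prompt = "A photorealistic red fox standing in fresh snow at golden hour"
image = model.generate_image(prompt=prompt, stream=True)
image.save("fox.png")
```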

Using the Gradio Web Interface

The easiest way to experiment with all features:
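
A launch sketch; the demo script name and flags below are assumptions, so check the repo README for the actual entry point:

```bash
cd HunyuanImage-3.0
# app.py, --model-path and --port are illustrative names, not confirmed flags
python3 app.py --model-path ./ckpts/HunyuanImage-3.0-Instruct --port 7860
```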

Then access it via an SSH tunnel: `ssh -L 7860:localhost:7860 root@<clore-ip>`

Usage Examples

1. Text-to-Image Generation (CLI)
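
A shell sketch for one-off generation. The script name and the --model-id/--save flags are assumptions; --num-inference-steps and --guidance-scale are the tuning flags referenced in Troubleshooting below:

```bash
# Hypothetical entry point; check the repo for the real generation script
python3 run_image_gen.py \
    --model-id ./ckpts/HunyuanImage-3.0-Instruct \
    --prompt "A cyberpunk street market at night, neon reflections on wet asphalt" \
    --num-inference-steps 50 \
    --guidance-scale 7.0 \
    --save market.png
```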

2. Image Editing with Natural Language

One of HunyuanImage 3.0's standout features is editing existing images by describing the changes in natural language:
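
A Python sketch that assumes generate_image also accepts a source image alongside the prompt; the image keyword is an assumption, so confirm the signature on the Instruct model card:

```python
from PIL import Image
from transformers import AutoModelForCausalLM

model_path = "./ckpts/HunyuanImage-3.0-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model.load_tokenizer(model_path)

# Name both the change and what must stay fixed (see Troubleshooting)
source = Image.open("living_room.png")
edited = model.generate_image(
    prompt="Replace the sofa with a green velvet armchair; keep the rest of the room unchanged.",
    image=source,  # keyword name is an assumption, not a confirmed API
)
edited.save("living_room_edited.png")
```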

3. Fast Generation with Distilled Model (8 Steps)
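
Same pattern as the CLI sketch above (same caveats on script and flag names), pointed at the distilled checkpoint with its documented 8-step schedule:

```bash
python3 run_image_gen.py \
    --model-id ./ckpts/HunyuanImage-3.0-Instruct-Distil \
    --prompt "A cozy cabin in a snowy forest at dusk" \
    --num-inference-steps 8 \
    --save cabin.png
```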

Comparison with Other Image Models

| Feature | HunyuanImage 3.0 | FLUX.2 Klein | SD 3.5 Large |
| --- | --- | --- | --- |
| Parameters | 80B MoE (13B active) | 32B DiT | 8B DiT |
| Architecture | Autoregressive MoE | Diffusion Transformer | Diffusion Transformer |
| Image Editing | ✅ Native | ❌ Requires ControlNet | ❌ Requires img2img |
| Multi-Image Fusion | ✅ Native | ❌ | ❌ |
| Style Transfer | ✅ Native | ❌ Requires LoRA | ❌ Requires LoRA |
| Min VRAM | ~24GB (offloaded) | 16GB | 8GB |
| Speed (A100, per image) | ~15–30 sec | ~0.3 sec | ~5 sec |
| License | Tencent Community | Apache 2.0 | Stability AI CL |

Tips for Clore.ai Users

  1. Use the Distilled model for speed: HunyuanImage-3.0-Instruct-Distil generates in 8 steps instead of 30–50, cutting inference time by 4–6×. Quality remains surprisingly close to the full model.

  2. A100 80GB is the sweet spot — A single A100 80GB (~$2–4/day on Clore.ai) runs the Instruct model without any offloading tricks. This is much faster than an RTX 4090 with CPU offloading.

  3. Pre-download models — The full Instruct checkpoint is ~160GB. Download it once to a persistent Clore.ai volume (see the sketch after this list) to avoid re-downloading every time you spin up a new instance.

  4. Use SSH tunneling for Gradio — Don't expose port 7860 publicly. Use `ssh -L 7860:localhost:7860` to access the web interface securely from your browser.

  5. Try the vLLM backend for batch work — If you're generating many images, the vLLM inference path (in the vllm_infer/ folder) provides significantly better throughput.
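
A minimal sketch of the one-time download from tip 3; the target path is a placeholder for wherever your persistent Clore.ai volume is mounted:

```bash
export HF_TOKEN=<your-huggingface-token>   # helps if downloads fail or throttle
huggingface-cli download tencent/HunyuanImage-3.0-Instruct \
    --local-dir /workspace/ckpts/HunyuanImage-3.0-Instruct \
    --resume-download
```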

Troubleshooting

| Issue | Solution |
| --- | --- |
| CUDA out of memory on RTX 4090 | Use `device_map="auto"` to enable CPU offloading, or switch to the Distil model |
| Download fails / very slow | Set the `HF_TOKEN` env variable; use `huggingface-cli download` with `--resume-download` |
| Cannot load model via HF model ID | Due to the dot in the name, download locally first: `huggingface-cli download tencent/HunyuanImage-3.0-Instruct --local-dir ./ckpts/` |
| Blurry or low-quality outputs | Increase `--num-inference-steps` to 40–50; raise `--guidance-scale` to 7.0 |
| Image editing ignores instructions | Be specific about what to change and what to preserve; use short, clear prompts |
| Gradio interface won't start | Ensure `gradio>=4.0` is installed; check that the model path points to the correct directory |

Further Reading
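
  • Model card: https://huggingface.co/tencent/HunyuanImage-3.0-Instruct
  • Source code and deployment scripts: https://github.com/Tencent-Hunyuan/HunyuanImage-3.0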
