# HunyuanImage 3.0

HunyuanImage 3.0 by Tencent is the **world's largest open-source image generation model**, with 80B total parameters (13B active during inference). Released in September 2025, it unifies image generation, editing, and understanding in a single autoregressive model, so there is no longer a need for separate text-to-image and image-to-image pipelines. It generates photorealistic images, performs precise element-preserving edits, handles style transfers, and even does multi-image fusion, all from one model.

**HuggingFace:** [tencent/HunyuanImage-3.0-Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct) **GitHub:** [Tencent-Hunyuan/HunyuanImage-3.0](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0) **License:** Tencent Hunyuan Community License (free for research & commercial use under 100M MAU)

## Key Features

* **80B total / 13B active parameters** — largest open-source image MoE model; activates only 13B params per inference
* **Unified multimodal architecture** — text-to-image, image editing, style transfer, and multi-image composition in one model
* **Instruction-driven editing** — describe what you want changed in natural language, preserving untouched elements
* **Distilled checkpoint available** — `HunyuanImage-3.0-Instruct-Distil` runs in just 8 sampling steps for faster generation
* **vLLM acceleration** — native vLLM support for significantly faster inference in production
* **Autoregressive framework** — unlike DiT-based models (FLUX, SD3.5), uses a unified AR approach for both understanding and generation

## Model Variants

| Model                                | Use Case                              | Steps | HuggingFace                                |
| ------------------------------------ | ------------------------------------- | ----- | ------------------------------------------ |
| **HunyuanImage-3.0**                 | Text-to-image only                    | 30–50 | `tencent/HunyuanImage-3.0`                 |
| **HunyuanImage-3.0-Instruct**        | Text-to-image + editing + multi-image | 30–50 | `tencent/HunyuanImage-3.0-Instruct`        |
| **HunyuanImage-3.0-Instruct-Distil** | Fast inference (8 steps)              | 8     | `tencent/HunyuanImage-3.0-Instruct-Distil` |

## Requirements

| Configuration | Single GPU (offloading)   | Recommended  | Multi-GPU Production |
| ------------- | ------------------------- | ------------ | -------------------- |
| GPU           | 1× RTX 4090 24GB          | 1× A100 80GB | 2–3× A100 80GB       |
| VRAM          | 24GB (with layer offload) | 80GB         | 160–240GB            |
| RAM           | 128GB                     | 128GB        | 256GB                |
| Disk          | 200GB                     | 200GB        | 200GB                |
| CUDA          | 12.0+                     | 12.0+        | 12.0+                |
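
Before installing anything, it is worth confirming that the rented instance actually matches these specs. A quick check with standard tooling (nothing here is model-specific):

```bash
# GPU model, total VRAM, and driver version
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv

# Free disk space (the checkpoint alone is ~160GB)
df -h .

# Available system RAM
free -h
```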

**Recommended Clore.ai setup:**

* **Best value:** 1× A100 80GB (\~$2–4/day) — the most comfortable single-GPU option; the \~160GB bf16 checkpoint still needs partial CPU offloading, but far less than on smaller cards
* **Budget option:** 1× RTX 4090 (\~$0.5–2/day) — works with CPU offloading (slower, but functional)
* **Fast production:** 2× A100 80GB (\~$4–8/day) — for batch generation and the Instruct model

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
cd HunyuanImage-3.0

# Create and activate a virtual environment
python -m venv .venv && source .venv/bin/activate

# Install PyTorch (CUDA 12.1 wheels), then the repo dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Download model weights
huggingface-cli download tencent/HunyuanImage-3.0-Instruct --local-dir ./ckpts/HunyuanImage-3-Instruct
```
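
Once the download finishes, a quick sanity check catches a broken CUDA install or an incomplete download before you try to load an 80B model. This assumes the local checkpoint path used above:

```bash
# Verify PyTorch sees the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))"

# Verify the checkpoint size and shard count look complete (~160GB of safetensors)
du -sh ./ckpts/HunyuanImage-3-Instruct
ls ./ckpts/HunyuanImage-3-Instruct/*.safetensors | wc -l
```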

### Text-to-Image with Transformers

```python
import torch
from transformers import AutoModelForCausalLM

# Load model (the bf16 checkpoint is ~160GB; device_map="auto" spreads it
# across available GPUs and offloads the remainder to CPU RAM)
model_path = "./ckpts/HunyuanImage-3-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # the generation code ships with the checkpoint
)
model.load_tokenizer(model_path)  # the custom model class loads its tokenizer separately

# Generate an image from text
prompt = (
    "A serene Japanese garden in autumn, koi fish swimming in a crystal-clear pond, "
    "golden maple leaves falling, watercolor painting style"
)
image = model.generate_image(prompt=prompt, stream=True)
image.save("japanese_garden.png")
```
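
On a 24GB card the \~160GB of bf16 weights cannot all live on the GPU, but `from_pretrained` accepts a `max_memory` budget that tells `device_map="auto"` how much to keep on each device and how much to offload to CPU RAM. A minimal sketch; the budget values below are illustrative for a 1× RTX 4090 + 128GB RAM box, not tuned numbers:

```python
import torch
from transformers import AutoModelForCausalLM

# Cap GPU usage below 24GB; the remaining layers are offloaded to CPU RAM
# and streamed to the GPU on demand, so expect much slower generation.
model = AutoModelForCausalLM.from_pretrained(
    "./ckpts/HunyuanImage-3-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "22GiB", "cpu": "120GiB"},  # illustrative budgets, adjust to your instance
    trust_remote_code=True,
)
```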

### Using the Gradio Web Interface

The easiest way to experiment with all features:

```bash
cd HunyuanImage-3.0

# Install Gradio
pip install gradio

# Launch the web interface
python gradio_demo.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --server-name 0.0.0.0 \
    --server-port 7860
```

Then access via SSH tunnel: `ssh -L 7860:localhost:7860 root@<clore-ip>`
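
If the SSH session drops, the demo dies with it. Running the same command under `nohup` (or inside `tmux`) keeps the server alive between connections:

```bash
# Keep the Gradio server running after the SSH session closes
nohup python gradio_demo.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --server-name 0.0.0.0 \
    --server-port 7860 > gradio.log 2>&1 &

# Follow the logs
tail -f gradio.log
```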

## Usage Examples

### 1. Text-to-Image Generation (CLI)

```bash
cd HunyuanImage-3.0

python inference.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --prompt "Cyberpunk cityscape at night, neon-lit skyscrapers reflected in rain-soaked streets, flying cars, volumetric fog, 8K" \
    --output-path output.png \
    --num-inference-steps 30 \
    --guidance-scale 5.0
```
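
For more than a handful of prompts, a plain shell loop over a prompt file works. It reuses the flags above; note that each invocation reloads the model, so for real batch throughput prefer the vLLM path mentioned in the tips below:

```bash
# prompts.txt holds one prompt per line
i=0
while IFS= read -r prompt; do
    i=$((i + 1))
    python inference.py \
        --model-path ./ckpts/HunyuanImage-3-Instruct \
        --prompt "$prompt" \
        --output-path "output_${i}.png" \
        --num-inference-steps 30 \
        --guidance-scale 5.0
done < prompts.txt
```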

### 2. Image Editing with Natural Language

One of HunyuanImage 3.0's standout features — edit existing images by describing changes:

```bash
python inference.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --prompt "Change the season to winter with snow covering the trees" \
    --image-path input_photo.jpg \
    --output-path edited_winter.png \
    --num-inference-steps 30
```
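
Style transfer goes through the same interface: pass a reference image and describe the target style in the prompt. Only the instruction changes:

```bash
python inference.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct \
    --prompt "Repaint this photo as a Van Gogh-style oil painting, keeping the composition unchanged" \
    --image-path input_photo.jpg \
    --output-path edited_vangogh.png \
    --num-inference-steps 30
```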

### 3. Fast Generation with Distilled Model (8 Steps)

```bash
# Download distilled checkpoint
huggingface-cli download tencent/HunyuanImage-3.0-Instruct-Distil \
    --local-dir ./ckpts/HunyuanImage-3-Instruct-Distil

# Generate with only 8 steps (roughly 4–6× faster)
python inference.py \
    --model-path ./ckpts/HunyuanImage-3-Instruct-Distil \
    --prompt "Portrait of an astronaut riding a horse on Mars, photorealistic" \
    --output-path astronaut.png \
    --num-inference-steps 8
```
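
To measure the speedup on your own hardware, time both checkpoints on the same prompt; the exact ratio depends on GPU, resolution, and step count:

```bash
# Full model, 30 steps
time python inference.py --model-path ./ckpts/HunyuanImage-3-Instruct \
    --prompt "Portrait of an astronaut riding a horse on Mars, photorealistic" \
    --output-path astronaut_full.png --num-inference-steps 30

# Distilled model, 8 steps
time python inference.py --model-path ./ckpts/HunyuanImage-3-Instruct-Distil \
    --prompt "Portrait of an astronaut riding a horse on Mars, photorealistic" \
    --output-path astronaut_distil.png --num-inference-steps 8
```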

## Comparison with Other Image Models

| Feature            | HunyuanImage 3.0     | FLUX.2 Klein          | SD 3.5 Large          |
| ------------------ | -------------------- | --------------------- | --------------------- |
| Parameters         | 80B MoE (13B active) | 32B DiT               | 8B DiT                |
| Architecture       | Autoregressive MoE   | Diffusion Transformer | Diffusion Transformer |
| Image Editing      | ✅ Native             | ❌ Requires ControlNet | ❌ Requires img2img    |
| Multi-Image Fusion | ✅ Native             | ❌                     | ❌                     |
| Style Transfer     | ✅ Native             | ❌ Requires LoRA       | ❌ Requires LoRA       |
| Min VRAM           | \~24GB (offloaded)   | 16GB                  | 8GB                   |
| Speed (A100)       | \~15–30 sec          | \~0.3 sec             | \~5 sec               |
| License            | Tencent Community    | Apache 2.0            | Stability AI CL       |

## Tips for Clore.ai Users

1. **Use the Distilled model for speed** — `HunyuanImage-3.0-Instruct-Distil` generates in 8 steps instead of 30–50, cutting inference time by 4–6×. Quality remains surprisingly close to the full model.
2. **A100 80GB is the sweet spot** — A single A100 80GB (\~$2–4/day on Clore.ai) is the best single-GPU option for the Instruct model; part of the \~160GB checkpoint still offloads to CPU RAM, but generation is much faster than on an RTX 4090.
3. **Pre-download models** — The full Instruct checkpoint is \~160GB. Download it once to a persistent Clore.ai volume so you don't re-download it every time you spin up a new instance (see the snippet after this list).
4. **Use SSH tunneling for Gradio** — Don't expose port 7860 publicly. Use `ssh -L 7860:localhost:7860` to access the web interface securely from your browser.
5. **Try the vLLM backend for batch work** — If you're generating many images, the vLLM inference path (in the `vllm_infer/` folder) provides significantly better throughput.
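
A download recipe for tip 3 might look like the following; the `/workspace` mount point is an example, so substitute whatever persistent volume path your Clore.ai instance uses. The optional `hf_transfer` backend speeds up large downloads considerably:

```bash
# One-time download to a persistent volume (adjust /workspace to your mount)
pip install -U "huggingface_hub[hf_transfer]"
export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download tencent/HunyuanImage-3.0-Instruct \
    --local-dir /workspace/ckpts/HunyuanImage-3-Instruct

# Later instances just point --model-path at the persistent copy
```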

## Troubleshooting

| Issue                              | Solution                                                                                                                           |
| ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `CUDA out of memory` on RTX 4090   | Use `device_map="auto"` to enable CPU offloading, or switch to the Distil model                                                    |
| Download fails / very slow         | Set `HF_TOKEN` env variable; use `huggingface-cli download` with `--resume-download`                                               |
| Cannot load model via HF model ID  | Due to the dot in the name, clone locally first: `huggingface-cli download tencent/HunyuanImage-3.0-Instruct --local-dir ./ckpts/` |
| Blurry or low-quality outputs      | Increase `--num-inference-steps` to 40–50; increase `--guidance-scale` to 7.0                                                      |
| Image editing ignores instructions | Be specific about what to change and what to preserve; use short, clear prompts                                                    |
| Gradio interface won't start       | Ensure `gradio>=4.0` is installed; check that the model path points to the correct directory                                       |

## Further Reading

* [GitHub Repository](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0) — Official code, inference scripts, Gradio demo
* [HunyuanImage 3.0-Instruct (HuggingFace)](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct) — Full model weights
* [Distilled Checkpoint](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil) — 8-step fast inference
* [Technical Report (arXiv)](https://arxiv.org/pdf/2509.23951) — Architecture details and benchmarks
* [ComfyUI Integration](https://github.com/bgreene2/ComfyUI-Hunyuan-Image-3) — Community ComfyUI custom node

