HunyuanImage 3.0
Run HunyuanImage 3.0, Tencent's 80B MoE multimodal image generation and editing model, on Clore.ai GPUs
HunyuanImage 3.0 by Tencent is the largest open-source image generation model to date, with 80B total parameters (13B active during inference). Released in September 2025, it breaks the mold by unifying image generation, editing, and understanding in a single autoregressive model, with no separate pipelines for text-to-image and image-to-image. It generates photorealistic images, performs precise element-preserving edits, handles style transfer, and even does multi-image fusion, all from one model.
HuggingFace: tencent/HunyuanImage-3.0-Instruct GitHub: Tencent-Hunyuan/HunyuanImage-3.0 License: Tencent Hunyuan Community License (free for research & commercial use under 100M MAU)
Key Features
80B total / 13B active parameters — largest open-source image MoE model; activates only 13B params per inference
Unified multimodal architecture — text-to-image, image editing, style transfer, and multi-image composition in one model
Instruction-driven editing — describe what you want changed in natural language, preserving untouched elements
Distilled checkpoint available — HunyuanImage-3.0-Instruct-Distil runs in just 8 sampling steps for faster generation
vLLM acceleration — native vLLM support for significantly faster inference in production
Autoregressive framework — unlike DiT-based models (FLUX, SD3.5), uses a unified AR approach for both understanding and generation
Model Variants
| Model | Capabilities | Sampling steps | HuggingFace ID |
| --- | --- | --- | --- |
| HunyuanImage-3.0 | Text-to-image only | 30–50 | tencent/HunyuanImage-3.0 |
| HunyuanImage-3.0-Instruct | Text-to-image + editing + multi-image | 30–50 | tencent/HunyuanImage-3.0-Instruct |
| HunyuanImage-3.0-Instruct-Distil | Fast inference (8 steps) | 8 | tencent/HunyuanImage-3.0-Instruct-Distil |
Requirements
| Requirement | Budget | Recommended | Multi-GPU |
| --- | --- | --- | --- |
| GPU | 1× RTX 4090 24GB | 1× A100 80GB | 2–3× A100 80GB |
| VRAM | 24GB (with layer offload) | 80GB | 160–240GB |
| RAM | 128GB | 128GB | 256GB |
| Disk | 200GB | 200GB | 200GB |
| CUDA | 12.0+ | 12.0+ | 12.0+ |
Recommended Clore.ai setup:
Best value: 1× A100 80GB (~$2–4/day) — runs the full model comfortably without offloading
Budget option: 1× RTX 4090 (~$0.5–2/day) — works with CPU offloading (slower, but functional)
Fast production: 2× A100 80GB (~$4–8/day) — for batch generation and the Instruct model
Quick Start
Installation
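A minimal setup sketch, assuming the official GitHub repository ships a requirements.txt and that you target a CUDA 12.x PyTorch build; adjust the index URL to your driver version:

```shell
# Clone the official repository
git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
cd HunyuanImage-3.0

# Install PyTorch for CUDA 12.x, then the project requirements
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Download the Instruct checkpoint (~160GB) to a local directory
pip install "huggingface_hub[cli]"
huggingface-cli download tencent/HunyuanImage-3.0-Instruct \
    --local-dir ./ckpts/HunyuanImage-3.0-Instruct
```

On Clore.ai, point --local-dir at a persistent volume so the checkpoint survives instance restarts.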
Text-to-Image with Transformers
Using the Gradio Web Interface
The easiest way to experiment with all features:
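A launch sketch; the demo script name used here (run_app.py) and its flags are assumptions, so check the repo's README for the actual entry point:

```shell
# Launch the Gradio demo bound to localhost only (script name is an assumption)
python3 run_app.py \
    --model-id ./ckpts/HunyuanImage-3.0-Instruct \
    --host 127.0.0.1 \
    --port 7860
```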
Then access via SSH tunnel: ssh -L 7860:localhost:7860 root@<clore-ip>
Usage Examples
1. Text-to-Image Generation (CLI)
2. Image Editing with Natural Language
One of HunyuanImage 3.0's standout features — edit existing images by describing changes:
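A hypothetical sketch of instruction-driven editing: whether generate_image accepts an input image, and under what keyword name, is an assumption to verify against the Instruct model card:

```python
# Instruction-driven image editing (image= kwarg is an assumption; check the model card)
from PIL import Image
from transformers import AutoModelForCausalLM

model_path = "./ckpts/HunyuanImage-3.0-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model.load_tokenizer(model_path)

source = Image.open("cabin.png")

# Describe both what to change and what to preserve
edited = model.generate_image(
    prompt=(
        "Replace the snow with autumn leaves; keep the cabin, "
        "lighting, and composition unchanged"
    ),
    image=source,
)
edited.save("cabin_autumn.png")
```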
3. Fast Generation with Distilled Model (8 Steps)
Comparison with Other Image Models
| | HunyuanImage 3.0 | FLUX.1 | SD3.5 |
| --- | --- | --- | --- |
| Parameters | 80B MoE (13B active) | 32B DiT | 8B DiT |
| Architecture | Autoregressive MoE | Diffusion Transformer | Diffusion Transformer |
| Image Editing | ✅ Native | ❌ Requires ControlNet | ❌ Requires img2img |
| Multi-Image Fusion | ✅ Native | ❌ | ❌ |
| Style Transfer | ✅ Native | ❌ Requires LoRA | ❌ Requires LoRA |
| Min VRAM | ~24GB (offloaded) | 16GB | 8GB |
| Speed (A100) | ~15–30 sec | ~0.3 sec | ~5 sec |
| License | Tencent Community | Apache 2.0 | Stability AI CL |
Tips for Clore.ai Users
Use the Distilled model for speed — HunyuanImage-3.0-Instruct-Distil generates in 8 steps instead of 30–50, cutting inference time by 4–6×. Quality remains surprisingly close to the full model.
A100 80GB is the sweet spot — a single A100 80GB (~$2–4/day on Clore.ai) runs the Instruct model without any offloading tricks, and is much faster than an RTX 4090 with CPU offloading.
Pre-download models — the full Instruct checkpoint is ~160GB. Download it once to a persistent Clore.ai volume to avoid re-downloading every time you spin up a new instance.
Use SSH tunneling for Gradio — don't expose port 7860 publicly; use ssh -L 7860:localhost:7860 to access the web interface securely from your browser.
Try the vLLM backend for batch work — if you're generating many images, the vLLM inference path (in the vllm_infer/ folder) provides significantly better throughput.
Troubleshooting
| Problem | Solution |
| --- | --- |
| CUDA out of memory on RTX 4090 | Use device_map="auto" to enable CPU offloading, or switch to the Distil model |
| Download fails / very slow | Set the HF_TOKEN env variable; use huggingface-cli download with --resume-download |
| Cannot load model via HF model ID | Due to the dot in the name, clone locally first: huggingface-cli download tencent/HunyuanImage-3.0-Instruct --local-dir ./ckpts/ |
| Blurry or low-quality outputs | Increase --num-inference-steps to 40–50; increase --guidance-scale to 7.0 |
| Image editing ignores instructions | Be specific about what to change and what to preserve; use short, clear prompts |
| Gradio interface won't start | Ensure gradio>=4.0 is installed; check that the model path points to the correct directory |
Further Reading
GitHub Repository — Official code, inference scripts, Gradio demo
HunyuanImage 3.0-Instruct (HuggingFace) — Full model weights
Distilled Checkpoint — 8-step fast inference
Technical Report (arXiv) — Architecture details and benchmarks
ComfyUI Integration — Community ComfyUI custom node