FLUX.2 Klein

FLUX.2 Klein — sub-second image generation on Clore.ai GPUs

FLUX.2 Klein by Black Forest Labs is the successor to FLUX.1, delivering the same image quality at 20–60× the speed. Where FLUX.1 took 10–30 seconds per image, FLUX.2 Klein generates in under 0.5 seconds on an RTX 4090. It's a 32B Diffusion Transformer (DiT) model with an Apache 2.0 license, and as of January 2026, it's even experimentally supported in Ollama.

Key Features

  • < 0.5 second generation: 20–60× faster than FLUX.1

  • 32B DiT architecture: Same quality as FLUX.1 dev

  • Apache 2.0 license: Full commercial use

  • Ollama support: Experimental image generation via Ollama (Jan 2026)

  • ComfyUI compatible: Drop-in replacement for FLUX.1 workflows

  • LoRA + ControlNet: Community adapters available

Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| GPU | RTX 3090 24GB | RTX 4090 24GB |
| VRAM | 16GB (with offloading) | 24GB |
| RAM | 32GB | 64GB |
| Disk | 40GB | 60GB |
| CUDA | 12.0+ | 12.1+ |

Recommended Clore.ai GPU: RTX 4090 24GB (~$0.5–2/day) — sub-second generation

Speed Comparison: FLUX.1 vs FLUX.2 Klein

| GPU | FLUX.1 dev (20 steps) | FLUX.2 Klein | Speedup |
| --- | --- | --- | --- |
| RTX 3090 | ~25 sec | ~1.2 sec | ~20× |
| RTX 4090 | ~12 sec | ~0.4 sec | ~30× |
| RTX 5090 | ~8 sec | ~0.25 sec | ~32× |
| H100 | ~5 sec | ~0.15 sec | ~33× |

Quick Start with diffusers
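
A minimal text-to-image sketch using Hugging Face diffusers. The repo id `black-forest-labs/FLUX.2-klein` is an assumption — check the official model card for the exact name. `AutoPipelineForText2Image` selects the correct pipeline class from the checkpoint's metadata, and the 4-step / guidance ~3.5 settings follow the recommendations elsewhere on this page.

```python
# Minimal text-to-image sketch with Hugging Face diffusers.
# NOTE: the repo id below is an assumption; check the official
# FLUX.2 Klein model card on Hugging Face for the exact name.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "black-forest-labs/FLUX.2-klein",   # assumed repo id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt="a cinematic photo of a lighthouse at dawn",
    num_inference_steps=4,   # Klein is tuned for exactly 4 steps
    guidance_scale=3.5,      # the 3.0-4.0 range recommended here
    height=1024,
    width=1024,
).images[0]
image.save("klein.png")
```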

Memory-Efficient Mode (16GB GPUs)
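
Under the same assumptions as the Quick Start snippet (assumed repo id), this sketch shows the two offloading calls also named in the Troubleshooting table: `enable_model_cpu_offload()` keeps only the active sub-module on the GPU, and `vae.enable_tiling()` decodes latents in tiles to cap peak VRAM.

```python
# Same pipeline, sequenced for ~16GB cards: weights are parked in
# CPU RAM between sub-module calls and the VAE decodes in tiles.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "black-forest-labs/FLUX.2-klein",   # assumed repo id, see model card
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # move modules to GPU only when needed
pipe.vae.enable_tiling()         # decode the latent in tiles

image = pipe(
    "a watercolor fox in a snowy forest",
    num_inference_steps=4,
    guidance_scale=3.5,
).images[0]
image.save("klein_lowvram.png")
```

Offloading trades speed for memory: expect generation to be slower than the fully-on-GPU figures in the table above.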

ComfyUI Workflow

FLUX.2 Klein works as a drop-in replacement in existing FLUX.1 ComfyUI workflows:

  1. Download the FLUX.2 Klein checkpoint to ComfyUI/models/diffusion_models/

  2. In your workflow, change the checkpoint node to point to FLUX.2 Klein

  3. Reduce steps to 4 (instead of 20–50 for FLUX.1)

  4. Set guidance scale to 3.0–4.0

Batch Generation

With sub-second generation, FLUX.2 Klein enables massive batch processing:
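
A serial batch sketch, again assuming the repo id from the Quick Start section; the prompts and filenames are illustrative. `num_images_per_prompt` produces several variations per call, which amortizes per-call overhead.

```python
# Batch sketch: iterate a prompt list and save numbered outputs.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "black-forest-labs/FLUX.2-klein",   # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

prompts = [
    "product shot of a ceramic mug, studio lighting",
    "isometric pixel-art castle",
    "macro photo of a dew-covered leaf",
]

for i, prompt in enumerate(prompts):
    images = pipe(
        prompt,
        num_inference_steps=4,
        guidance_scale=3.5,
        num_images_per_prompt=4,   # 4 variations per prompt
    ).images
    for j, img in enumerate(images):
        img.save(f"batch_{i:04d}_{j}.png")
```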

LoRA Support
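
A sketch of diffusers' built-in LoRA loading. The adapter repo id below is a placeholder, not a real LoRA; `load_lora_weights()` accepts either a Hub id or a path to a local `.safetensors` file.

```python
# Loading a LoRA adapter with diffusers' built-in LoRA support.
# The adapter id is a placeholder: substitute a real FLUX-compatible
# LoRA from the Hub or a local .safetensors file.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "black-forest-labs/FLUX.2-klein",   # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

pipe.load_lora_weights("some-user/watercolor-style-lora")  # placeholder
pipe.fuse_lora(lora_scale=0.8)  # optional: bake the adapter into weights

image = pipe(
    "a watercolor portrait",
    num_inference_steps=4,
    guidance_scale=3.5,
).images[0]
image.save("klein_lora.png")
```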

Tips for Clore.ai Users

  • Batch processing king: At ~0.4 sec/image, you can generate roughly 9,000 images per hour on an RTX 4090 (more with batched prompts)

  • 4 steps only: Don't use more — Klein is optimized for 4 steps (more doesn't improve quality)

  • Same LoRAs as FLUX.1: Most FLUX.1 LoRAs are compatible with Klein

  • ComfyUI drop-in: Just swap the checkpoint, change steps to 4

  • RTX 3090 is viable: 1.2 sec/image is still great at $0.3/day
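
The throughput and cost figures above are easy to sanity-check with a few lines of arithmetic. This is idealized: serial generation, no I/O or model-load overhead, using the per-image times and daily rental prices quoted on this page.

```python
# Idealized throughput/cost arithmetic: serial generation, no I/O,
# no model-load overhead. Inputs come from the tables on this page.
SECONDS_PER_DAY = 86_400

def images_per_hour(sec_per_image: float) -> float:
    """Images one GPU produces per hour at a given per-image time."""
    return 3600 / sec_per_image

def cost_per_1k(sec_per_image: float, rent_per_day: float) -> float:
    """Rental cost in USD per 1,000 generated images."""
    images_per_day = SECONDS_PER_DAY / sec_per_image
    return rent_per_day / images_per_day * 1000

# RTX 4090: ~0.4 s/image at $2/day (upper end of the quoted range)
print(round(images_per_hour(0.4)))      # 9000 images/hour
print(round(cost_per_1k(0.4, 2.0), 4))  # 0.0093 USD per 1k images
```

At the $0.3/day quoted for the RTX 3090 and ~1.2 s/image, `cost_per_1k(1.2, 0.3)` works out to roughly $0.004 per thousand images, so even the older card is extremely cheap per image.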

Troubleshooting

| Issue | Solution |
| --- | --- |
| OOM on 24GB | Use enable_model_cpu_offload() + vae.enable_tiling() |
| Blurry images | Ensure num_inference_steps=4 (not fewer) and guidance_scale in the 3.0–4.0 range |
| Slow first generation | Normal: the model loads on the first call (~30 s); subsequent generations are sub-second |
| ComfyUI checkpoint error | Use the single-file .safetensors checkpoint, not the diffusers folder format |
