# FLUX.2 Klein

FLUX.2 Klein by Black Forest Labs is the successor to FLUX.1, delivering the same image quality at **20–30× the speed** on the same GPU. Where FLUX.1 dev took 10–30 seconds per image, FLUX.2 Klein generates in **under 0.5 seconds** on an RTX 4090. It's a 32B Diffusion Transformer (DiT) model released under the Apache 2.0 license, and as of January 2026 it's experimentally supported in Ollama.

## Key Features

* **< 0.5 second generation**: 20–30× faster than FLUX.1 dev on the same GPU
* **32B DiT architecture**: Same quality as FLUX.1 dev
* **Apache 2.0 license**: Full commercial use
* **Ollama support**: Experimental image generation via Ollama (Jan 2026)
* **ComfyUI compatible**: Drop-in replacement for FLUX.1 workflows
* **LoRA + ControlNet**: Community adapters available

## Requirements

| Component | Minimum                | Recommended   |
| --------- | ---------------------- | ------------- |
| GPU       | RTX 3090 24GB          | RTX 4090 24GB |
| VRAM      | 16GB (with offloading) | 24GB          |
| RAM       | 32GB                   | 64GB          |
| Disk      | 40GB                   | 60GB          |
| CUDA      | 12.0+                  | 12.1+         |

**Recommended Clore.ai GPU**: RTX 4090 24GB (\~$0.5–2/day) — sub-second generation
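
Before renting, a quick preflight check against the table's minimums takes only the stdlib. This is a minimal sketch: the 40 GB threshold comes from the table above, and the function name and path are illustrative.

```python
import shutil

# Disk minimum from the requirements table above (GB).
MIN_DISK_GB = 40

def enough_disk(path: str = "/", min_gb: int = MIN_DISK_GB) -> bool:
    """Check free space at `path` against the minimum disk requirement."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= min_gb

print(enough_disk("/"))
```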

### Speed Comparison: FLUX.1 vs FLUX.2 Klein

| GPU      | FLUX.1 dev (20 steps) | FLUX.2 Klein | Speedup |
| -------- | --------------------- | ------------ | ------- |
| RTX 3090 | \~25 sec              | \~1.2 sec    | 20×     |
| RTX 4090 | \~12 sec              | \~0.4 sec    | 30×     |
| RTX 5090 | \~8 sec               | \~0.25 sec   | 32×     |
| H100     | \~5 sec               | \~0.15 sec   | 33×     |
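
The speedup column is simply the ratio of the two timing columns; a quick pure-Python sanity check of the table's figures (timings are the ones quoted above, rounded ratios may differ by ±1×):

```python
# Per-image generation times from the table above: (FLUX.1 dev s, Klein s).
TIMINGS = {
    "RTX 3090": (25.0, 1.2),
    "RTX 4090": (12.0, 0.4),
    "RTX 5090": (8.0, 0.25),
    "H100": (5.0, 0.15),
}

def speedup(flux1_s: float, klein_s: float) -> float:
    """Ratio of FLUX.1 dev time to FLUX.2 Klein time."""
    return flux1_s / klein_s

def images_per_hour(seconds_per_image: float) -> int:
    """Sustained throughput, ignoring model-load and I/O time."""
    return int(3600 / seconds_per_image)

for gpu, (flux1_s, klein_s) in TIMINGS.items():
    print(f"{gpu}: {speedup(flux1_s, klein_s):.1f}x, "
          f"{images_per_hour(klein_s)} img/h")
```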

## Quick Start with diffusers

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Generate image in < 0.5 seconds!
image = pipe(
    prompt="a cyberpunk GPU mining rig in a neon-lit server room, photorealistic",
    height=1024,
    width=1024,
    num_inference_steps=4,  # Klein needs only 4 steps!
    guidance_scale=3.5,
).images[0]

image.save("output.png")
```

### Memory-Efficient Mode (16GB GPUs)

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein",
    torch_dtype=torch.bfloat16
)
# Do NOT call pipe.to("cuda") here — offloading manages device placement.
pipe.enable_model_cpu_offload()  # Streams weights to the GPU as needed; fits on 16GB
pipe.vae.enable_tiling()         # Decodes the latent in tiles; saves ~2GB VRAM

image = pipe("a mountain landscape at sunset", num_inference_steps=4).images[0]
image.save("output.png")
```

## ComfyUI Workflow

FLUX.2 Klein works as a drop-in replacement in existing FLUX.1 ComfyUI workflows:

1. Download the FLUX.2 Klein checkpoint to `ComfyUI/models/diffusion_models/`
2. In your workflow, change the checkpoint node to point to FLUX.2 Klein
3. Reduce steps to 4 (instead of 20–50 for FLUX.1)
4. Set guidance scale to 3.0–4.0

```bash
# Download model for ComfyUI
cd ComfyUI/models/diffusion_models/
wget https://huggingface.co/black-forest-labs/FLUX.2-klein/resolve/main/flux2-klein.safetensors
```
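
An interrupted download is a common cause of checkpoint errors. A `.safetensors` file begins with an 8-byte little-endian header length followed by a JSON header, so a cheap integrity check needs only the stdlib (the function name is mine; this validates the header, not the tensor data):

```python
import json
import struct

def looks_like_safetensors(path: str) -> bool:
    """Cheap sanity check: parse the safetensors JSON header.

    The format starts with an unsigned 64-bit little-endian integer
    giving the byte length of the JSON header that follows it.
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            return False  # file shorter than the length prefix
        (header_len,) = struct.unpack("<Q", prefix)
        header = f.read(header_len)
        if len(header) != header_len:
            return False  # truncated download
        try:
            json.loads(header)
        except ValueError:
            return False  # corrupted header
        return True

# Example:
# looks_like_safetensors("flux2-klein.safetensors")
```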

## Batch Generation

With sub-second generation, FLUX.2 Klein enables massive batch processing:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein", torch_dtype=torch.bfloat16
).to("cuda")

prompts = [
    "a red sports car on a mountain road, cinematic",
    "a cozy coffee shop interior, warm lighting",
    "an astronaut floating above Earth, hyperrealistic",
    "a medieval castle in autumn, fantasy art",
    # ... add hundreds more
]

for i, prompt in enumerate(prompts):
    image = pipe(prompt, num_inference_steps=4, guidance_scale=3.5).images[0]
    image.save(f"batch_{i:04d}.png")
    print(f"Generated {i+1}/{len(prompts)}")

# On RTX 4090: ~100 images in under 1 minute!
```
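
For long runs it helps to make the loop above resumable, so a crashed or interrupted job can pick up where it left off. A minimal stdlib sketch (the helper name is mine; it reuses the `batch_{i:04d}.png` naming from the snippet above):

```python
import os

def pending(prompts, out_dir="."):
    """Yield (index, prompt, path) for images not yet generated."""
    for i, prompt in enumerate(prompts):
        path = os.path.join(out_dir, f"batch_{i:04d}.png")
        if not os.path.exists(path):
            yield i, prompt, path

# Usage with the pipeline from the snippet above:
# for i, prompt, path in pending(prompts, "out"):
#     pipe(prompt, num_inference_steps=4).images[0].save(path)
```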

## LoRA Support

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein", torch_dtype=torch.bfloat16
).to("cuda")

# Load a LoRA trained on the FLUX architecture (replace with your repo/file)
pipe.load_lora_weights("your-lora/flux2-style-lora", weight_name="lora.safetensors")
pipe.fuse_lora(lora_scale=0.8)  # Bake the LoRA into the base weights

image = pipe("a portrait in the trained style", num_inference_steps=4).images[0]
# pipe.unfuse_lora()  # Restore the base weights before loading another LoRA
```

## Tips for Clore.ai Users

* **Batch processing king**: at ~0.4 sec/image, an RTX 4090 sustains roughly 9,000 images per hour
* **4 steps only**: Don't use more — Klein is optimized for 4 steps (more doesn't improve quality)
* **Same LoRAs as FLUX.1**: Most FLUX.1 LoRAs are compatible with Klein
* **ComfyUI drop-in**: Just swap the checkpoint, change steps to 4
* **RTX 3090 is viable**: 1.2 sec/image is still great at $0.3/day
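
Plugging the figures quoted above into a rough cost estimate (the timing and daily rental price are the document's numbers; real marketplace prices vary):

```python
def cost_per_1k_images(sec_per_image: float, usd_per_day: float) -> float:
    """Rental cost of generating 1,000 images at a steady rate."""
    usd_per_second = usd_per_day / 86_400  # seconds in a day
    return 1_000 * sec_per_image * usd_per_second

# RTX 4090 at 0.4 s/image and $2/day (upper end of the quoted range):
print(f"${cost_per_1k_images(0.4, 2.0):.4f} per 1k images")
```

Even at the top of the quoted price range, the per-image rental cost is a fraction of a cent, which is why batch workloads dominate Klein usage.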

## Troubleshooting

| Issue                    | Solution                                                                |
| ------------------------ | ----------------------------------------------------------------------- |
| OOM on 24GB              | Use `enable_model_cpu_offload()` + `vae.enable_tiling()`                |
| Blurry images            | Ensure `num_inference_steps=4`, not less. Check guidance\_scale 3.0–4.0 |
| Slow first generation    | Normal — model loads on first call (\~30s). Subsequent: sub-second      |
| ComfyUI checkpoint error | Ensure you have the `.safetensors` file, not the diffusers format       |

## Further Reading

* [FLUX.1 Guide](https://docs.clore.ai/guides/image-generation/flux) — original FLUX guide with LoRA and ControlNet details
* [ComfyUI Guide](https://docs.clore.ai/guides/image-generation/comfyui) — ComfyUI setup and workflows
* [Black Forest Labs Blog](https://blackforestlabs.ai/)
* [HuggingFace Model](https://huggingface.co/black-forest-labs/FLUX.2-klein)
