# FLUX.1

{% hint style="info" %}
**Faster alternative!** [**FLUX.2 Klein**](https://docs.clore.ai/guides/image-generation/flux2-klein) generates images in < 0.5 seconds (vs 10–30s for FLUX.1) with comparable quality. This guide is still relevant for LoRA training and ControlNet workflows.
{% endhint %}

State-of-the-art image generation model from Black Forest Labs on CLORE.AI GPUs.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Why FLUX.1?

* **Best-in-class quality** - Outperforms SDXL and competes with Midjourney in most comparisons
* **Text rendering** - Actually readable text in images
* **Prompt following** - Excellent instruction adherence
* **Fast variants** - FLUX.1-schnell for quick generation

## Model Variants

| Model          | Speed             | Quality   | VRAM  | License        |
| -------------- | ----------------- | --------- | ----- | -------------- |
| FLUX.1-schnell | Fast (4 steps)    | Great     | 12GB+ | Apache 2.0     |
| FLUX.1-dev     | Medium (20 steps) | Excellent | 16GB+ | Non-commercial |
| FLUX.1-pro     | API only          | Best      | -     | Commercial     |
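
The table reduces to a simple decision rule. A minimal sketch (the function name and thresholds are ours, mirroring the table above):

```python
def pick_flux_variant(vram_gb: int, commercial: bool) -> str:
    """Pick a FLUX.1 variant from available VRAM and license needs.

    Mirrors the table: dev's license is non-commercial,
    and pro is only available as a hosted API.
    """
    if commercial:
        return "FLUX.1-schnell" if vram_gb >= 12 else "FLUX.1-pro (API)"
    if vram_gb >= 16:
        return "FLUX.1-dev"
    if vram_gb >= 12:
        return "FLUX.1-schnell"
    return "FLUX.1-schnell (quantized/GGUF)"

print(pick_flux_variant(24, commercial=False))  # FLUX.1-dev
```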

## Quick Deploy on CLORE.AI

**Docker Image** (any CUDA-enabled PyTorch base works; install ComfyUI or diffusers inside the container):

```
pytorch/pytorch:latest
```

**Ports:**

```
22/tcp
7860/http
```

For easiest deployment, use **ComfyUI with FLUX nodes**.

## Installation Methods

### Method 1: ComfyUI (Recommended)

```bash
# Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

# Download FLUX models
cd models/unet
wget https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors

# Download required components
cd ../clip
wget https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
wget https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors

cd ../vae
wget https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/ae.safetensors

# Run ComfyUI
python main.py --listen 0.0.0.0
```

### Method 2: Diffusers

```bash
pip install diffusers transformers accelerate torch

python << 'PYEOF'
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe(
    "A cat wearing a space suit on Mars",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]

image.save("flux_output.png")
PYEOF
```

### Method 3: Fooocus

Some Fooocus forks add FLUX support (upstream Fooocus targets SDXL):

```bash
git clone https://github.com/lllyasviel/Fooocus
cd Fooocus
pip install -r requirements.txt

# Download FLUX model to models/checkpoints/
python launch.py --listen
```

## ComfyUI Workflow

### FLUX.1-schnell (Fast)

Nodes needed:

1. **Load Diffusion Model** → flux1-schnell.safetensors
2. **DualCLIPLoader** → clip\_l.safetensors + t5xxl\_fp16.safetensors
3. **CLIP Text Encode** → your prompt
4. **Empty SD3 Latent Image** → set dimensions
5. **KSampler** → steps: 4, cfg: 1.0
6. **Load VAE** → ae.safetensors, then **VAE Decode**
7. **Save Image**

### FLUX.1-dev (Quality)

Same workflow but:

* Steps: 20-50
* Keep KSampler cfg at 1.0
* Set guidance (\~3.5) with a **FluxGuidance** node
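
The same node graph can be driven headlessly through ComfyUI's HTTP API (`POST /prompt` on the server port, default 8188). A hedged sketch in API format; node class names reflect current ComfyUI builds, but the authoritative way to get this JSON is the editor's **Save (API Format)** export:

```python
import json
import urllib.request

# Minimal FLUX.1-schnell graph in ComfyUI's API format: node-id -> {class_type, inputs},
# with cross-node links written as [source_node_id, output_index].
workflow = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "flux1-schnell.safetensors", "weight_dtype": "default"}},
    "2": {"class_type": "DualCLIPLoader",
          "inputs": {"clip_name1": "clip_l.safetensors",
                     "clip_name2": "t5xxl_fp16.safetensors", "type": "flux"}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "A cat wearing a space suit on Mars", "clip": ["2", 0]}},
    "4": {"class_type": "CLIPTextEncode",  # empty negative (cfg is 1.0, so it is unused)
          "inputs": {"text": "", "clip": ["2", 0]}},
    "5": {"class_type": "EmptySD3LatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["3", 0], "negative": ["4", 0],
                     "latent_image": ["5", 0], "seed": 0, "steps": 4, "cfg": 1.0,
                     "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0}},
    "7": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    "8": {"class_type": "VAEDecode", "inputs": {"samples": ["6", 0], "vae": ["7", 0]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["8", 0], "filename_prefix": "flux_schnell"}},
}

payload = json.dumps({"prompt": workflow}).encode()
req = urllib.request.Request("http://127.0.0.1:8188/prompt", data=payload,
                             headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)  # uncomment with a running ComfyUI instance
```

If you exposed port 7860 as in the Ports section above, start ComfyUI with `--port 7860` and adjust the URL accordingly.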

## Python API

### Basic Generation

```python
import torch
from diffusers import FluxPipeline

# Load model
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Generate
image = pipe(
    prompt="A serene Japanese garden with cherry blossoms",
    height=1024,
    width=1024,
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]

image.save("output.png")
```

### With Memory Optimization

```python
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)

# Enable optimizations
pipe.enable_model_cpu_offload()  # Saves ~10GB VRAM
pipe.vae.enable_slicing()       # FLUX exposes these on the VAE, not the pipeline
pipe.vae.enable_tiling()

image = pipe(
    "Portrait of a cyberpunk samurai",
    height=1024,
    width=1024,
    num_inference_steps=4,
).images[0]
```

### Batch Generation

```python
prompts = [
    "A sunset over mountains",
    "A futuristic city at night",
    "An underwater coral reef",
]

images = pipe(
    prompts,
    height=1024,
    width=1024,
    num_inference_steps=4,
).images

for i, img in enumerate(images):
    img.save(f"output_{i}.png")
```

## FLUX.1-dev (Higher Quality)

```python
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="Hyperrealistic portrait of an elderly fisherman",
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
```

## Prompt Tips

### FLUX excels at:

* **Text in images**: "A neon sign that says 'OPEN 24/7'"
* **Complex scenes**: "A busy Tokyo street at night with reflections"
* **Specific styles**: "Oil painting in the style of Monet"
* **Detailed descriptions**: Long, detailed prompts work well

### Example Prompts

```
# Photorealistic
A professional photograph of a golden retriever puppy playing in autumn leaves, 
shallow depth of field, warm afternoon light, Canon EOS R5

# Artistic
An impressionist painting of a Parisian cafe in the rain, 
oil on canvas, visible brushstrokes, warm colors

# Text rendering
A vintage movie poster with the title "COSMIC VOYAGE" in bold retro letters,
1960s sci-fi aesthetic, astronaut illustration

# Complex scene
A cozy library interior with floor-to-ceiling bookshelves, 
a leather armchair by a fireplace, rain visible through a window,
warm lamp light, photorealistic
```
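
These prompts share a structure: subject, style, lighting, extra detail. A small hypothetical helper for assembling them consistently:

```python
def build_prompt(subject: str, style: str = "", lighting: str = "", details: str = "") -> str:
    """Join labeled prompt parts, skipping any that are empty."""
    parts = [subject, style, lighting, details]
    return ", ".join(p.strip() for p in parts if p.strip())

print(build_prompt(
    "A cozy library interior with floor-to-ceiling bookshelves",
    style="photorealistic",
    lighting="warm lamp light",
    details="rain visible through a window",
))
```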

## Memory Optimization

### For 12GB VRAM (RTX 3060)

```python
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.float16  # Use fp16 instead of bf16
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()       # FLUX exposes these on the VAE, not the pipeline
pipe.vae.enable_tiling()

# Generate at lower resolution
image = pipe(prompt, height=768, width=768, num_inference_steps=4).images[0]
```

### For 8GB VRAM

Use a quantized (GGUF) model through ComfyUI:

```bash
# In ComfyUI, install GGUF nodes
cd custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF

# Download quantized model
wget https://huggingface.co/city96/FLUX.1-schnell-gguf/resolve/main/flux1-schnell-Q4_K_S.gguf
```

## Performance Comparison

| Model          | Steps | Time (4090) | Quality   |
| -------------- | ----- | ----------- | --------- |
| FLUX.1-schnell | 4     | \~3 sec     | Great     |
| FLUX.1-dev     | 20    | \~12 sec    | Excellent |
| FLUX.1-dev     | 50    | \~30 sec    | Best      |
| SDXL           | 30    | \~8 sec     | Good      |

## GPU Requirements

| Setup            | Minimum | Recommended |
| ---------------- | ------- | ----------- |
| FLUX.1-schnell   | 12GB    | 16GB+       |
| FLUX.1-dev       | 16GB    | 24GB+       |
| With CPU offload | 8GB     | 12GB+       |
| Quantized (GGUF) | 6GB     | 8GB+        |

## GPU Presets

### RTX 3060 12GB (Budget)

```python
# fp16 + CPU offload fits 12GB cards (or use a GGUF-quantized model)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

# Settings:
# - schnell only (dev may OOM)
# - 512x512 to 768x768
# - 4 steps
# - Batch size 1
```

### RTX 3090 24GB (Optimal)

```python
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.float16
)
pipe.to("cuda")
pipe.vae.enable_tiling()

# Settings:
# - schnell: 1024x1024, batch 2
# - dev: 1024x1024, batch 1
# - 20-30 steps for dev
# - Enable VAE tiling for high-res
```

### RTX 4090 24GB (Performance)

```python
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.float16
)
pipe.to("cuda")

# Settings:
# - schnell: 1024x1024, batch 4
# - dev: 1024x1024, batch 2
# - 30-50 steps for best quality
# - Can do 1536x1536 with tiling
```

### A100 40GB/80GB (Production)

```python
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Settings:
# - schnell: 1024x1024, batch 8+
# - dev: 1024x1024, batch 4
# - 50 steps for maximum quality
# - 2048x2048 possible
```

## Cost Estimate

| GPU           | Hourly  | Images/Hour      |
| ------------- | ------- | ---------------- |
| RTX 3060 12GB | \~$0.03 | \~200 (schnell)  |
| RTX 3090 24GB | \~$0.06 | \~600 (schnell)  |
| RTX 4090 24GB | \~$0.10 | \~1000 (schnell) |
| A100 40GB     | \~$0.17 | \~1500 (schnell) |
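
Dividing the hourly rate by throughput gives the per-image cost, e.g.:

```python
def cost_per_image(hourly_usd: float, images_per_hour: int) -> float:
    """USD per generated image, from marketplace hourly rate and throughput."""
    return hourly_usd / images_per_hour

# Figures from the table above: RTX 4090 at ~$0.10/h, ~1000 schnell images/hour
print(f"${cost_per_image(0.10, 1000):.5f} per image")  # $0.00010 per image
```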

## Troubleshooting

### Out of Memory

```python
# Use CPU offload
pipe.enable_model_cpu_offload()

# Or sequential CPU offload (slower but less VRAM)
pipe.enable_sequential_cpu_offload()

# Reduce resolution
height=768, width=768
```

### Slow Generation

* Use FLUX.1-schnell (4 steps)
* Enable torch.compile: `pipe.transformer = torch.compile(pipe.transformer)` (FLUX uses a transformer, not a UNet)
* Use fp16 instead of bf16 on older GPUs

### Poor Quality

* Use more steps (FLUX-dev: 30-50)
* Increase guidance\_scale (3.0-4.0 for dev)
* Write more detailed prompts

***

## FLUX LoRA

LoRA (Low-Rank Adaptation) weights allow you to fine-tune FLUX for specific styles, characters, or concepts without retraining the full model. Hundreds of community LoRAs are available on HuggingFace and CivitAI.

### Installation

```bash
pip install diffusers transformers accelerate peft
```

### Loading a Single LoRA

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Load LoRA weights from a local file
pipe.load_lora_weights("path/to/lora.safetensors")

image = pipe(
    "A portrait in the style of Van Gogh, swirling brushstrokes",
    num_inference_steps=20,
    guidance_scale=3.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("flux_lora_output.png")
```

### Loading from HuggingFace Hub

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Load LoRA directly from HuggingFace repo
pipe.load_lora_weights(
    "username/my-flux-lora",          # HF repo ID
    weight_name="my_lora.safetensors" # filename in repo
)

image = pipe(
    "trigger_word a beautiful landscape",
    num_inference_steps=20,
    guidance_scale=3.5,
).images[0]
image.save("output.png")
```

### LoRA Scale (Strength)

```python
# Control LoRA influence via joint_attention_kwargs (FLUX pipelines
# have no cross_attention_kwargs; the scale is forwarded to the LoRA layers)
image = pipe(
    "A cyberpunk character, neon lights",
    num_inference_steps=20,
    guidance_scale=3.5,
    joint_attention_kwargs={"scale": 0.8},  # 0.0 = no effect, 1.0 = full effect
).images[0]
```

### Combining Multiple LoRAs

```python
from diffusers import FluxPipeline
import torch

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Load first LoRA
pipe.load_lora_weights(
    "path/to/style_lora.safetensors",
    adapter_name="style"
)

# Load second LoRA
pipe.load_lora_weights(
    "path/to/character_lora.safetensors",
    adapter_name="character"
)

# Combine with weights
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.9])

image = pipe(
    "character_trigger wearing elaborate costume, artistic_trigger style",
    num_inference_steps=25,
    guidance_scale=3.5,
).images[0]
image.save("combined_lora.png")
```

### Unloading LoRA

```python
# Remove LoRA weights to restore base model
pipe.unload_lora_weights()
```

### Training Your Own FLUX LoRA

```bash
# Use kohya-ss or ai-toolkit for FLUX LoRA training
git clone https://github.com/ostris/ai-toolkit
cd ai-toolkit
pip install -r requirements.txt

# Prepare dataset: 10-30 images with captions
# Edit config YAML, then:
python run.py config/flux_lora_train.yaml
```
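
Most FLUX LoRA trainers (kohya-ss, ai-toolkit) expect a same-named `.txt` caption beside each training image. A sketch for generating that layout, assuming a single shared trigger word (`write_captions` is our own helper, not part of either toolkit):

```python
from pathlib import Path

def write_captions(image_dir: str, trigger: str, description: str) -> int:
    """Write a <name>.txt caption next to each image in image_dir,
    the layout most FLUX LoRA trainers expect. Returns captions written."""
    exts = {".png", ".jpg", ".jpeg", ".webp"}
    count = 0
    for img in sorted(Path(image_dir).iterdir()):
        if img.suffix.lower() not in exts:
            continue
        img.with_suffix(".txt").write_text(f"{trigger}, {description}\n")
        count += 1
    return count

# write_captions("dataset/", "sks_style", "a watercolor painting")
```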

### Recommended LoRA Sources

| Source      | URL                   | Notes                   |
| ----------- | --------------------- | ----------------------- |
| CivitAI     | civitai.com           | Large community library |
| HuggingFace | huggingface.co/models | Filter by FLUX          |
| Replicate   | replicate.com         | Browse trained models   |

***

## ControlNet for FLUX

ControlNet allows guiding FLUX generation with structural inputs like canny edges, depth maps, and pose skeletons. XLabs-AI has released the first ControlNet models specifically for FLUX.1.

### Installation

```bash
pip install diffusers transformers accelerate controlnet-aux pillow
```

### FLUX ControlNet Canny (XLabs-AI)

```python
import torch
import numpy as np
from PIL import Image
from diffusers import FluxControlNetPipeline, FluxControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

# Load the FLUX ControlNet model (Canny variant)
controlnet = FluxControlNetModel.from_pretrained(
    "XLabs-AI/flux-controlnet-canny-diffusers",
    torch_dtype=torch.bfloat16
)

# Load the pipeline with ControlNet
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Prepare the control image (canny edges)
input_image = load_image("your_input.jpg").resize((1024, 1024))
canny = CannyDetector()
control_image = canny(input_image, low_threshold=50, high_threshold=200)

# Generate with ControlNet guidance
image = pipe(
    prompt="A futuristic cityscape with neon signs, photorealistic, 8K",
    control_image=control_image,
    controlnet_conditioning_scale=0.7,
    num_inference_steps=25,
    guidance_scale=3.5,
    generator=torch.Generator(device="cuda").manual_seed(0),
).images[0]

image.save("controlnet_flux_output.png")
```

### FLUX ControlNet Depth

```python
import torch
from PIL import Image
from diffusers import FluxControlNetPipeline, FluxControlNetModel
from diffusers.utils import load_image
from transformers import pipeline as hf_pipeline

# Load depth estimator
depth_estimator = hf_pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")

# Prepare depth map
input_image = load_image("portrait.jpg").resize((1024, 1024))
depth_result = depth_estimator(input_image)["depth"]
depth_image = depth_result.convert("RGB")

# Load ControlNet Depth
controlnet = FluxControlNetModel.from_pretrained(
    "XLabs-AI/flux-controlnet-depth-diffusers",
    torch_dtype=torch.bfloat16
)

pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A marble statue of a warrior, dramatic lighting, museum photo",
    control_image=depth_image,
    controlnet_conditioning_scale=0.6,
    num_inference_steps=20,
    guidance_scale=3.5,
).images[0]
image.save("depth_controlnet_output.png")
```

### Multi-ControlNet for FLUX

```python
import torch
from diffusers import FluxControlNetPipeline, FluxMultiControlNetModel, FluxControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

# Prepare control images (the depth map is produced as in the Depth example above)
input_image = load_image("your_input.jpg").resize((1024, 1024))
canny_image = CannyDetector()(input_image, low_threshold=50, high_threshold=200)
depth_image = load_image("your_depth_map.png")

# Load multiple ControlNets
controlnet_canny = FluxControlNetModel.from_pretrained(
    "XLabs-AI/flux-controlnet-canny-diffusers",
    torch_dtype=torch.bfloat16
)
controlnet_depth = FluxControlNetModel.from_pretrained(
    "XLabs-AI/flux-controlnet-depth-diffusers",
    torch_dtype=torch.bfloat16
)

# Combine into MultiControlNet
multi_controlnet = FluxMultiControlNetModel([controlnet_canny, controlnet_depth])

pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=multi_controlnet,
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A knight in armor standing in a forest, dramatic lighting",
    control_image=[canny_image, depth_image],
    controlnet_conditioning_scale=[0.7, 0.5],
    num_inference_steps=25,
    guidance_scale=3.5,
).images[0]
```

### Available FLUX ControlNet Models

| Model         | Repo                                        | Use Case                |
| ------------- | ------------------------------------------- | ----------------------- |
| Canny         | XLabs-AI/flux-controlnet-canny-diffusers    | Edge-guided generation  |
| Depth         | XLabs-AI/flux-controlnet-depth-diffusers    | Depth-guided generation |
| HED/Soft Edge | XLabs-AI/flux-controlnet-hed-diffusers      | Soft structural control |
| Pose          | XLabs-AI/flux-controlnet-openpose-diffusers | Pose-guided portraits   |

### ControlNet Tips

* **conditioning\_scale 0.5–0.8** works best for FLUX (too high loses creativity)
* Use **1024×1024** or multiples for best quality
* Combine with LoRA for style + structure control
* Lower steps (20–25) is usually sufficient with ControlNet
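
FLUX expects dimensions divisible by 16, and the control image should match the generation size exactly. A small snapping utility (our own helper, not part of diffusers):

```python
def snap16(value: int) -> int:
    """Round to the nearest multiple of 16 (minimum 16); diffusers will
    warn and adjust FLUX dimensions that are not divisible by 16."""
    return max(16, (value + 8) // 16 * 16)

print(snap16(1000), snap16(768))  # 1008 768
```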

***

## FLUX.1-schnell: Fast Generation Mode

FLUX.1-schnell is the distilled, speed-optimized variant of FLUX. It generates high-quality images in just **4 steps** (vs 20–50 for FLUX.1-dev), making it ideal for rapid prototyping and high-throughput workflows.

### Key Differences vs FLUX.1-dev

| Feature         | FLUX.1-schnell                   | FLUX.1-dev     |
| --------------- | -------------------------------- | -------------- |
| Steps           | 4                                | 20–50          |
| Speed (4090)    | \~3 sec                          | \~12–30 sec    |
| License         | **Apache 2.0** (free commercial) | Non-commercial |
| guidance\_scale | 0.0 (no CFG)                     | 3.5            |
| Quality         | Great                            | Excellent      |
| VRAM            | 12GB+                            | 16GB+          |

> **License note:** FLUX.1-schnell is Apache 2.0 — you can use it in commercial products freely. FLUX.1-dev requires a separate commercial license from Black Forest Labs.

### Quick Start

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A stunning aerial view of New York City at golden hour, photorealistic",
    height=1024,
    width=1024,
    num_inference_steps=4,   # Only 4 steps needed!
    guidance_scale=0.0,       # CFG disabled for schnell
    max_sequence_length=256,
    generator=torch.Generator(device="cpu").manual_seed(0),
).images[0]

image.save("schnell_output.png")
print("Generated in ~3 seconds on RTX 4090!")
```

### High-Throughput Batch Generation

```python
import torch
from diffusers import FluxPipeline
from pathlib import Path

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # Keep on GPU for speed, don't use cpu_offload

output_dir = Path("schnell_outputs")
output_dir.mkdir(exist_ok=True)

prompts = [
    "A serene mountain lake at dawn, misty atmosphere",
    "A bustling Tokyo street market at night, neon reflections",
    "A macro photograph of a dew-covered spider web",
    "An ancient library with floating books and magical light",
    "A futuristic underwater city with bioluminescent sea life",
]

# Batch generation
for i, prompt in enumerate(prompts):
    image = pipe(
        prompt=prompt,
        height=1024,
        width=1024,
        num_inference_steps=4,
        guidance_scale=0.0,
        generator=torch.Generator(device="cuda").manual_seed(i),
    ).images[0]
    image.save(output_dir / f"image_{i:04d}.png")
    print(f"Generated {i+1}/{len(prompts)}: {prompt[:50]}...")
```

### Multiple Aspect Ratios with schnell

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# FLUX supports flexible aspect ratios
resolutions = {
    "square":    (1024, 1024),
    "portrait":  (768,  1360),
    "landscape": (1360, 768),
    "tall":      (576,  1792),
    "wide":      (1792, 576),
}

prompt = "A majestic wolf in a snowy forest, professional wildlife photography"

for name, (width, height) in resolutions.items():
    image = pipe(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=4,
        guidance_scale=0.0,
    ).images[0]
    image.save(f"schnell_{name}.png")
    print(f"Saved {name}: {width}x{height}")
```
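
To add your own ratios in the same roughly 1-megapixel regime, you can derive dimensions from an aspect ratio (a hypothetical helper; snapping to multiples of 16 keeps FLUX happy):

```python
import math

def dims_for_aspect(aspect: float, target_mp: float = 1.0, step: int = 16):
    """Width/height at roughly target_mp megapixels for a given aspect
    ratio, snapped to multiples of step (FLUX needs dims divisible by 16)."""
    pixels = target_mp * 1_000_000
    width = math.sqrt(pixels * aspect)
    height = width / aspect
    snap = lambda v: max(step, (int(v) + step // 2) // step * step)
    return snap(width), snap(height)

print(dims_for_aspect(16 / 9))  # (1328, 752)
```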

### schnell with Memory Optimizations

```python
import torch
from diffusers import FluxPipeline

# For 12GB VRAM (RTX 3060/3080)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.float16  # fp16 saves memory on older GPUs
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()       # FLUX exposes these on the VAE, not the pipeline
pipe.vae.enable_tiling()

image = pipe(
    prompt="A cozy cabin in autumn forest, warm light through windows",
    height=768,
    width=768,
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("schnell_low_vram.png")
```

### Performance Benchmarks (schnell)

| GPU           | VRAM | Time/image (1024px) | Images/hour |
| ------------- | ---- | ------------------- | ----------- |
| RTX 3060 12GB | 12GB | \~8 sec             | \~450       |
| RTX 3090 24GB | 24GB | \~4 sec             | \~900       |
| RTX 4090 24GB | 24GB | \~3 sec             | \~1200      |
| A100 40GB     | 40GB | \~2 sec             | \~1800      |

### When to Use schnell vs dev

**Use FLUX.1-schnell when:**

* Rapid prototyping / testing prompts
* High-volume batch generation
* Commercial projects (Apache 2.0)
* Limited GPU budget
* Real-time or near-real-time applications

**Use FLUX.1-dev when:**

* Maximum image quality is priority
* Fine detail and complex scenes
* Research / artistic work
* Combining with LoRA/ControlNet (dev tends to respond better)

***

## Next Steps

* [ComfyUI](https://docs.clore.ai/guides/image-generation/comfyui) - Best interface for FLUX
* [Fooocus](https://docs.clore.ai/guides/image-generation/fooocus-simple-sd) - Simple alternative
* [ControlNet](https://docs.clore.ai/guides/image-processing/controlnet-advanced) - Guided generation
* [Kohya Training](https://docs.clore.ai/guides/training/kohya-training) - Train FLUX LoRAs
