# SDXL Turbo & LCM

Generate images in 1-4 steps with SDXL Turbo and Latent Consistency Models on CLORE.AI GPUs.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Why SDXL Turbo / LCM?

* **Real-time speed** - Generate images in 1-4 steps instead of the usual 30-50
* **Near-SDXL quality** - Output close to full SDXL with roughly 10x fewer steps
* **Interactive** - Fast enough for real-time applications
* **Low VRAM** - Efficient memory usage
* **LoRA compatible** - Use with existing SDXL LoRAs

## Model Variants

| Model           | Steps | Speed     | Quality   | VRAM |
| --------------- | ----- | --------- | --------- | ---- |
| SDXL Turbo      | 1-4   | Fastest   | Good      | 8GB  |
| SDXL Lightning  | 2-8   | Very Fast | Great     | 8GB  |
| LCM-SDXL        | 4-8   | Fast      | Great     | 8GB  |
| LCM-LoRA + SDXL | 4-8   | Fast      | Excellent | 10GB |
| SD Turbo (1.5)  | 1-4   | Fastest   | Good      | 4GB  |

## Quick Deploy on CLORE.AI

**Docker Image:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Ports:**

```
22/tcp
7860/http
```

**Command:**

```bash
pip install diffusers transformers accelerate gradio && \
python -c "
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    'stabilityai/sdxl-turbo',
    torch_dtype=torch.float16,
    variant='fp16'
).to('cuda')

def generate(prompt, steps, seed):
    generator = torch.Generator('cuda').manual_seed(seed) if seed > 0 else None
    image = pipe(prompt, num_inference_steps=steps, guidance_scale=0.0, generator=generator).images[0]
    return image

gr.Interface(
    fn=generate,
    inputs=[
        gr.Textbox(label='Prompt'),
        gr.Slider(1, 4, value=1, step=1, label='Steps'),
        gr.Number(value=-1, label='Seed')
    ],
    outputs=gr.Image(),
    title='SDXL Turbo - Real-time Generation'
).launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accessing Your Service

After deployment, find your `http_pub` URL in **My Orders**:

1. Go to **My Orders** page
2. Click on your order
3. Find the `http_pub` URL (e.g., `abc123.clorecloud.net`)

Use `https://YOUR_HTTP_PUB_URL` instead of `localhost` in examples below.

## Hardware Requirements

| Model          | Minimum GPU   | Recommended |
| -------------- | ------------- | ----------- |
| SD Turbo       | RTX 3060 8GB  | RTX 3070    |
| SDXL Turbo     | RTX 3070 8GB  | RTX 3080    |
| SDXL Lightning | RTX 3070 8GB  | RTX 3090    |
| LCM-SDXL       | RTX 3080 10GB | RTX 4090    |

## Installation

```bash
pip install diffusers transformers accelerate torch safetensors gradio
```

## SDXL Turbo

### Basic Usage

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.to("cuda")

# Generate in 1 step!
image = pipe(
    prompt="A cinematic shot of a baby raccoon wearing an intricate italian priest robe",
    num_inference_steps=1,
    guidance_scale=0.0  # Turbo doesn't use CFG
).images[0]

image.save("raccoon.png")
```

### Best Settings

```python
# 1 step - fastest, good quality
image = pipe(prompt, num_inference_steps=1, guidance_scale=0.0).images[0]

# 2 steps - better details
image = pipe(prompt, num_inference_steps=2, guidance_scale=0.0).images[0]

# 4 steps - best quality for turbo
image = pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
```

## SDXL Lightning

### 2-Step Generation

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_2step_unet.safetensors"

# Load base model
pipe = StableDiffusionXLPipeline.from_pretrained(
    base,
    torch_dtype=torch.float16,
    variant="fp16"
).to("cuda")

# Load the Lightning UNet (safetensors checkpoint, so use load_file, not torch.load)
pipe.unet.load_state_dict(
    load_file(hf_hub_download(repo, ckpt), device="cuda")
)

# Configure scheduler
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config,
    timestep_spacing="trailing"
)

# Generate in 2 steps
image = pipe(
    "A girl smiling in a garden",
    num_inference_steps=2,
    guidance_scale=0.0
).images[0]

image.save("lightning.png")
```

### 4-Step (Higher Quality)

```python
ckpt = "sdxl_lightning_4step_unet.safetensors"
# ... same setup ...

image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0.0
).images[0]
```

## LCM-LoRA

Use with any SDXL model for fast generation:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.to("cuda")

# Load LCM-LoRA
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# Set LCM scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Generate in 4 steps
image = pipe(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    num_inference_steps=4,
    guidance_scale=1.0  # LCM uses low CFG
).images[0]

image.save("lcm_lora.png")
```

### With Custom LoRAs

```python
# Load base + LCM-LoRA + style LoRA
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm")
pipe.load_lora_weights("your-style-lora", adapter_name="style")

# Combine adapters
pipe.set_adapters(["lcm", "style"], adapter_weights=[1.0, 0.8])

image = pipe(prompt, num_inference_steps=4, guidance_scale=1.5).images[0]
```

## SD Turbo (SD 1.5)

For lower VRAM requirements:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.to("cuda")

image = pipe(
    "A photo of a cat",
    num_inference_steps=1,
    guidance_scale=0.0
).images[0]
```

## Image-to-Image

### SDXL Turbo Img2Img

```python
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe.to("cuda")

init_image = load_image("input.jpg").resize((512, 512))

image = pipe(
    prompt="cat wizard, gandalf, lord of the rings, detailed, fantasy",
    image=init_image,
    num_inference_steps=2,
    strength=0.5,
    guidance_scale=0.0
).images[0]
```

## Batch Generation

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16
).to("cuda")

prompts = [
    "A sunset over mountains",
    "A futuristic city at night",
    "A cute robot in a garden",
    "An ancient temple in fog"
]

# Batch generate
images = pipe(
    prompts,
    num_inference_steps=1,
    guidance_scale=0.0
).images

for i, img in enumerate(images):
    img.save(f"batch_{i}.png")
```

## Real-time Streaming

```python
import gradio as gr
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16
).to("cuda")

def generate_realtime(prompt):
    if not prompt:
        return None
    image = pipe(
        prompt,
        num_inference_steps=1,
        guidance_scale=0.0,
        width=512,
        height=512
    ).images[0]
    return image

demo = gr.Interface(
    fn=generate_realtime,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Image(label="Generated"),
    live=True,  # Update as you type
    title="Real-time SDXL Turbo"
)

demo.launch(server_name="0.0.0.0", server_port=7860)
```

## Performance Comparison

| Model          | Steps | Resolution | RTX 3090 | RTX 4090 | A100  |
| -------------- | ----- | ---------- | -------- | -------- | ----- |
| SDXL (base)    | 30    | 1024x1024  | 8s       | 5s       | 4s    |
| SDXL Turbo     | 1     | 512x512    | 0.3s     | 0.2s     | 0.15s |
| SDXL Turbo     | 4     | 512x512    | 0.8s     | 0.5s     | 0.4s  |
| SDXL Lightning | 2     | 1024x1024  | 0.8s     | 0.5s     | 0.4s  |
| SDXL Lightning | 4     | 1024x1024  | 1.2s     | 0.8s     | 0.6s  |
| LCM-SDXL       | 4     | 1024x1024  | 1.5s     | 1.0s     | 0.7s  |

## Quality Comparison

| Aspect         | SDXL 30 steps | Turbo 4 steps | Lightning 4 steps |
| -------------- | ------------- | ------------- | ----------------- |
| Details        | Excellent     | Good          | Great             |
| Text rendering | Good          | Poor          | Poor              |
| Faces          | Great         | Good          | Good              |
| Consistency    | Excellent     | Good          | Great             |
| Style variety  | Excellent     | Good          | Great             |

## When to Use What

| Use Case          | Recommended    | Steps |
| ----------------- | -------------- | ----- |
| Real-time preview | SDXL Turbo     | 1     |
| Interactive apps  | SDXL Turbo     | 1-2   |
| Quick iterations  | SDXL Lightning | 2-4   |
| With custom LoRAs | LCM-LoRA       | 4-8   |
| Maximum quality   | SDXL Lightning | 8     |
| Low VRAM          | SD Turbo       | 1-2   |
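For scripts that switch between these variants, the choices above can be encoded as a small settings map. A minimal sketch: the repo IDs are the actual Hugging Face model IDs used elsewhere in this guide, and the step/CFG defaults mirror the tables above; the use-case keys are illustrative.

```python
# Default settings per variant; repo IDs as used elsewhere in this guide
FAST_MODELS = {
    "sdxl-turbo": {"repo": "stabilityai/sdxl-turbo", "steps": 1, "cfg": 0.0},
    "sd-turbo": {"repo": "stabilityai/sd-turbo", "steps": 1, "cfg": 0.0},
    "lcm-lora": {"repo": "latent-consistency/lcm-lora-sdxl", "steps": 4, "cfg": 1.0},
}

def settings_for(use_case: str) -> dict:
    """Map a use case from the table above to a variant's defaults."""
    choice = {
        "realtime": "sdxl-turbo",
        "low-vram": "sd-turbo",
        "custom-lora": "lcm-lora",
    }[use_case]
    return FAST_MODELS[choice]

print(settings_for("realtime")["steps"])  # 1
```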

## Cost Estimate

Typical CLORE.AI marketplace rates:

| GPU           | Hourly Rate | Images/Hour (1-step) |
| ------------- | ----------- | -------------------- |
| RTX 3060 12GB | \~$0.03     | \~3,000              |
| RTX 3090 24GB | \~$0.06     | \~8,000              |
| RTX 4090 24GB | \~$0.10     | \~12,000             |
| A100 40GB     | \~$0.17     | \~15,000             |

*Prices vary. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
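The throughput column follows from per-image latency. A quick back-of-envelope helper (the ~0.2 s figure is the RTX 4090 1-step latency from the performance table; the table's ~12,000 images/hour is lower than the theoretical peak because of load/save overhead):

```python
def images_per_hour(seconds_per_image: float) -> int:
    """Theoretical peak throughput, ignoring load/save overhead."""
    return int(3600 / seconds_per_image)

def cost_per_1k_images(hourly_rate_usd: float, seconds_per_image: float) -> float:
    """Rental cost of generating 1,000 images at a given per-image latency."""
    return hourly_rate_usd * seconds_per_image * 1000 / 3600

# RTX 4090 at ~$0.10/h, ~0.2 s per 1-step 512x512 image
print(images_per_hour(0.2))                      # 18000 theoretical peak
print(round(cost_per_1k_images(0.10, 0.2), 4))   # 0.0056 (under a cent per 1k images)
```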

## Troubleshooting

### Blurry Results

* SDXL Turbo is trained at 512x512; requesting larger sizes degrades output
* Use SDXL Lightning for native 1024x1024
* Add an upscaling post-process (e.g. Real-ESRGAN)
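If a dedicated upscaler isn't set up yet, a plain Lanczos resize is a serviceable stopgap. A minimal Pillow sketch (resampling adds no detail; for real detail recovery use Real-ESRGAN, linked under Next Steps):

```python
from PIL import Image

def upscale_lanczos(img: Image.Image, factor: int = 2) -> Image.Image:
    # Plain resampling: adds no detail, but avoids shipping 512x512 output as-is
    return img.resize((img.width * factor, img.height * factor), Image.LANCZOS)

out = upscale_lanczos(Image.new("RGB", (512, 512)))
print(out.size)  # (1024, 1024)
```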

### guidance\_scale Error

```python
# SDXL Turbo: always use 0.0
image = pipe(prompt, guidance_scale=0.0).images[0]

# LCM: use 1.0-2.0
image = pipe(prompt, guidance_scale=1.5).images[0]

# Lightning: use 0.0
image = pipe(prompt, guidance_scale=0.0).images[0]
```

### LoRA Not Working

```python
# For LCM-LoRA, must use LCMScheduler
from diffusers import LCMScheduler

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
```

### Out of Memory

```python
# Enable memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

# Or use smaller model
# SD Turbo instead of SDXL Turbo
```

## Next Steps

* [FLUX.1](https://docs.clore.ai/guides/image-generation/flux) - Highest quality generation
* [Stable Diffusion WebUI](https://docs.clore.ai/guides/image-generation/stable-diffusion-webui) - Full UI
* [ComfyUI](https://docs.clore.ai/guides/image-generation/comfyui) - Node-based workflows
* [Real-ESRGAN](https://docs.clore.ai/guides/image-processing/real-esrgan-upscaling) - Upscale results
