# Stable Diffusion 3.5

Stable Diffusion 3.5 from Stability AI is a Multimodal Diffusion Transformer (MMDiT) that sets a new standard for open-weight image generation. It comes in three variants: **Large** (8B params), **Medium** (2.5B params), and **Large Turbo** (8B, distilled for 4-step inference). The standout feature is accurate text rendering: SD 3.5 can reliably place readable text inside generated images, a capability most earlier models struggled with.

On [Clore.ai](https://clore.ai/) you can rent the GPU power SD 3.5 needs for as little as $0.30/day and generate hundreds of images per hour.

## Key Features

* **Three variants** — Large (8B, highest quality), Medium (2.5B, fast and light), Large Turbo (8B, 4-step distilled).
* **Accurate text rendering** — generates readable text, signs, labels, and typography within images.
* **MMDiT architecture** — joint image-text attention for superior prompt adherence.
* **1024×1024 native resolution** — clean output without upscaling hacks.
* **Flexible aspect ratios** — handles non-square outputs (768×1344, 1344×768, etc.) without quality loss.
* **Native diffusers support** — `StableDiffusion3Pipeline` in `diffusers >= 0.30`.
* **Open weights** — Stability AI Community License; free for most commercial use.
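
Flexible aspect ratios work best when width and height stay near the model's ~1 MP native resolution. A small helper (hypothetical, not part of `diffusers`) that snaps an aspect ratio to the nearest such resolution, assuming the common multiple-of-64 convention used by the sizes listed above:

```python
import math

def snap_resolution(aspect_w, aspect_h, target_pixels=1024 * 1024, multiple=64):
    """Pick width/height close to target_pixels for a given aspect ratio,
    rounded to `multiple`. Hypothetical helper, not a diffusers API."""
    ratio = aspect_w / aspect_h
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio

    def snap(v):
        # Round to the nearest multiple, never below one multiple
        return max(multiple, round(v / multiple) * multiple)

    return snap(width), snap(height)

print(snap_resolution(1, 1))    # square
print(snap_resolution(9, 16))   # portrait
print(snap_resolution(16, 9))   # landscape
```

For 9:16 and 16:9 this lands exactly on the 768×1344 and 1344×768 sizes mentioned in the feature list.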

## Requirements

| Component  | Minimum        | Recommended           |
| ---------- | -------------- | --------------------- |
| GPU VRAM   | 12 GB (Medium) | 24 GB (Large / Turbo) |
| System RAM | 16 GB          | 32 GB                 |
| Disk       | 20 GB          | 40 GB                 |
| Python     | 3.10+          | 3.11                  |
| CUDA       | 12.1+          | 12.4                  |
| diffusers  | 0.30+          | latest                |

**Clore.ai GPU recommendation:** An **RTX 4090** (24 GB, \~$0.5–2/day) runs all three variants at full speed. For the Medium model, an **RTX 3090** (24 GB, \~$0.3–1/day) or even a 16 GB card is sufficient and cheaper.
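
A back-of-the-envelope check of whether a variant's weights fit a given card. This is a rough lower bound only — actual VRAM use is higher once the text encoders, VAE, and activations are loaded:

```python
def weights_gib(params_billion, bytes_per_param=2):
    """Approximate size of the transformer weights alone.
    bf16/fp16 = 2 bytes per parameter; fp32 = 4."""
    return params_billion * 1e9 * bytes_per_param / 2**30

print(f"Large  (8B,  bf16): ~{weights_gib(8):.1f} GiB")
print(f"Medium (2.5B, fp16): ~{weights_gib(2.5):.1f} GiB")
```

The 8B models come out to roughly 15 GiB of weights in bf16, which is why 24 GB cards are the comfortable choice and 12 GB cards are realistic only for Medium.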

## Quick Start

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install diffusers transformers accelerate sentencepiece protobuf

python -c "import torch; print(torch.cuda.get_device_name(0))"
```

## Usage Examples

### SD 3.5 Large — Maximum Quality

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt=(
        "A weathered wooden sign reading 'OPEN 24 HOURS' hanging from "
        "a rusty chain outside a neon-lit diner, rainy night, reflections "
        "on wet asphalt, cinematic photography"
    ),
    negative_prompt="blurry, deformed text, low quality",
    guidance_scale=3.5,
    num_inference_steps=28,
    width=1024,
    height=1024,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("diner_sign.png")
print("Saved diner_sign.png")
```

### SD 3.5 Large Turbo — 4-Step Fast Generation

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Turbo variant: only 4 steps needed, guidance_scale=0 (distilled)
image = pipe(
    prompt="Macro photo of a mechanical watch movement, intricate gears, golden light",
    guidance_scale=0.0,
    num_inference_steps=4,
    width=1024,
    height=1024,
).images[0]

image.save("watch_turbo.png")
```

### SD 3.5 Medium — Lightweight Option

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="Isometric view of a cozy coffee shop interior, pixel art style, warm lighting",
    guidance_scale=4.0,
    num_inference_steps=28,
    width=1024,
    height=1024,
).images[0]

image.save("coffee_shop_medium.png")
```

### Batch Generation with Different Aspect Ratios

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
).to("cuda")

jobs = [
    {"prompt": "Portrait of an astronaut in a field of sunflowers", "w": 768, "h": 1344},
    {"prompt": "Panoramic landscape of Icelandic highlands, moody skies", "w": 1344, "h": 768},
    {"prompt": "Product photo of a perfume bottle on marble surface", "w": 1024, "h": 1024},
]

for i, job in enumerate(jobs):
    img = pipe(
        prompt=job["prompt"],
        guidance_scale=3.5,
        num_inference_steps=28,
        width=job["w"],
        height=job["h"],
    ).images[0]
    img.save(f"batch_{i:03d}.png")
    print(f"[{i+1}/{len(jobs)}] {job['w']}x{job['h']} done")
```

## Tips for Clore.ai Users

1. **Turbo for iteration, Large for finals** — use the 4-step Turbo variant to explore prompt ideas quickly, then switch to Large (28 steps) for the final render.
2. **guidance\_scale=3.5** — SD 3.5 Large works best at a lower CFG than older Stable Diffusion models. Going above 5.0 often causes oversaturation.
3. **Turbo needs guidance\_scale=0** — the distilled model already has guidance baked in; adding more degrades output.
4. **Text in images** — SD 3.5's text rendering is strong but not perfect. Use quotes around the exact text you want: `'OPEN 24 HOURS'`. Keep it short (3–5 words max).
5. **Cache weights** — set `HF_HOME=/workspace/hf_cache` on persistent storage. Large is \~16 GB on disk.
6. **bf16 for Large, fp16 for Medium** — the 8B models were trained in bf16; the 2.5B Medium runs fine in fp16.
7. **Batch efficiently** — SD 3.5 Large generates one 1024×1024 image in \~3 seconds on an RTX 4090. Batch overnight for mass generation.
8. **Accept HF license** — you must accept the model license on the HuggingFace model page before downloading. Log in with `huggingface-cli login`.
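
Combining tip 7 with the rental prices above gives a quick cost estimate. `batch_cost` is a hypothetical helper; the 3 s/image figure is the RTX 4090 number quoted in tip 7, and the daily rates are the ranges from the Requirements section:

```python
def batch_cost(seconds_per_image, rent_per_day_usd, n_images=1000):
    """Rental cost of generating n_images, assuming the GPU is
    billed per day and kept busy the whole time."""
    per_image = rent_per_day_usd * seconds_per_image / 86400
    return per_image * n_images

# RTX 4090 at the top of the quoted $0.5-2/day range, 3 s per image
print(f"1000 images on a 4090: ~${batch_cost(3, 2.0):.2f}")
```

Even at the high end of the price range, a thousand 1024×1024 renders cost well under a dollar, which is why overnight batching is worthwhile.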

## Troubleshooting

| Problem                       | Fix                                                                                                                   |
| ----------------------------- | --------------------------------------------------------------------------------------------------------------------- |
| `OutOfMemoryError` with Large | Use `pipe.enable_model_cpu_offload()`; or switch to Medium variant                                                    |
| Garbled text in image         | Keep text short (3–5 words); put it in quotes in the prompt; increase `num_inference_steps` to 35                     |
| Oversaturated colors          | Lower `guidance_scale` — try 2.5–3.5 for Large; use 0.0 for Turbo                                                     |
| 403 error downloading model   | Accept the license at `https://huggingface.co/stabilityai/stable-diffusion-3.5-large` and run `huggingface-cli login` |
| Slow first run                | Initial download is \~16 GB for Large; subsequent runs use cache                                                      |
| `KeyError: 'text_encoder_3'`  | Upgrade diffusers: `pip install -U diffusers transformers`                                                            |
| Black image output            | Ensure `torch_dtype=torch.bfloat16` for Large/Turbo; fp32 can cause silent failures on some cards                     |
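
The 403 and slow-first-run rows usually come down to two setup steps. A sketch assuming the `/workspace` persistent-storage path from the tips section (the login line is commented out so the snippet runs without a token):

```shell
# Point the HF cache at persistent storage so the ~16 GB download survives restarts
export HF_HOME=/workspace/hf_cache

# After accepting the license on the model page, authenticate with a token
# from https://huggingface.co/settings/tokens
# huggingface-cli login

echo "HF_HOME=$HF_HOME"
```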
