# Stable Diffusion 3.5

Stable Diffusion 3.5 from Stability AI is a Multimodal Diffusion Transformer (MMDiT) that sets a new standard for open-weight image generation. It comes in three variants: **Large** (8B params), **Medium** (2.5B params), and **Large Turbo** (8B, distilled for 4-step inference). The standout feature is its accurate text rendering — SD 3.5 can reliably place readable text inside generated images, a capability most earlier models struggled with.

On [Clore.ai](https://clore.ai/) you can rent the GPU power SD 3.5 needs for as little as $0.30/day and generate hundreds of images per hour.

## Key Features

* **Three variants** — Large (8B, highest quality), Medium (2.5B, fast and light), Large Turbo (8B, 4-step distilled).
* **Accurate text rendering** — generates readable text, signs, labels, and typography within images.
* **MMDiT architecture** — joint image-text attention for superior prompt adherence.
* **1024×1024 native resolution** — clean output without upscaling hacks.
* **Flexible aspect ratios** — handles non-square outputs (768×1344, 1344×768, etc.) without quality loss.
* **Native diffusers support** — `StableDiffusion3Pipeline` in `diffusers >= 0.30`.
* **Open weights** — Stability AI Community License; free for most commercial use.
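
The flexible-aspect-ratio support above pairs well with a small helper that snaps an arbitrary aspect ratio to model-friendly dimensions. This is a sketch: the ~1-megapixel budget and multiple-of-64 rounding are common community conventions for SD 3.5, not official constraints.

```python
def sd35_dims(aspect: float, base_pixels: int = 1024 * 1024, multiple: int = 64):
    """Return (width, height) near `base_pixels` total area for a given
    width/height aspect ratio, rounded to the nearest `multiple`."""
    height = (base_pixels / aspect) ** 0.5
    width = height * aspect
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)

print(sd35_dims(1.0))      # square  -> (1024, 1024)
print(sd35_dims(16 / 9))   # wide    -> (1344, 768)
print(sd35_dims(9 / 16))   # tall    -> (768, 1344)
```

Note that the snapped outputs match the non-square resolutions listed above (768×1344 and 1344×768).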

## Requirements

| Component  | Minimum        | Recommended           |
| ---------- | -------------- | --------------------- |
| GPU VRAM   | 12 GB (Medium) | 24 GB (Large / Turbo) |
| System RAM | 16 GB          | 32 GB                 |
| Disk       | 20 GB          | 40 GB                 |
| Python     | 3.10+          | 3.11                  |
| CUDA       | 12.1+          | 12.4                  |
| diffusers  | 0.30+          | latest                |

**Clore.ai GPU recommendation:** An **RTX 4090** (24 GB, \~$0.5–2/day) runs all three variants at full speed. For the Medium model, an **RTX 3090** (24 GB, \~$0.3–1/day) or even a 16 GB card is sufficient and cheaper.
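
To sanity-check a rental budget against these prices, a back-of-the-envelope cost model helps. The seconds-per-image figure in the example is an assumption for illustration only; measure throughput on your own instance.

```python
def cost_per_image(dollars_per_day: float, seconds_per_image: float) -> float:
    """Rental cost attributed to a single image, assuming the GPU renders 24/7."""
    images_per_day = 86_400 / seconds_per_image
    return dollars_per_day / images_per_day

# Example: $1/day rental, assumed 20 s per 1024x1024 render
print(f"${cost_per_image(1.0, 20.0):.5f} per image")  # prints $0.00023 per image
```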

## Quick Start

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install diffusers transformers accelerate sentencepiece protobuf

python -c "import torch; print(torch.cuda.get_device_name(0))"
```

## Usage Examples

### SD 3.5 Large — Maximum Quality

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt=(
        "A weathered wooden sign reading 'OPEN 24 HOURS' hanging from "
        "a rusty chain outside a neon-lit diner, rainy night, reflections "
        "on wet asphalt, cinematic photography"
    ),
    negative_prompt="blurry, deformed text, low quality",
    guidance_scale=3.5,
    num_inference_steps=28,
    width=1024,
    height=1024,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("diner_sign.png")
print("Saved diner_sign.png")
```

### SD 3.5 Large Turbo — 4-Step Fast Generation

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Turbo variant: only 4 steps needed, guidance_scale=0 (distilled)
image = pipe(
    prompt="Macro photo of a mechanical watch movement, intricate gears, golden light",
    guidance_scale=0.0,
    num_inference_steps=4,
    width=1024,
    height=1024,
).images[0]

image.save("watch_turbo.png")
```

### SD 3.5 Medium — Lightweight Option

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="Isometric view of a cozy coffee shop interior, pixel art style, warm lighting",
    guidance_scale=4.0,
    num_inference_steps=28,
    width=1024,
    height=1024,
).images[0]

image.save("coffee_shop_medium.png")
```

### Batch Generation with Different Aspect Ratios

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
).to("cuda")

jobs = [
    {"prompt": "Portrait of an astronaut in a field of sunflowers", "w": 768, "h": 1344},
    {"prompt": "Panoramic landscape of Icelandic highlands, moody skies", "w": 1344, "h": 768},
    {"prompt": "Product photo of a perfume bottle on marble surface", "w": 1024, "h": 1024},
]

for i, job in enumerate(jobs):
    img = pipe(
        prompt=job["prompt"],
        guidance_scale=3.5,
        num_inference_steps=28,
        width=job["w"],
        height=job["h"],
    ).images[0]
    img.save(f"batch_{i:03d}.png")
    print(f"[{i+1}/{len(jobs)}] {job['w']}x{job['h']} done")
```

## Tips for Clore.ai Users

1. **Turbo for iteration, Large for finals** — use the 4-step Turbo variant to explore prompt ideas quickly, then switch to Large (28 steps) for the final render.
2. **guidance\_scale=3.5** — SD 3.5 Large works best at a lower CFG than older Stable Diffusion models. Going above 5.0 often causes oversaturation.
3. **Turbo needs guidance\_scale=0** — the distilled model already has guidance baked in; adding more degrades output.
4. **Text in images** — SD 3.5's text rendering is strong but not perfect. Use quotes around the exact text you want: `'OPEN 24 HOURS'`. Keep it short (3–5 words max).
5. **Cache weights** — set `HF_HOME=/workspace/hf_cache` on persistent storage. Large is \~16 GB on disk.
6. **bf16 for Large, fp16 for Medium** — the 8B models were trained in bf16; the 2.5B Medium runs fine in fp16.
7. **Batch efficiently** — SD 3.5 Large Turbo generates a 1024×1024 image in a few seconds on an RTX 4090; Large at 28 steps takes noticeably longer per image. Queue batches overnight for mass generation.
8. **Accept HF license** — you must accept the model license on the HuggingFace model page before downloading. Log in with `huggingface-cli login`.
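
Tip 4's text-rendering convention can be wrapped in a small helper. The phrasing used here ("with a sign reading '…'") is an assumption for illustration, not an official prompt format — adapt it to your scene.

```python
def sign_prompt(scene: str, sign_text: str, max_words: int = 5) -> str:
    """Embed exact sign text in a prompt, single-quoted per the tips above,
    refusing text that is too long for reliable rendering."""
    if len(sign_text.split()) > max_words:
        raise ValueError(f"keep sign text to {max_words} words or fewer")
    return f"{scene}, with a sign reading '{sign_text}'"

print(sign_prompt("neon-lit diner on a rainy night", "OPEN 24 HOURS"))
```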

## Troubleshooting

| Problem                       | Fix                                                                                                                   |
| ----------------------------- | --------------------------------------------------------------------------------------------------------------------- |
| `OutOfMemoryError` with Large | Use `pipe.enable_model_cpu_offload()`, or switch to the Medium variant                                                 |
| Garbled text in image         | Keep text short (3–5 words); put it in quotes in the prompt; increase `num_inference_steps` to 35                     |
| Oversaturated colors          | Lower `guidance_scale` — try 2.5–3.5 for Large; use 0.0 for Turbo                                                     |
| 403 error downloading model   | Accept the license at `https://huggingface.co/stabilityai/stable-diffusion-3.5-large` and run `huggingface-cli login` |
| Slow first run                | Initial download is \~16 GB for Large; subsequent runs use cache                                                      |
| `KeyError: 'text_encoder_3'`  | Upgrade diffusers: `pip install -U diffusers transformers`                                                            |
| Black image output            | Use `torch_dtype=torch.bfloat16` for Large/Turbo; fp16 can overflow to NaN on the 8B models, producing black frames   |


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/image-generation/stable-diffusion-3-5.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
