Stable Diffusion 3.5

Generate high-fidelity images with accurate text rendering using Stable Diffusion 3.5 on Clore.ai GPUs.

Stable Diffusion 3.5 from Stability AI is a Multimodal Diffusion Transformer (MMDiT) that sets a new standard for open-weight image generation. It comes in three variants: Large (8B params), Medium (2.5B params), and Large Turbo (8B, distilled for 4-step inference). The standout feature is its accurate text rendering — SD 3.5 can reliably place readable text inside generated images, a capability most earlier models struggle with.

On Clore.ai you can rent the GPU power SD 3.5 needs for as little as $0.30/day and generate hundreds of images per hour.

Key Features

  • Three variants — Large (8B, highest quality), Medium (2.5B, fast and light), Large Turbo (8B, 4-step distilled).

  • Accurate text rendering — generates readable text, signs, labels, and typography within images.

  • MMDiT architecture — joint image-text attention for superior prompt adherence.

  • 1024×1024 native resolution — clean output without upscaling hacks.

  • Flexible aspect ratios — handles non-square outputs (768×1344, 1344×768, etc.) without quality loss.

  • Native diffusers support — StableDiffusion3Pipeline in diffusers >= 0.30.

  • Open weights — Stability AI Community License; free for most commercial use.

Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| GPU VRAM | 12 GB (Medium) | 24 GB (Large / Turbo) |
| System RAM | 16 GB | 32 GB |
| Disk | 20 GB | 40 GB |
| Python | 3.10+ | 3.11 |
| CUDA | 12.1+ | 12.4 |
| diffusers | 0.30+ | latest |

Clore.ai GPU recommendation: An RTX 4090 (24 GB, ~$0.5–2/day) runs all three variants at full speed. For the Medium model, an RTX 3090 (24 GB, ~$0.3–1/day) or even a 16 GB card is sufficient and cheaper.

Quick Start
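A minimal setup sketch for a fresh Clore.ai instance (the exact package list is an assumption; PyTorch with CUDA support is often preinstalled on GPU images). SD 3.5 is a gated model, so accept the license on its HuggingFace model page before downloading:

```shell
# Install the libraries SD 3.5 needs (versions per the Requirements table above);
# sentencepiece and protobuf are needed for the T5 text encoder
pip install -U "diffusers>=0.30" transformers accelerate sentencepiece protobuf

# Authenticate so the gated weights can download
# (accept the license on the stabilityai model page first)
huggingface-cli login
```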

Usage Examples

SD 3.5 Large — Maximum Quality
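A sketch of a full-quality render with SD 3.5 Large, assuming a 24 GB GPU and an accepted HF license; the prompt is illustrative, and the step/guidance values follow the tips below:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load the 8B model in bf16, the precision it was trained in
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt='a neon diner at night with a sign reading "OPEN 24 HOURS"',
    num_inference_steps=28,  # full-quality step count
    guidance_scale=3.5,      # low CFG works best for SD 3.5
    width=1024,
    height=1024,
).images[0]
image.save("large.png")
```

On a tighter VRAM budget, replace `.to("cuda")` with `pipe.enable_model_cpu_offload()` to trade some speed for headroom.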

SD 3.5 Large Turbo — 4-Step Fast Generation
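A sketch for fast iteration with the distilled Turbo variant (same assumptions as above); note the 4 steps and zero guidance:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="a watercolor fox in a snowy forest",
    num_inference_steps=4,  # Turbo is distilled for 4-step inference
    guidance_scale=0.0,     # guidance is baked in; extra CFG degrades output
).images[0]
image.save("turbo.png")
```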

SD 3.5 Medium — Lightweight Option
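A lightweight sketch with the 2.5B Medium model, which fits in ~12 GB VRAM; the step and guidance values here are reasonable defaults, not official recommendations:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.float16,  # the 2.5B model runs fine in fp16
).to("cuda")

image = pipe(
    prompt="isometric illustration of a tiny island village",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("medium.png")
```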

Batch Generation with Different Aspect Ratios
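A sketch of batched generation across the ~1-megapixel resolutions listed in the features above. It reuses a `pipe` built as in the previous examples (that setup is assumed); `generate_batch` and the resolution names are illustrative:

```python
# ~1-megapixel sizes; SD3's VAE + 2x2 patching want dimensions divisible by 16
ASPECT_RATIOS = {
    "square": (1024, 1024),
    "portrait": (768, 1344),
    "landscape": (1344, 768),
}

def generate_batch(pipe, prompt, out_prefix="img", steps=28, guidance=3.5):
    """Render one image per aspect ratio, save each, and return the paths."""
    paths = []
    for name, (width, height) in ASPECT_RATIOS.items():
        image = pipe(
            prompt=prompt,
            width=width,
            height=height,
            num_inference_steps=steps,
            guidance_scale=guidance,
        ).images[0]
        path = f"{out_prefix}_{name}_{width}x{height}.png"
        image.save(path)
        paths.append(path)
    return paths
```

Called as `generate_batch(pipe, "a lighthouse on a cliff at dawn")`, this writes `img_square_1024x1024.png`, `img_portrait_768x1344.png`, and `img_landscape_1344x768.png`.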

Tips for Clore.ai Users

  1. Turbo for iteration, Large for finals — use the 4-step Turbo variant to explore prompt ideas quickly, then switch to Large (28 steps) for the final render.

  2. guidance_scale=3.5 — SD 3.5 Large works best at a lower CFG than older Stable Diffusion models. Going above 5.0 often causes oversaturation.

  3. Turbo needs guidance_scale=0 — the distilled model already has guidance baked in; adding more degrades output.

  4. Text in images — SD 3.5's text rendering is strong but not perfect. Use quotes around the exact text you want: 'OPEN 24 HOURS'. Keep it short (3–5 words max).

  5. Cache weights — set HF_HOME=/workspace/hf_cache on persistent storage. Large is ~16 GB on disk.

  6. bf16 for Large, fp16 for Medium — the 8B models were trained in bf16; the 2.5B Medium runs fine in fp16.

  7. Batch efficiently — SD 3.5 Large generates one 1024×1024 image in ~3 seconds on an RTX 4090. Batch overnight for mass generation.

  8. Accept the HF license — you must accept the model license on the HuggingFace model page before downloading. Log in with huggingface-cli login.
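Tip 5 in shell form (the cache path is just an example; point it anywhere on your persistent volume):

```shell
# Point the HuggingFace cache at persistent storage so the ~16 GB of
# Large weights survive instance restarts
export HF_HOME=/workspace/hf_cache
```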

Troubleshooting

| Problem | Fix |
| --- | --- |
| OutOfMemoryError with Large | Use `pipe.enable_model_cpu_offload()`; or switch to the Medium variant |
| Garbled text in image | Keep text short (3–5 words); put it in quotes in the prompt; increase `num_inference_steps` to 35 |
| Oversaturated colors | Lower `guidance_scale` — try 2.5–3.5 for Large; use 0.0 for Turbo |
| 403 error downloading model | Accept the license at https://huggingface.co/stabilityai/stable-diffusion-3.5-large and run `huggingface-cli login` |
| Slow first run | Initial download is ~16 GB for Large; subsequent runs use the cache |
| KeyError: 'text_encoder_3' | Upgrade diffusers: `pip install -U diffusers transformers` |
| Black image output | Ensure `torch_dtype=torch.bfloat16` for Large/Turbo; fp32 can cause silent failures on some cards |
