Stable Diffusion 3.5
Generate high-fidelity images with accurate text rendering using Stable Diffusion 3.5 on Clore.ai GPUs.
Stable Diffusion 3.5 from Stability AI is a Multimodal Diffusion Transformer (MMDiT) that sets a new standard for open-weight image generation. It comes in three variants: Large (8B params), Medium (2.5B params), and Large Turbo (8B, distilled for 4-step inference). The standout feature is its accurate text rendering — SD 3.5 can reliably place readable text inside generated images, a capability most earlier models struggle with.
On Clore.ai you can rent the GPU power SD 3.5 needs for as little as $0.30/day and generate hundreds of images per hour.
Key Features
Three variants — Large (8B, highest quality), Medium (2.5B, fast and light), Large Turbo (8B, 4-step distilled).
Accurate text rendering — generates readable text, signs, labels, and typography within images.
MMDiT architecture — joint image-text attention for superior prompt adherence.
1024×1024 native resolution — clean output without upscaling hacks.
Flexible aspect ratios — handles non-square outputs (768×1344, 1344×768, etc.) without quality loss.
Native diffusers support — `StableDiffusion3Pipeline` in diffusers >= 0.30.
Open weights — Stability AI Community License; free for most commercial use.
Requirements
| Requirement | Minimum (Medium) | Recommended (Large / Turbo) |
| --- | --- | --- |
| GPU VRAM | 12 GB | 24 GB |
| System RAM | 16 GB | 32 GB |
| Disk | 20 GB | 40 GB |
| Python | 3.10+ | 3.11 |
| CUDA | 12.1+ | 12.4 |
| diffusers | 0.30+ | latest |
Clore.ai GPU recommendation: An RTX 4090 (24 GB, ~$0.5–2/day) runs all three variants at full speed. For the Medium model, an RTX 3090 (24 GB, ~$0.3–1/day) or even a 16 GB card is sufficient and cheaper.
Quick Start
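A minimal setup sketch: install the libraries the examples below rely on (diffusers >= 0.30 ships `StableDiffusion3Pipeline`; `sentencepiece` and `protobuf` are assumed here for the T5 text encoder's tokenizer), then authenticate so the gated weights can download.

```shell
# Install dependencies (diffusers >= 0.30 is required for StableDiffusion3Pipeline)
pip install -U torch diffusers transformers accelerate sentencepiece protobuf

# Accept the model license on the Hugging Face model page first, then log in
huggingface-cli login
```

Run this once per rented instance; point `HF_HOME` at persistent storage (see the tips below) so the download survives restarts.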
Usage Examples
SD 3.5 Large — Maximum Quality
SD 3.5 Large Turbo — 4-Step Fast Generation
SD 3.5 Medium — Lightweight Option
Batch Generation with Different Aspect Ratios
Tips for Clore.ai Users
Turbo for iteration, Large for finals — use the 4-step Turbo variant to explore prompt ideas quickly, then switch to Large (28 steps) for the final render.
guidance_scale=3.5 — SD 3.5 Large works best at a lower CFG than older Stable Diffusion models. Going above 5.0 often causes oversaturation.
Turbo needs guidance_scale=0 — the distilled model already has guidance baked in; adding more degrades output.
Text in images — SD 3.5's text rendering is strong but not perfect. Use quotes around the exact text you want: 'OPEN 24 HOURS'. Keep it short (3–5 words max).
Cache weights — set HF_HOME=/workspace/hf_cache on persistent storage. Large is ~16 GB on disk.
bf16 for Large, fp16 for Medium — the 8B models were trained in bf16; the 2.5B Medium runs fine in fp16.
Batch efficiently — SD 3.5 Large generates one 1024×1024 image in ~3 seconds on an RTX 4090. Batch overnight for mass generation.
Accept HF license — you must accept the model license on the Hugging Face model page before downloading, then log in with huggingface-cli login.
Troubleshooting
| Problem | Solution |
| --- | --- |
| OutOfMemoryError with Large | Use pipe.enable_model_cpu_offload(), or switch to the Medium variant |
| Garbled text in image | Keep text short (3–5 words); put it in quotes in the prompt; increase num_inference_steps to 35 |
| Oversaturated colors | Lower guidance_scale — try 2.5–3.5 for Large; use 0.0 for Turbo |
| 403 error downloading model | Accept the license at https://huggingface.co/stabilityai/stable-diffusion-3.5-large and run huggingface-cli login |
| Slow first run | Initial download is ~16 GB for Large; subsequent runs use the cache |
| KeyError: 'text_encoder_3' | Upgrade diffusers: pip install -U diffusers transformers |
| Black image output | Ensure torch_dtype=torch.bfloat16 for Large/Turbo; fp32 can cause silent failures on some cards |