SD WebUI Forge

Deploy Stable Diffusion WebUI Forge with optimized VRAM management and FLUX support on Clore.ai GPUs

SD WebUI Forge is an optimized fork of the classic AUTOMATIC1111 Stable Diffusion WebUI, developed by lllyasviel. It delivers significantly better VRAM management (enabling SDXL on GPUs with as little as 4 GB), native FLUX model support, faster generation speeds, and broad backward compatibility with A1111 extensions and models. CLORE.AI's flexible GPU marketplace lets you pick the right GPU for Forge — from budget cards to top-tier A100s.


Server Requirements

| Parameter | Minimum | Recommended |
| --- | --- | --- |
| RAM | 8 GB | 16 GB+ |
| VRAM | 4 GB | 12 GB+ |
| Disk | 30 GB | 200 GB+ |
| GPU | NVIDIA GTX 1650 (4 GB) or better | RTX 3090, RTX 4090 |


Forge's key advantage is its VRAM optimizer: it can run SDXL on as little as 4 GB VRAM (with slower speed). For FLUX models, 12 GB VRAM is the practical minimum, with 24 GB for full quality and speed.

Quick Deploy on CLORE.AI

Docker Image: `nykk3/stable-diffusion-webui-forge:latest`

Ports: `22/tcp`, `7860/http`

Environment Variables:

| Variable | Example | Description |
| --- | --- | --- |
| `CLI_ARGS` | `--xformers --medvram` | Extra CLI arguments |
| `COMMANDLINE_ARGS` | `--api --listen` | Alternative environment variable for CLI arguments |

Step-by-Step Setup

1. Rent a GPU Server on CLORE.AI

Head to the CLORE.AI Marketplace:

  • Budget SD1.5: GTX 1660/2060 (6 GB) — plenty for 512/768px

  • SDXL capable: RTX 3080/3090 (10–24 GB)

  • FLUX capable: RTX 4090/A6000 (24+ GB)

  • Maximum quality: A100 80GB for batch generation

2. SSH into Your Server
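CLORE.AI shows the SSH host, port, and credentials on your order page; the values below are placeholders:

```shell
# Host and port are placeholders — copy the real ones from your CLORE.AI order.
ssh -p 22 root@<server-ip>
```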

3. Create Storage Directories
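A reasonable layout, matching the `/root/sd-forge` paths used elsewhere in this guide (the subfolder names mirror the standard WebUI `models` directory):

```shell
# Base directory for persistent models and outputs on the host.
BASE=/root/sd-forge
mkdir -p "$BASE"/models/{Stable-diffusion,Lora,VAE,ControlNet} "$BASE"/outputs
```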

4. Pull and Run SD WebUI Forge

Standard launch:
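A typical invocation is sketched below; the container-internal mount paths are assumptions based on common Forge images, so verify them against the image's Docker Hub page:

```shell
docker run -d --gpus all \
  --name forge \
  -p 7860:7860 \
  -v /root/sd-forge/models:/app/stable-diffusion-webui/models \
  -v /root/sd-forge/outputs:/app/stable-diffusion-webui/outputs \
  -e CLI_ARGS="--listen --xformers" \
  nykk3/stable-diffusion-webui-forge:latest
```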

With API enabled and extra performance flags:
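For example (same assumed mount paths as above), adding `--api` so the REST endpoints are exposed:

```shell
docker run -d --gpus all \
  --name forge \
  -p 7860:7860 \
  -v /root/sd-forge/models:/app/stable-diffusion-webui/models \
  -v /root/sd-forge/outputs:/app/stable-diffusion-webui/outputs \
  -e CLI_ARGS="--listen --api --xformers" \
  nykk3/stable-diffusion-webui-forge:latest
```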

Low VRAM mode (4-6 GB GPUs):
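A sketch for small GPUs: `--medvram` keeps SD1.5 workable on 6 GB cards; drop to `--lowvram` only as a last resort, since it is much slower:

```shell
docker run -d --gpus all \
  --name forge \
  -p 7860:7860 \
  -v /root/sd-forge/models:/app/stable-diffusion-webui/models \
  -v /root/sd-forge/outputs:/app/stable-diffusion-webui/outputs \
  -e CLI_ARGS="--listen --xformers --medvram" \
  nykk3/stable-diffusion-webui-forge:latest
```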

Maximum performance (24+ GB VRAM):
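With 24 GB+ of VRAM no memory-saving flags are needed; a sketch (Torch compilation is then enabled separately in the Forge panel of the UI):

```shell
docker run -d --gpus all \
  --name forge \
  -p 7860:7860 \
  -v /root/sd-forge/models:/app/stable-diffusion-webui/models \
  -v /root/sd-forge/outputs:/app/stable-diffusion-webui/outputs \
  -e CLI_ARGS="--listen --api --opt-sdp-attention" \
  nykk3/stable-diffusion-webui-forge:latest
```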

5. Monitor Startup

Look for a line similar to `Running on local URL: http://0.0.0.0:7860` in the container logs (for example via `docker logs -f <container>`).

Startup typically takes 2–5 minutes on first run.

6. Access the Web Interface

Open the http_pub URL that CLORE.AI assigns to port 7860; you can find it on your order page.

7. Add Models

Method 1: Download via CivitAI in the web UI

  • Go to Extensions → Installed → Models (some versions)

  • Or use the URL downloader in Settings

Method 2: Download directly on server
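For example with `wget`; `<MODEL_URL>` is a placeholder — copy the real download link from CivitAI or Hugging Face:

```shell
cd /root/sd-forge/models/Stable-diffusion
# <MODEL_URL> is a placeholder for the model's direct download link.
wget -O sd_xl_base_1.0.safetensors "<MODEL_URL>"
```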

Method 3: HuggingFace CLI
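Assuming the `huggingface_hub` CLI is installed; gated models such as FLUX.1-dev additionally require `huggingface-cli login`:

```shell
pip install -U "huggingface_hub[cli]"

# Example: fetch the SDXL base checkpoint into the models folder.
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 \
  sd_xl_base_1.0.safetensors \
  --local-dir /root/sd-forge/models/Stable-diffusion
```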


Usage Examples

Example 1: Text to Image via Web UI

  1. Open the Forge UI at your CLORE.AI URL

  2. Select your model from the Checkpoint dropdown

  3. Enter prompt: "cinematic portrait of a warrior, golden hour, 8k photography"

  4. Set negative prompt: "blurry, low resolution, watermark, ugly"

  5. Set width/height: 1024x1024 for SDXL, 512x768 for SD1.5

  6. Set steps: 20–30, CFG: 7

  7. Click Generate

Example 2: FLUX Generation

FLUX models work differently — no negative prompt, higher quality:

  1. Select FLUX checkpoint (flux1-dev.safetensors)

  2. In the Forge panel, select the matching UNet and VAE if they ship as separate files

  3. Enter prompt (no negative prompt needed):

  4. Steps: 20, CFG: 1.0 (FLUX uses lower CFG)

  5. Sampler: Euler or DPM++ 2M

Example 3: ControlNet-Guided Generation

  1. Install ControlNet extension (if not pre-installed):

    • Go to Extensions → Available → Load from

    • Search "ControlNet" and install

  2. Download ControlNet models to /root/sd-forge/models/ControlNet/

  3. In txt2img, expand ControlNet section

  4. Upload reference image (pose, depth, canny edge)

  5. Select preprocessor and model matching your reference type

  6. Generate — output follows the reference structure

Example 4: API Usage

With the `--api` flag, Forge exposes a REST API:
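Forge inherits the A1111-style `/sdapi/v1/txt2img` endpoint. A minimal Python client sketch — the server URL is a placeholder, and only the most common payload fields are shown:

```python
import base64
import json
from urllib.request import Request, urlopen

def build_payload(prompt, negative="", steps=25, width=1024, height=1024, cfg=7):
    """Assemble the JSON body for the txt2img endpoint."""
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "steps": steps,
        "width": width,
        "height": height,
        "cfg_scale": cfg,
    }

def txt2img(base_url, payload):
    """POST to /sdapi/v1/txt2img and return the decoded PNG bytes."""
    req = Request(
        base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        result = json.load(resp)
    # The API returns images as base64-encoded strings.
    return [base64.b64decode(img) for img in result["images"]]
```

Usage: `txt2img("https://<your-clore-http-pub-url>", build_payload("a lighthouse at dawn"))` returns a list of PNG byte strings you can write to disk.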

Example 5: Batch Generation Script
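A batch script might look like the following sketch; the API URL is a placeholder and assumes the container was started with `--api`:

```python
import base64
import json
import pathlib
from urllib.request import Request, urlopen

API = "https://<your-clore-http-pub-url>/sdapi/v1/txt2img"  # placeholder

def make_jobs(prompts, steps=25, width=1024, height=1024, cfg=7):
    """Build one txt2img payload per prompt."""
    return [
        {"prompt": p, "steps": steps, "width": width,
         "height": height, "cfg_scale": cfg}
        for p in prompts
    ]

def run_jobs(jobs, outdir="batch_out"):
    """Submit each job and save the returned images as PNG files."""
    pathlib.Path(outdir).mkdir(exist_ok=True)
    for n, job in enumerate(jobs):
        req = Request(API, data=json.dumps(job).encode(),
                      headers={"Content-Type": "application/json"})
        with urlopen(req) as resp:
            images = json.load(resp)["images"]
        for i, img in enumerate(images):
            path = pathlib.Path(outdir) / f"{n:03d}_{i}.png"
            path.write_bytes(base64.b64decode(img))

prompts = [
    "a misty forest at dawn",
    "a neon-lit city street, rain",
    "a ceramic teapot, studio lighting",
]
jobs = make_jobs(prompts, steps=20)
# run_jobs(jobs)  # uncomment once API above points at your server
```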


Configuration

Key CLI Arguments

| Argument | Description |
| --- | --- |
| `--api` | Enable REST API |
| `--listen` | Listen on all interfaces (required for CLORE.AI) |
| `--port 7860` | Change port |
| `--xformers` | Enable xFormers attention (faster, less VRAM) |
| `--medvram` | Medium VRAM mode (SD1.5 on 6 GB) |
| `--medvram-sdxl` | Medium VRAM for SDXL (SDXL on 8 GB) |
| `--lowvram` | Low VRAM mode (very slow, any GPU) |
| `--no-half` | Use float32 (more VRAM, more stable) |
| `--no-half-vae` | Keep VAE in float32 (prevents black images) |
| `--opt-sdp-attention` | Use PyTorch scaled dot-product attention |
| `--enable-insecure-extension-access` | Allow extension installation |
| `--skip-version-check` | Skip Python/torch version checks |

Forge-Specific Settings

Forge adds a Forge panel in the UI with:

  • Forge Unet: Select optimization backend (default, bnb, etc.)

  • Diffusers Torch Compilation: Enable for 20-30% faster generation (first run compiles)

  • GPU Weights: How much to keep on GPU vs CPU


Performance Tips

1. Use xFormers for 20-30% Less VRAM

Add `--xformers` to your `CLI_ARGS`; it automatically improves performance and reduces VRAM use on most GPUs.

2. Forge's VRAM Optimizer

Forge automatically manages VRAM better than A1111. Just use the --medvram-sdxl flag for SDXL on 8-12 GB GPUs and let it handle the rest.

3. Enable Torch Compilation (Ampere+)

In the Forge tab in the UI, enable Diffusers Torch Compilation. First generation takes 2-3 minutes to compile, but subsequent ones are 20-30% faster.

4. Optimal Steps/Sampler Combos

| Goal | Sampler | Steps | CFG |
| --- | --- | --- | --- |
| Speed | DPM++ SDE Karras | 15–20 | 7 |
| Quality | DPM++ 2M Karras | 25–35 | 7 |
| Artistic | Euler a | 20–30 | 5–7 |
| FLUX | Euler | 20 | 1 |

5. Use Tile VAE for 2K+ Resolutions

For ultra-high resolution (2048×2048+), enable Tiled VAE in the SD tab to prevent VAE OOM errors.

6. Batch Locally with API

Instead of generating one at a time in the UI, use the API with batch_size for faster throughput:
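The txt2img payload accepts a `batch_size` field (standard A1111-style API behavior), so several images come back per request; a sketch:

```python
import json

# Four images per request instead of one; all other fields are unchanged.
payload = {
    "prompt": "cinematic portrait of a warrior, golden hour",
    "steps": 20,
    "batch_size": 4,  # images generated per request
}
body = json.dumps(payload).encode()  # POST this to /sdapi/v1/txt2img
```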


Troubleshooting

Problem: Black or green images

This is a VAE precision issue. Add the `--no-half-vae` flag to your CLI arguments, or use the sdxl-vae-fp16-fix.safetensors VAE.

Problem: "CUDA out of memory"

Try in order:

  1. --medvram-sdxl (for SDXL)

  2. --medvram (for SD1.5)

  3. Reduce image resolution

  4. --lowvram (last resort, very slow)

Problem: Extensions not loading

Add `--enable-insecure-extension-access` to your CLI arguments, then install from the Extensions tab in the UI.

Problem: Startup takes too long

Normal on first run: PyTorch initializes and model hashes are computed. Subsequent starts are faster.

Problem: Can't access UI from browser

Ensure the Forge process binds to 0.0.0.0:

  • Add --listen to CLI_ARGS

  • Verify port 7860 is in your CLORE.AI order port list

Problem: Model not showing in dropdown

After placing .safetensors files in the correct folder, click 🔄 Refresh next to the Checkpoint dropdown.



Clore.ai GPU Recommendations

| Use Case | Recommended GPU | Est. Cost on Clore.ai |
| --- | --- | --- |
| Development/Testing | RTX 3090 (24 GB) | ~$0.12/gpu/hr |
| Production | RTX 4090 (24 GB) | ~$0.70/gpu/hr |
| Large Scale | A100 80GB | ~$1.20/gpu/hr |

💡 All examples in this guide can be deployed on Clore.ai GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
