# SD WebUI Forge

SD WebUI Forge is an optimized fork of the AUTOMATIC1111 Stable Diffusion WebUI, developed by lllyasviel. It offers significantly better VRAM management (enabling SDXL on 4 GB GPUs), native FLUX model support, faster generation, and compatibility with most A1111 extensions and models. CLORE.AI's GPU marketplace lets you match the GPU to your Forge workload, from budget cards to top-tier A100s.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Server Requirements

| Parameter | Minimum              | Recommended        |
| --------- | -------------------- | ------------------ |
| RAM       | 8 GB                 | 16 GB+             |
| VRAM      | 4 GB                 | 12 GB+             |
| Disk      | 30 GB                | 200 GB+            |
| GPU       | NVIDIA GTX 1650 4GB+ | RTX 3090, RTX 4090 |

{% hint style="info" %}
Forge's key advantage is its VRAM optimizer: it can run SDXL on as little as 4 GB VRAM, at reduced speed. For FLUX models, 12 GB VRAM is the practical minimum, and 24 GB is recommended for full quality and speed.
{% endhint %}
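
The VRAM thresholds above can be expressed as a small lookup. This is a minimal sketch: the function name `supported_models` is illustrative, and the cutoffs restate this guide's figures rather than measured limits.

```python
def supported_models(vram_gb):
    """Return the model families this guide suggests for a given VRAM size."""
    models = []
    if vram_gb >= 4:
        # 4 GB is the stated minimum; SDXL runs at this size but slowly.
        models += ["SD1.5", "SDXL"]
    if vram_gb >= 12:
        # 12 GB is the practical FLUX minimum; 24 GB is suggested for full speed.
        models.append("FLUX")
    return models


for vram in (4, 12, 24):
    print(vram, supported_models(vram))
```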

## Quick Deploy on CLORE.AI

**Docker Image:** `nykk3/stable-diffusion-webui-forge:latest`

**Ports:** `22/tcp`, `7860/http`

**Environment Variables:**

| Variable           | Example                | Description              |
| ------------------ | ---------------------- | ------------------------ |
| `CLI_ARGS`         | `--xformers --medvram` | Extra CLI arguments      |
| `COMMANDLINE_ARGS` | `--api --listen`       | Alternative CLI args env |

## Step-by-Step Setup

### 1. Rent a GPU Server on CLORE.AI

Head to [CLORE.AI Marketplace](https://clore.ai/marketplace):

* **Budget SD1.5**: GTX 1660/2060 (6 GB) — plenty for 512/768px
* **SDXL capable**: RTX 3080/3090 (10–24 GB)
* **FLUX capable**: RTX 4090/A6000 (24+ GB)
* **Maximum quality**: A100 80GB for batch generation

### 2. SSH into Your Server

```bash
ssh -p <PORT> root@<SERVER_IP>
```

### 3. Create Storage Directories

```bash
mkdir -p /root/sd-forge/{models,outputs,extensions,configs}
mkdir -p /root/sd-forge/models/{Stable-diffusion,VAE,Lora,ControlNet,embeddings,ESRGAN}
```

### 4. Pull and Run SD WebUI Forge

**Standard launch:**

```bash
docker run -d \
  --name sd-forge \
  --gpus all \
  -p 7860:7860 \
  -v /root/sd-forge/models:/app/stable-diffusion-webui/models \
  -v /root/sd-forge/outputs:/app/stable-diffusion-webui/outputs \
  -v /root/sd-forge/extensions:/app/stable-diffusion-webui/extensions \
  nykk3/stable-diffusion-webui-forge:latest
```

**With API enabled and extra performance flags:**

```bash
docker run -d \
  --name sd-forge \
  --gpus all \
  -p 7860:7860 \
  -v /root/sd-forge/models:/app/stable-diffusion-webui/models \
  -v /root/sd-forge/outputs:/app/stable-diffusion-webui/outputs \
  -v /root/sd-forge/extensions:/app/stable-diffusion-webui/extensions \
  -e CLI_ARGS="--api --xformers --enable-insecure-extension-access" \
  nykk3/stable-diffusion-webui-forge:latest
```

**Low VRAM mode (4-6 GB GPUs):**

```bash
docker run -d \
  --name sd-forge \
  --gpus all \
  -p 7860:7860 \
  -v /root/sd-forge/models:/app/stable-diffusion-webui/models \
  -v /root/sd-forge/outputs:/app/stable-diffusion-webui/outputs \
  -e CLI_ARGS="--api --medvram-sdxl --opt-sdp-attention" \
  nykk3/stable-diffusion-webui-forge:latest
```

**Maximum performance (24+ GB VRAM):**

```bash
docker run -d \
  --name sd-forge \
  --gpus all \
  -p 7860:7860 \
  -v /root/sd-forge/models:/app/stable-diffusion-webui/models \
  -v /root/sd-forge/outputs:/app/stable-diffusion-webui/outputs \
  -e CLI_ARGS="--api --xformers --no-half-vae" \
  nykk3/stable-diffusion-webui-forge:latest
```
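
The four launch variants differ only in their volume mounts and `CLI_ARGS`, so they can be generated from one helper. The sketch below assumes the image, paths, and port shown above; `build_docker_cmd` is a hypothetical name, not part of Forge or the Docker image.

```python
import shlex

IMAGE = "nykk3/stable-diffusion-webui-forge:latest"
HOST_ROOT = "/root/sd-forge"
APP_ROOT = "/app/stable-diffusion-webui"


def build_docker_cmd(cli_args="", mounts=("models", "outputs", "extensions")):
    """Build the `docker run` argument list used in this guide."""
    cmd = ["docker", "run", "-d", "--name", "sd-forge",
           "--gpus", "all", "-p", "7860:7860"]
    for name in mounts:
        # Map each host directory onto the matching directory in the container.
        cmd += ["-v", f"{HOST_ROOT}/{name}:{APP_ROOT}/{name}"]
    if cli_args:
        cmd += ["-e", f"CLI_ARGS={cli_args}"]
    cmd.append(IMAGE)
    return cmd


# Print a copy-pasteable command for the API-enabled variant.
print(shlex.join(build_docker_cmd("--api --xformers --enable-insecure-extension-access")))
```

Passing the list to `subprocess.run(cmd, check=True)` would start the container without going through a shell, which avoids quoting problems in `CLI_ARGS`.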

### 5. Monitor Startup

```bash
docker logs -f sd-forge
```

Look for:

```
Running on local URL:  http://0.0.0.0:7860
```

Startup typically takes 2–5 minutes on first run.
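
If you would rather script the wait than watch the log, the readiness line can be detected with a regular expression. This is a sketch that assumes the log line format shown above and a container named `sd-forge`; the function names are illustrative.

```python
import re
import subprocess

READY_RE = re.compile(r"Running on local URL:\s+(\S+)")


def find_ready_url(log_text):
    """Return the URL from the readiness line, or None if it has not appeared."""
    match = READY_RE.search(log_text)
    return match.group(1) if match else None


def container_ready_url(name="sd-forge"):
    """Read the container's logs once and look for the readiness line."""
    result = subprocess.run(["docker", "logs", name],
                            capture_output=True, text=True, check=False)
    # The container's stdout and stderr arrive separately, so search both.
    return find_ready_url(result.stdout + result.stderr)
```

Calling `container_ready_url()` in a loop with a short `time.sleep` between attempts gives a simple startup gate for automation.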

### 6. Access the Web Interface

Open the CLORE.AI http\_pub URL assigned to port 7860 in your browser:

```
https://<order-id>-7860.clore.ai/
```

### 7. Add Models

**Method 1: Download via CivitAI in the web UI**

* Go to **Extensions → Installed → Models** (some versions)
* Or use the URL downloader in Settings

**Method 2: Download directly on server**

```bash
# Download SDXL base model
cd /root/sd-forge/models/Stable-diffusion
wget -O "sd_xl_base_1.0.safetensors" \
  "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"

# Download FLUX.1-schnell (fast FLUX model)
wget -O "flux1-schnell.safetensors" \
  "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors"
```

**Method 3: HuggingFace CLI**

```bash
docker exec -it sd-forge bash -c "
pip install huggingface_hub
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 \
  sd_xl_base_1.0.safetensors \
  --local-dir /app/stable-diffusion-webui/models/Stable-diffusion/
"
```
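
An interrupted download, or a saved error page, leaves a file that Forge cannot load. The sketch below performs a basic sanity check using the safetensors layout (an 8-byte little-endian header length, a JSON header, then tensor data). `check_safetensors` is a hypothetical helper; it checks structure only and does not verify a checksum.

```python
import json
import os
import struct


def check_safetensors(path):
    """Return the tensor count if the file looks complete; raise ValueError otherwise."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            raise ValueError("file is too short to be safetensors")
        # The first 8 bytes are a little-endian u64 giving the JSON header length.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > size - 8:
            raise ValueError("header length exceeds file size")
        try:
            header = json.loads(f.read(header_len).decode("utf-8"))
        except ValueError as exc:  # covers both bad UTF-8 and bad JSON
            raise ValueError("header is not valid JSON") from exc
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")
    tensors = [v for k, v in header.items() if k != "__metadata__"]
    try:
        # data_offsets are [begin, end] relative to the start of the data section.
        data_end = max((t["data_offsets"][1] for t in tensors), default=0)
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("tensor entry is malformed") from exc
    if 8 + header_len + data_end > size:
        raise ValueError("tensor data is truncated")
    return len(tensors)
```

For example, `check_safetensors("/root/sd-forge/models/Stable-diffusion/sd_xl_base_1.0.safetensors")` raises `ValueError` if the download stopped partway.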

***

## Usage Examples

### Example 1: Text to Image via Web UI

1. Open the Forge UI at your CLORE.AI URL
2. Select your model from the **Checkpoint** dropdown
3. Enter prompt: `"cinematic portrait of a warrior, golden hour, 8k photography"`
4. Set negative prompt: `"blurry, low resolution, watermark, ugly"`
5. Set width/height: `1024x1024` for SDXL, `512x768` for SD1.5
6. Set steps: 20–30, CFG: 7
7. Click **Generate**

### Example 2: FLUX Generation

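FLUX models work differently from SD checkpoints: they do not use a negative prompt and run at a much lower CFG. The settings below suit `flux1-dev`; the `flux1-schnell` model downloaded in step 7 is distilled for roughly 1-4 steps.

1. Select a FLUX checkpoint (e.g. `flux1-dev.safetensors`)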
2. Under **Forge**, select appropriate **Unet** and **VAE** if separate files
3. Enter prompt (no negative prompt needed):

   ```
   A breathtaking landscape at sunset, mountains reflected in a pristine lake,
   photorealistic, ultra detailed, professional photography
   ```
4. Steps: 20, CFG: 1.0 (FLUX uses lower CFG)
5. Sampler: `Euler` or `DPM++ 2M`
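
The same settings can be sent through the API. The sketch below assumes that `/sdapi/v1/txt2img` accepts the usual A1111 fields for FLUX checkpoints; Forge builds may expose additional FLUX-specific fields that are not covered here. The function name and the 1024×1024 default are illustrative.

```python
def flux_payload(prompt, width=1024, height=1024):
    """Build a txt2img request body using the FLUX settings from the steps above."""
    return {
        "prompt": prompt,
        # No "negative_prompt" key: the steps above do not use one for FLUX.
        "steps": 20,
        "cfg_scale": 1.0,
        "sampler_name": "Euler",
        "width": width,
        "height": height,
    }


payload = flux_payload("A breathtaking landscape at sunset, photorealistic")
print(payload["steps"], payload["cfg_scale"], payload["sampler_name"])
```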

### Example 3: ControlNet-Guided Generation

1. Install ControlNet extension (if not pre-installed):
   * Go to **Extensions → Available → Load from**
   * Search "ControlNet" and install
2. Download ControlNet models to `/root/sd-forge/models/ControlNet/`
3. In txt2img, expand **ControlNet** section
4. Upload reference image (pose, depth, canny edge)
5. Select preprocessor and model matching your reference type
6. Generate — output follows the reference structure

### Example 4: API Usage

With the `--api` flag, Forge exposes a REST API:

```python
import requests
import base64
import io
from PIL import Image

BASE_URL = "http://localhost:7860"  # or CLORE.AI http_pub URL

# Text to Image
payload = {
    "prompt": "a serene Japanese garden with cherry blossoms, watercolor style",
    "negative_prompt": "ugly, blurry, low quality",
    "steps": 25,
    "cfg_scale": 7,
    "width": 1024,
    "height": 1024,
    "sampler_name": "DPM++ 2M",
    "batch_size": 1,
}

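# Generation can take a while, so allow a generous timeout
response = requests.post(f"{BASE_URL}/sdapi/v1/txt2img", json=payload, timeout=600)
response.raise_for_status()  # fail early on HTTP errors instead of a KeyError below
result = response.json()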

# Save the image
for i, img_b64 in enumerate(result["images"]):
    img_data = base64.b64decode(img_b64)
    img = Image.open(io.BytesIO(img_data))
    img.save(f"output_{i}.png")
    print(f"Saved output_{i}.png")
```
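
The request above uses whichever checkpoint is currently loaded. In the A1111 API, a per-request `override_settings` object with the `sd_model_checkpoint` key selects a checkpoint; the sketch below assumes Forge honors the same field. The helper names are illustrative, and `post_json` is a stdlib-only alternative to `requests`.

```python
import json
import urllib.request


def with_checkpoint(payload, checkpoint_title):
    """Return a copy of payload that asks the server to use a given checkpoint."""
    updated = dict(payload)
    overrides = dict(payload.get("override_settings", {}))
    overrides["sd_model_checkpoint"] = checkpoint_title
    updated["override_settings"] = overrides
    return updated


def post_json(url, payload, timeout=600):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would then look like `post_json(f"{BASE_URL}/sdapi/v1/txt2img", with_checkpoint(payload, "sd_xl_base_1.0.safetensors"))`.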

### Example 5: Batch Generation Script

```python
import requests
import base64
import io
from PIL import Image
import os

BASE_URL = "http://localhost:7860"

prompts = [
    ("cyberpunk city at night, neon lights, rain", "cyberpunk"),
    ("ancient forest, mystical fog, fantasy art", "fantasy"),
    ("minimalist logo design, geometric shapes, white background", "logo"),
    ("portrait of an elderly sailor, weathered face, oil painting", "portrait"),
]

os.makedirs("batch_output", exist_ok=True)

for prompt_text, filename in prompts:
    print(f"Generating: {filename}...")
    response = requests.post(
        f"{BASE_URL}/sdapi/v1/txt2img",
        json={
            "prompt": prompt_text,
            "negative_prompt": "low quality, blurry, watermark",
            "steps": 25,
            "cfg_scale": 7,
            "width": 1024,
            "height": 1024,
        },
    )
    
    if response.status_code == 200:
        img_b64 = response.json()["images"][0]
        img = Image.open(io.BytesIO(base64.b64decode(img_b64)))
        img.save(f"batch_output/{filename}.png")
        print(f"  Saved batch_output/{filename}.png")
    else:
        print(f"  Error: {response.status_code}")
```
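
The loop above stops with an exception if the connection drops mid-batch. A small retry wrapper is one way to make long runs more resilient. This is a generic sketch, not part of the Forge API; the name `retry` and the linear backoff are illustrative choices.

```python
import time


def retry(func, attempts=3, delay=5.0, exceptions=(Exception,)):
    """Call func() until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            # Linear backoff: wait delay, then 2 * delay, and so on.
            time.sleep(delay * attempt)
```

In the batch script, the request could be wrapped as `retry(lambda: requests.post(url, json=body, timeout=600), exceptions=(requests.RequestException,))`.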

***

## Configuration

### Key CLI Arguments

| Argument                             | Description                                      |
| ------------------------------------ | ------------------------------------------------ |
| `--api`                              | Enable REST API                                  |
| `--listen`                           | Listen on all interfaces (required for CLORE.AI) |
| `--port 7860`                        | Set the listening port (default 7860)            |
| `--xformers`                         | Enable xFormers attention (faster, less VRAM)    |
| `--medvram`                          | Medium VRAM mode (SD1.5 on 6GB)                  |
| `--medvram-sdxl`                     | Medium VRAM for SDXL (SDXL on 8GB)               |
| `--lowvram`                          | Low VRAM mode (very slow, any GPU)               |
| `--no-half`                          | Use float32 (more VRAM, more stable)             |
| `--no-half-vae`                      | Keep VAE in float32 (prevents black images)      |
| `--opt-sdp-attention`                | PyTorch scaled dot product attention             |
| `--enable-insecure-extension-access` | Allow extension installation                     |
| `--skip-version-check`               | Skip torch/xformers version checks               |

### Forge-Specific Settings

Forge adds a **Forge** panel in the UI with:

* **Forge Unet**: Select optimization backend (default, bnb, etc.)
* **Diffusers Torch Compilation**: Enable for 20-30% faster generation (first run compiles)
* **GPU Weights**: How much to keep on GPU vs CPU

***

## Performance Tips

### 1. Use xFormers for 20-30% Less VRAM

```bash
--xformers
```

xFormers provides memory-efficient attention, which typically lowers VRAM use and speeds up generation on most NVIDIA GPUs.

### 2. Forge's VRAM Optimizer

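Forge manages VRAM automatically, and does so better than A1111. For SDXL on 8-12 GB GPUs, add `--medvram-sdxl` and let Forge handle the rest. On recent Forge builds, the upstream README describes legacy A1111 memory flags as having no effect, so the flag may be unnecessary there.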

### 3. Enable Torch Compilation (Ampere+)

In the Forge tab in the UI, enable **Diffusers Torch Compilation**. First generation takes 2-3 minutes to compile, but subsequent ones are 20-30% faster.

### 4. Optimal Steps/Sampler Combos

| Goal     | Sampler            | Steps | CFG |
| -------- | ------------------ | ----- | --- |
| Speed    | `DPM++ SDE Karras` | 15-20 | 7   |
| Quality  | `DPM++ 2M Karras`  | 25-35 | 7   |
| Artistic | `Euler a`          | 20-30 | 5-7 |
| FLUX     | `Euler`            | 20    | 1   |

### 5. Use Tiled VAE for 2K+ Resolutions

For ultra-high resolution (2048×2048+), enable **Tiled VAE** in the SD tab to prevent VAE OOM errors.

### 6. Batch Locally with API

Instead of generating one at a time in the UI, use the API with `batch_size` for faster throughput:

```python
payload = {
    "prompt": "...",
    "batch_size": 4,  # Generate 4 images at once
    "n_iter": 2,      # Run 2 iterations = 8 total images
}
```
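
The total image count is `batch_size × n_iter`. To hit a target count with a VRAM-limited batch size, the two values can be derived as below. This is a small sketch with an illustrative helper name.

```python
import math


def plan_batches(total_images, max_batch_size):
    """Split a target image count into (batch_size, n_iter) for the API payload."""
    if total_images < 1 or max_batch_size < 1:
        raise ValueError("both arguments must be at least 1")
    batch_size = min(total_images, max_batch_size)
    # Round up so that at least total_images are produced.
    n_iter = math.ceil(total_images / batch_size)
    return batch_size, n_iter


batch_size, n_iter = plan_batches(8, 4)
print(batch_size, n_iter, batch_size * n_iter)  # 4 2 8
```

When the target is not a multiple of the batch size, the plan overshoots: `plan_batches(10, 4)` gives 3 iterations of 4, or 12 images.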

***

## Troubleshooting

### Problem: Black or green images

This is usually a VAE precision issue. Add the flag:

```bash
--no-half-vae
```

Or use the `sdxl-vae-fp16-fix.safetensors` VAE.

### Problem: "CUDA out of memory"

Try in order:

1. `--medvram-sdxl` (for SDXL)
2. `--medvram` (for SD1.5)
3. Reduce image resolution
4. `--lowvram` (last resort, very slow)

### Problem: Extensions not loading

```bash
# Allow extension access
-e CLI_ARGS="--enable-insecure-extension-access"
```

Then install from the **Extensions** tab in the UI.

### Problem: Startup takes too long

This is normal on first run, when PyTorch is initialized and model hashes are computed. Subsequent starts are faster.

```bash
docker logs -f sd-forge  # Watch progress
```

### Problem: Can't access UI from browser

Ensure the Forge process binds to `0.0.0.0`:

* Add `--listen` to CLI\_ARGS
* Verify port 7860 is in your CLORE.AI order port list
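
To narrow down where the connection fails, you can first check whether anything is listening on the port from the server itself. The sketch below tests TCP reachability only; it does not confirm that the CLORE.AI proxy is forwarding traffic, and `port_open` is an illustrative name.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running `port_open("127.0.0.1", 7860)` over SSH on the server tells you whether Forge is listening at all. If it returns `True` but the public URL still fails, the order's port list is the more likely cause.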

### Problem: Model not showing in dropdown

After placing `.safetensors` files in the correct folder, click **🔄 Refresh** next to the Checkpoint dropdown.
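
With `--api` enabled, the same refresh can be triggered over HTTP. The A1111 API provides `POST /sdapi/v1/refresh-checkpoints` and `GET /sdapi/v1/sd-models`; the sketch below assumes Forge keeps both endpoints and that each model entry carries a `filename` field. The helper names are illustrative.

```python
import json
import urllib.request


def find_checkpoint(models, filename):
    """Return the entry whose file path ends with filename, or None."""
    for entry in models:
        if entry.get("filename", "").endswith(filename):
            return entry
    return None


def refresh_and_list(base_url, timeout=60):
    """Ask the server to rescan checkpoints, then return the model list."""
    refresh = urllib.request.Request(
        f"{base_url}/sdapi/v1/refresh-checkpoints", data=b"", method="POST")
    urllib.request.urlopen(refresh, timeout=timeout).close()
    with urllib.request.urlopen(f"{base_url}/sdapi/v1/sd-models",
                                timeout=timeout) as response:
        return json.load(response)
```

If `find_checkpoint(refresh_and_list("http://localhost:7860"), "sd_xl_base_1.0.safetensors")` returns `None`, the file is probably in the wrong folder or outside the mounted volume.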

***

## Links

* [GitHub (Forge)](https://github.com/lllyasviel/stable-diffusion-webui-forge)
* [GitHub (A1111 base)](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
* [Docker Hub (nykk3)](https://hub.docker.com/r/nykk3/stable-diffusion-webui-forge)
* [CivitAI (Models)](https://civitai.com)
* [FLUX Models](https://huggingface.co/black-forest-labs)
* [CLORE.AI Marketplace](https://clore.ai/marketplace)

***

## Clore.ai GPU Recommendations

| Use Case            | Recommended GPU | Est. Cost on Clore.ai |
| ------------------- | --------------- | --------------------- |
| Development/Testing | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production          | RTX 4090 (24GB) | \~$0.70/gpu/hr        |
| Large Scale         | A100 80GB       | \~$1.20/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
