# ControlNet

Master ControlNet for precise control over AI image generation.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create order and wait for deployment

### Access Your Server

* Find connection details in **My Orders**
* Web interfaces: Use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is ControlNet?

ControlNet adds spatial conditioning to Stable Diffusion: a second input image (edges, depth, pose, and so on) steers the layout of the output while the prompt steers its content. Common control types:

* **Canny** - Edge detection
* **Depth** - 3D depth maps
* **Pose** - Human poses
* **Scribble** - Rough sketches
* **Segmentation** - Semantic masks
* **Line Art** - Clean lines
* **IP-Adapter** - Style transfer

## Requirements

| Control Type      | Min VRAM | Recommended |
| ----------------- | -------- | ----------- |
| Single ControlNet | 8GB      | RTX 3070    |
| Multi ControlNet  | 12GB     | RTX 3090    |
| SDXL ControlNet   | 16GB     | RTX 4090    |
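Before renting, you can sanity-check that a server's GPU meets the minimums above. The helper below is a minimal sketch: the table values are hard-coded, and the real `torch` query is left in a comment since it only works on a CUDA machine.

```python
# Minimum VRAM (GB) per setup, mirroring the table above
MIN_VRAM_GB = {
    "single": 8,
    "multi": 12,
    "sdxl": 16,
}

def meets_vram(total_bytes: int, setup: str) -> bool:
    """True if a GPU with `total_bytes` of memory meets the minimum for `setup`."""
    gb = total_bytes / (1024 ** 3)
    return gb >= MIN_VRAM_GB[setup]

# On the rented server you could query the real value with:
#   import torch
#   total_bytes = torch.cuda.get_device_properties(0).total_memory
print(meets_vram(24 * 1024 ** 3, "multi"))  # 24 GB card → True
```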

## Quick Deploy with A1111

**Command:**

```bash
cd /workspace/stable-diffusion-webui && \
cd extensions && \
git clone https://github.com/Mikubill/sd-webui-controlnet && \
cd .. && \
python launch.py --listen --enable-insecure-extension-access
```

### Download Models

```bash
cd /workspace/stable-diffusion-webui/extensions/sd-webui-controlnet/models

# SD 1.5 ControlNets
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.pth

# SDXL ControlNets
wget https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0/resolve/main/diffusion_pytorch_model.safetensors -O controlnet-canny-sdxl.safetensors
```

## Python with Diffusers

### Canny Edge Control

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector
import cv2
import numpy as np

# Load ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

# Load pipeline
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
# Use CPU offload instead of pipe.to("cuda") -- offload manages device placement itself
pipe.enable_model_cpu_offload()

# Prepare control image
image = load_image("input.jpg")
canny = CannyDetector()
control_image = canny(image)

# Generate
output = pipe(
    prompt="a beautiful woman in a garden, high quality",
    negative_prompt="ugly, blurry",
    image=control_image,
    num_inference_steps=30,
    controlnet_conditioning_scale=1.0
).images[0]

output.save("canny_output.png")
```
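Stable Diffusion works in a latent space downsampled by a factor of 8, so control images whose sides are not multiples of 8 can trigger size mismatches. A small sketch for snapping dimensions before passing an image to the pipeline (uses Pillow, which `diffusers` already depends on):

```python
from PIL import Image

def snap_to_multiple(size: int, multiple: int = 8) -> int:
    """Round a dimension down to the nearest multiple (8 for SD latents)."""
    return max(multiple, (size // multiple) * multiple)

def prepare_control_image(img: Image.Image) -> Image.Image:
    """Resize so both sides are valid SD dimensions."""
    w, h = img.size
    return img.resize((snap_to_multiple(w), snap_to_multiple(h)))

print(snap_to_multiple(513))  # → 512
```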

### Depth Control

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import MidasDetector
import torch

image = load_image("input.jpg")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Get depth map
depth_estimator = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_image = depth_estimator(image)

# Generate with depth
output = pipe(
    prompt="a futuristic city, sci-fi, detailed",
    image=depth_image,
    num_inference_steps=30
).images[0]
```

### OpenPose (Human Poses)

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import OpenposeDetector

image = load_image("input.jpg")

# Get pose
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = pose_detector(image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a ballerina dancing, elegant, studio lighting",
    image=pose_image,
    num_inference_steps=30
).images[0]
```

### Scribble/Sketch

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import HEDdetector

image = load_image("input.jpg")

# Detect edges as scribble
hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
scribble_image = hed(image, scribble=True)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a detailed painting of a landscape",
    image=scribble_image,
    num_inference_steps=30
).images[0]
```

## Multi-ControlNet

Combine multiple controls:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

# Load multiple ControlNets
controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

controlnet_depth = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",
    torch_dtype=torch.float16
)

# Create pipeline with multiple ControlNets
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[controlnet_canny, controlnet_depth],
    torch_dtype=torch.float16
).to("cuda")

# Generate with multiple controls
# (canny_image and depth_image come from the preprocessors shown earlier)
output = pipe(
    prompt="a beautiful portrait",
    image=[canny_image, depth_image],
    controlnet_conditioning_scale=[1.0, 0.8],  # Adjust weights
    num_inference_steps=30
).images[0]
```
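With multiple ControlNets, the control images and conditioning scales are parallel lists, and a length mismatch only surfaces as an error deep inside the pipeline. A small validation sketch (the helper name is my own, not a diffusers API):

```python
def pair_controls(images, scales):
    """Zip control images with their conditioning scales, failing early on a mismatch."""
    if len(images) != len(scales):
        raise ValueError(
            f"got {len(images)} control images but {len(scales)} scales"
        )
    return list(zip(images, scales))

print(pair_controls(["canny", "depth"], [1.0, 0.8]))
# → [('canny', 1.0), ('depth', 0.8)]
```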

## SDXL ControlNet

```python
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector
import torch

# Load SDXL ControlNet
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Prepare canny image
image = load_image("input.jpg")
canny = CannyDetector()
control_image = canny(image, low_threshold=100, high_threshold=200)

output = pipe(
    prompt="a professional photograph, detailed, 8k",
    image=control_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=30
).images[0]
```

## IP-Adapter (Style Transfer)

```python
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image
import torch

# Load IP-Adapter
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter_sd15.bin"
)

pipe.set_ip_adapter_scale(0.6)

# Style reference image
style_image = load_image("style_reference.jpg")

output = pipe(
    prompt="a cat sitting on a chair",
    ip_adapter_image=style_image,
    num_inference_steps=30
).images[0]
```

## Preprocessors

Commonly used preprocessors from `controlnet_aux`:

```python
from controlnet_aux import (
    CannyDetector,           # Edge detection
    HEDdetector,             # Soft edge/scribble
    MidasDetector,           # Depth estimation
    OpenposeDetector,        # Human pose
    MLSDdetector,            # Line detection
    LineartDetector,         # Line art
    LineartAnimeDetector,    # Anime line art
    NormalBaeDetector,       # Normal maps
    ContentShuffleDetector,  # Shuffle content
    ZoeDetector,             # Better depth
    MediapipeFaceDetector,   # Face mesh
)

# Example usage
canny = CannyDetector()
canny_image = canny(image, low_threshold=100, high_threshold=200)

depth = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_image = depth(image)

pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = pose(image, hand_and_face=True)
```

## Control Weights

Adjust influence per ControlNet:

```python
# Full control
output = pipe(..., controlnet_conditioning_scale=1.0)

# Partial control (more creative freedom)
output = pipe(..., controlnet_conditioning_scale=0.5)

# Very light guidance
output = pipe(..., controlnet_conditioning_scale=0.3)
```
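To find the right strength it often helps to render the same prompt at several scales and compare. A small sweep helper (pure arithmetic; the `pipe` call is left as a comment since it needs a GPU):

```python
def scale_sweep(start: float, stop: float, n: int) -> list[float]:
    """n evenly spaced conditioning scales from start to stop, inclusive."""
    if n == 1:
        return [round(start, 3)]
    step = (stop - start) / (n - 1)
    return [round(start + i * step, 3) for i in range(n)]

for scale in scale_sweep(0.3, 1.0, 4):
    print(scale)
    # output = pipe(..., controlnet_conditioning_scale=scale).images[0]
    # output.save(f"scale_{scale}.png")
```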

### Per-Step Control

```python
# Control only during certain steps
output = pipe(
    prompt="...",
    image=control_image,
    controlnet_conditioning_scale=1.0,
    control_guidance_start=0.0,  # Start at beginning
    control_guidance_end=0.5,    # Stop at 50% of steps
    num_inference_steps=30
).images[0]
```
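`control_guidance_start` and `control_guidance_end` are fractions of the denoising schedule, not step indices. An approximate conversion to see which steps are actually controlled (diffusers applies the fractions per step internally; this is just for intuition):

```python
def controlled_steps(start: float, end: float, num_steps: int) -> range:
    """Approximate 0-indexed steps during which ControlNet guidance is active."""
    first = int(start * num_steps)
    last = int(end * num_steps)
    return range(first, last)

steps = controlled_steps(0.0, 0.5, 30)
print(steps.start, steps.stop)  # → 0 15 (steps 0-14 are controlled)
```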

## Inpaint with ControlNet

```python
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a red sports car",
    image=init_image,
    mask_image=mask,
    control_image=canny_image,
    num_inference_steps=30
).images[0]
```

## Batch Processing

```python
import os
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from controlnet_aux import CannyDetector
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

canny = CannyDetector()

input_dir = "./inputs"
output_dir = "./outputs"
os.makedirs(output_dir, exist_ok=True)

prompt = "beautiful landscape painting, detailed, artistic"

for filename in os.listdir(input_dir):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        image = Image.open(os.path.join(input_dir, filename))
        control_image = canny(image)

        output = pipe(
            prompt=prompt,
            image=control_image,
            num_inference_steps=30
        ).images[0]

        output.save(os.path.join(output_dir, f"cn_{filename}"))
```

## Control Type Guide

| Control  | Best For               | Strength |
| -------- | ---------------------- | -------- |
| Canny    | Architecture, objects  | 0.8-1.0  |
| Depth    | 3D scenes, perspective | 0.6-0.8  |
| Pose     | People, characters     | 0.8-1.0  |
| Scribble | Sketches, concepts     | 0.6-0.8  |
| Line Art | Illustrations          | 0.7-0.9  |
| Softedge | General guidance       | 0.5-0.7  |
| Seg      | Scene composition      | 0.6-0.8  |

## Performance

| Setup           | GPU      | Resolution | Time |
| --------------- | -------- | ---------- | ---- |
| Single CN SD1.5 | RTX 3090 | 512x512    | \~3s |
| Multi CN SD1.5  | RTX 3090 | 512x512    | \~5s |
| Single CN SDXL  | RTX 4090 | 1024x1024  | \~8s |
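Timings like these translate directly into cost per image once combined with the hourly rates listed under Cost Estimate below. A back-of-the-envelope calculator (the example rate and timing are the approximate table values, not guarantees):

```python
def cost_per_image(seconds_per_image: float, hourly_rate: float) -> float:
    """Approximate cost of one generated image, in dollars."""
    images_per_hour = 3600 / seconds_per_image
    return hourly_rate / images_per_hour

# RTX 3090 at ~$0.06/h, ~3 s per 512x512 image
print(f"${cost_per_image(3, 0.06):.5f}")  # → $0.00005
```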

## Memory Optimization

```python
# Enable memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

# CPU offload
pipe.enable_model_cpu_offload()

# Attention slicing
pipe.enable_attention_slicing()
```
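Not every optimization is available in every environment (xformers, for instance, may not be installed). A defensive sketch that applies whatever the pipeline supports, guarding each call; the helper name is my own:

```python
def apply_memory_optimizations(pipe) -> list[str]:
    """Enable whichever memory optimizations this pipeline supports; return what was applied."""
    applied = []
    for method in (
        "enable_xformers_memory_efficient_attention",
        "enable_model_cpu_offload",
        "enable_attention_slicing",
    ):
        fn = getattr(pipe, method, None)
        if fn is None:
            continue
        try:
            fn()
            applied.append(method)
        except Exception:
            pass  # e.g. xformers not installed
    return applied
```

Call it right after building the pipeline; note that CPU offload should not be combined with a prior `pipe.to("cuda")`, since offload manages device placement itself.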

## Troubleshooting

### Weak Control Effect

* Increase `controlnet_conditioning_scale`
* Check preprocessor output quality
* Use higher resolution control image

### Artifacts

* Lower control scale
* Use softer preprocessor (softedge vs canny)
* Add negative prompt for artifacts

### VRAM Issues

* Use CPU offload
* Reduce resolution
* Use one ControlNet at a time

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers

## Next Steps

* Stable Diffusion WebUI
* ComfyUI Workflows
* [Kohya Training](https://docs.clore.ai/guides/training/kohya-training)
