# ControlNet

Master ControlNet for precise control over AI image generation.

{% hint style="success" %}
All examples can be run on GPU servers rented through [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter startup command
5. Select payment: **CLORE**, **BTC**, or **USDT/USDC**
6. Create order and wait for deployment

### Access Your Server

* Find connection details in **My Orders**
* Web interfaces: Use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is ControlNet?

ControlNet adds image-based spatial conditioning to Stable Diffusion: a preprocessed reference image (edges, a depth map, a pose skeleton) constrains the composition of the output, while the text prompt drives content and style. Common control types:

* **Canny** - Edge detection
* **Depth** - 3D depth maps
* **Pose** - Human poses
* **Scribble** - Rough sketches
* **Segmentation** - Semantic masks
* **Line Art** - Clean lines
* **IP-Adapter** - Style transfer from a reference image (a related adapter, covered below)

## Requirements

| Control Type      | Min VRAM | Recommended GPU |
| ----------------- | -------- | --------------- |
| Single ControlNet | 8GB      | RTX 3070        |
| Multi ControlNet  | 12GB     | RTX 3090        |
| SDXL ControlNet   | 16GB     | RTX 4090        |
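
A quick way to confirm a rented GPU meets these requirements is to query it from Python. A minimal check with PyTorch:

```python
import torch

# Print the GPU model and total VRAM to compare against the table above
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA GPU detected")
```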

## Quick Deploy with A1111

**Command:**

```bash
cd /workspace/stable-diffusion-webui && \
cd extensions && \
git clone https://github.com/Mikubill/sd-webui-controlnet && \
cd .. && \
python launch.py --listen --enable-insecure-extension-access
```
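
`--listen` binds the web UI to all network interfaces so it is reachable through the server's mapped HTTP port, and `--enable-insecure-extension-access` keeps the Extensions tab available while `--listen` is active.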

### Download Models

```bash
cd /workspace/stable-diffusion-webui/extensions/sd-webui-controlnet/models

# SD 1.5 ControlNets
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.pth

# SDXL ControlNets
wget https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0/resolve/main/diffusion_pytorch_model.safetensors -O controlnet-canny-sdxl.safetensors
```

## Python with Diffusers
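
The examples below assume the required packages are installed, e.g. `pip install diffusers transformers accelerate controlnet_aux`.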

### Canny Edge Control

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

# Load ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

# Load pipeline
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
pipe.to("cuda")
# To trade speed for lower VRAM usage, use CPU offload instead of .to("cuda"):
# pipe.enable_model_cpu_offload()

# Prepare control image
image = load_image("input.jpg")
canny = CannyDetector()
control_image = canny(image)

# Generate
output = pipe(
    prompt="a beautiful woman in a garden, high quality",
    negative_prompt="ugly, blurry",
    image=control_image,
    num_inference_steps=30,
    controlnet_conditioning_scale=1.0
).images[0]

output.save("canny_output.png")
```

### Depth Control

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import MidasDetector
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Get depth map from an input image
image = load_image("input.jpg")
depth_estimator = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_image = depth_estimator(image)

# Generate with depth
output = pipe(
    prompt="a futuristic city, sci-fi, detailed",
    image=depth_image,
    num_inference_steps=30
).images[0]
```

### OpenPose (Human Poses)

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import OpenposeDetector

# Extract the pose from a reference image
image = load_image("input.jpg")
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = pose_detector(image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a ballerina dancing, elegant, studio lighting",
    image=pose_image,
    num_inference_steps=30
).images[0]
```

### Scribble/Sketch

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import HEDdetector

# Detect soft edges and render them as a scribble-style map
image = load_image("input.jpg")
hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
scribble_image = hed(image, scribble=True)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a detailed painting of a landscape",
    image=scribble_image,
    num_inference_steps=30
).images[0]
```

## Multi-ControlNet

Combine multiple controls:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

# Load multiple ControlNets
controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

controlnet_depth = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",
    torch_dtype=torch.float16
)

# Create pipeline with multiple ControlNets
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[controlnet_canny, controlnet_depth],
    torch_dtype=torch.float16
).to("cuda")

# Generate with multiple controls; canny_image and depth_image come
# from the Canny and Midas preprocessors shown in the earlier examples
output = pipe(
    prompt="a beautiful portrait",
    image=[canny_image, depth_image],
    controlnet_conditioning_scale=[1.0, 0.8],  # Adjust weights
    num_inference_steps=30
).images[0]
```

## SDXL ControlNet

```python
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector
import torch

# Load SDXL ControlNet
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Prepare canny image from an input photo
image = load_image("input.jpg")
canny = CannyDetector()
control_image = canny(image, low_threshold=100, high_threshold=200)

output = pipe(
    prompt="a professional photograph, detailed, 8k",
    image=control_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=30
).images[0]
```

## IP-Adapter (Style Transfer)

```python
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image
import torch

# Load IP-Adapter
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter_sd15.bin"
)

pipe.set_ip_adapter_scale(0.6)

# Style reference image
style_image = load_image("style_reference.jpg")

output = pipe(
    prompt="a cat sitting on a chair",
    ip_adapter_image=style_image,
    num_inference_steps=30
).images[0]
```

## Preprocessors

Commonly used preprocessors from the `controlnet_aux` package:

```python
from controlnet_aux import (
    CannyDetector,           # Edge detection
    HEDdetector,             # Soft edge/scribble
    MidasDetector,           # Depth estimation
    OpenposeDetector,        # Human pose
    MLSDdetector,            # Line detection
    LineartDetector,         # Line art
    LineartAnimeDetector,    # Anime line art
    NormalBaeDetector,       # Normal maps
    ContentShuffleDetector,  # Shuffle content
    ZoeDetector,             # Better depth
    MediapipeFaceDetector,   # Face mesh
)

# Example usage (`image` is a PIL image; from_pretrained detectors
# download their weights from the Hugging Face Hub on first use)
canny = CannyDetector()
canny_image = canny(image, low_threshold=100, high_threshold=200)

depth = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_image = depth(image)

pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = pose(image, hand_and_face=True)
```

## Control Weights

Adjust influence per ControlNet:

```python
# Full control
output = pipe(..., controlnet_conditioning_scale=1.0)

# Partial control (more creative freedom)
output = pipe(..., controlnet_conditioning_scale=0.5)

# Very light guidance
output = pipe(..., controlnet_conditioning_scale=0.3)
```

### Per-Step Control

```python
# Control only during certain steps
output = pipe(
    prompt="...",
    image=control_image,
    controlnet_conditioning_scale=1.0,
    control_guidance_start=0.0,  # Start at beginning
    control_guidance_end=0.5,    # Stop at 50% of steps
    num_inference_steps=30
).images[0]
```

## Inpaint with ControlNet

```python
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Source photo, inpaint mask (white = area to repaint), and a Canny
# map of the source to keep the new content structurally aligned
init_image = load_image("photo.jpg")
mask = load_image("mask.png")
canny_image = CannyDetector()(init_image)

output = pipe(
    prompt="a red sports car",
    image=init_image,
    mask_image=mask,
    control_image=canny_image,
    num_inference_steps=30
).images[0]
```

## Batch Processing

```python
import os
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from controlnet_aux import CannyDetector
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

canny = CannyDetector()

input_dir = "./inputs"
output_dir = "./outputs"
os.makedirs(output_dir, exist_ok=True)

prompt = "beautiful landscape painting, detailed, artistic"

for filename in os.listdir(input_dir):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        image = Image.open(os.path.join(input_dir, filename)).convert("RGB")
        control_image = canny(image)

        output = pipe(
            prompt=prompt,
            image=control_image,
            num_inference_steps=30
        ).images[0]

        output.save(os.path.join(output_dir, f"cn_{filename}"))
```

## Control Type Guide

| Control  | Best For               | Strength |
| -------- | ---------------------- | -------- |
| Canny    | Architecture, objects  | 0.8-1.0  |
| Depth    | 3D scenes, perspective | 0.6-0.8  |
| Pose     | People, characters     | 0.8-1.0  |
| Scribble | Sketches, concepts     | 0.6-0.8  |
| Line Art | Illustrations          | 0.7-0.9  |
| Softedge | General guidance       | 0.5-0.7  |
| Seg      | Scene composition      | 0.6-0.8  |

## Performance

| Setup           | GPU      | Resolution | Time |
| --------------- | -------- | ---------- | ---- |
| Single CN SD1.5 | RTX 3090 | 512x512    | \~3s |
| Multi CN SD1.5  | RTX 3090 | 512x512    | \~5s |
| Single CN SDXL  | RTX 4090 | 1024x1024  | \~8s |
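
These timings vary with model, scheduler, and step count; you can benchmark your own rental with a sketch like this (assumes `pipe` and `control_image` are set up as in the earlier examples):

```python
import time
import torch

# Warm-up run so model loading and CUDA kernel compilation
# are not counted in the measurement
pipe(prompt="warm-up", image=control_image, num_inference_steps=30)

torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt="a test image", image=control_image, num_inference_steps=30)
torch.cuda.synchronize()
print(f"Generation time: {time.perf_counter() - start:.1f}s")
```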

## Memory Optimization

```python
# Memory-efficient attention (requires the xformers package; on
# PyTorch 2.x the built-in SDPA attention is a comparable default)
pipe.enable_xformers_memory_efficient_attention()

# CPU offload
pipe.enable_model_cpu_offload()

# Attention slicing
pipe.enable_attention_slicing()
```

## Troubleshooting

### Weak Control Effect

* Increase `controlnet_conditioning_scale`
* Check preprocessor output quality
* Use a higher-resolution control image

### Artifacts

* Lower the control scale
* Use a softer preprocessor (SoftEdge instead of Canny)
* Describe the artifacts in the negative prompt

### VRAM Issues

* Use CPU offload
* Reduce resolution
* Use one ControlNet at a time

## Cost Estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly Rate | Daily Rate | 4-Hour Session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
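
Combined with the performance table above, these rates give a rough per-image cost. A back-of-the-envelope sketch (numbers taken from the two tables):

```python
# RTX 3090, single ControlNet, SD 1.5 at 512x512 (see tables above)
hourly_rate = 0.06       # USD per hour
seconds_per_image = 3.0  # ~3s per image

images_per_hour = 3600 / seconds_per_image
cost_per_image = hourly_rate / images_per_hour
print(f"~{images_per_hour:.0f} images/hour, ~${cost_per_image:.5f}/image")
```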

**Save money:**

* Use **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across different providers

## Next Steps

* Stable Diffusion WebUI
* ComfyUI Workflows
* [Kohya Training](https://docs.clore.ai/guides/training/kohya-training)

