# ControlNet

Master ControlNet for precise control over AI image generation.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

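## Renting on CLORE.AI

1. Visit the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set ports (TCP for SSH, HTTP for web UIs)
   * Add environment variables if needed
   * Enter a startup command
5. Select a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment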

### Access Your Server

* Find connection details under **My Orders**
* Web interfaces: use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

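## What is ControlNet?

ControlNet adds conditional control to Stable Diffusion. Each control type conditions generation on a different kind of guide image:

* **Canny** - edge maps from edge detection
* **Depth** - 3D depth maps
* **Pose** - human body poses
* **Scribble** - rough sketches
* **Segmentation** - semantic masks
* **Lineart** - clean line drawings
* **IP-Adapter** - style transfer from a reference image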
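
As a rough orientation, the control types above can be mapped to the SD 1.5 checkpoints used later in this guide. The sketch below is a hypothetical lookup helper: `CONTROL_CHECKPOINTS` and `checkpoint_for` are illustrative names, not part of any library, and the repo IDs follow the `lllyasviel/control_v11*` naming that appears in the examples below. IP-Adapter is left out because it is loaded through a different call.

```python
# Hypothetical lookup table: control type -> SD 1.5 ControlNet repo ID.
CONTROL_CHECKPOINTS = {
    "canny": "lllyasviel/control_v11p_sd15_canny",
    "depth": "lllyasviel/control_v11f1p_sd15_depth",
    "pose": "lllyasviel/control_v11p_sd15_openpose",
    "scribble": "lllyasviel/control_v11p_sd15_scribble",
    "segmentation": "lllyasviel/control_v11p_sd15_seg",
    "lineart": "lllyasviel/control_v11p_sd15_lineart",
}


def checkpoint_for(control_type: str) -> str:
    """Return the repo ID for a control type, with a readable error for unknown types."""
    key = control_type.strip().lower()
    if key not in CONTROL_CHECKPOINTS:
        known = ", ".join(sorted(CONTROL_CHECKPOINTS))
        raise ValueError(f"Unknown control type {control_type!r}; expected one of: {known}")
    return CONTROL_CHECKPOINTS[key]


print(checkpoint_for("Depth"))
```

The returned string can be passed straight to `ControlNetModel.from_pretrained(...)`.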

## Requirements

| Control Type      | Min VRAM | Recommended   |
| ----------------- | -------- | ------------- |
| Single ControlNet | 8GB      | RTX 3070      |
| Multi-ControlNet  | 12GB     | (unspecified) |
| SDXL ControlNet   | 16GB     | (unspecified) |
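
To check a rented server against these minimums, a small helper can compare the detected VRAM with the table. This is a minimal sketch: `MIN_VRAM_GB`, `supported_setups`, and `detect_vram_gb` are hypothetical names, and the thresholds are simply the "Min VRAM" values from the table above.

```python
# VRAM minimums in GB, copied from the requirements table above.
MIN_VRAM_GB = {
    "Single ControlNet": 8,
    "Multi-ControlNet": 12,
    "SDXL ControlNet": 16,
}


def supported_setups(vram_gb: float) -> list[str]:
    """Return the setups whose minimum VRAM fits within vram_gb."""
    return [name for name, need in MIN_VRAM_GB.items() if vram_gb >= need]


def detect_vram_gb() -> float:
    """Total memory of GPU 0 in GiB (requires PyTorch with CUDA)."""
    import torch  # imported lazily so supported_setups works without PyTorch

    return torch.cuda.get_device_properties(0).total_memory / 1024**3


print(supported_setups(12))
```

On a GPU server, `supported_setups(detect_vram_gb())` gives the same answer for the installed card. The reported total can be slightly below the nominal size, so rounding the detected value before comparing may be sensible.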

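## Quick Deploy with A1111

**Command** (assumes the A1111 WebUI is already installed at `/workspace/stable-diffusion-webui`):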

```bash
cd /workspace/stable-diffusion-webui && \
cd extensions && \
git clone https://github.com/Mikubill/sd-webui-controlnet && \
cd .. && \
python launch.py --listen --enable-insecure-extension-access
```

### Download Models

```bash
cd /workspace/stable-diffusion-webui/extensions/sd-webui-controlnet/models

# SD 1.5 ControlNets
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.pth

# SDXL ControlNets
wget https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0/resolve/main/diffusion_pytorch_model.safetensors -O controlnet-canny-sdxl.safetensors
```
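
The `wget` lines above all follow the same Hugging Face URL pattern, `https://huggingface.co/<repo>/resolve/main/<file>`. The sketch below builds those URLs in Python; `hf_resolve_url` and `SD15_FILES` are hypothetical names introduced for illustration, and the file names are taken from the commands above.

```python
from urllib.parse import quote


def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL in the same form as the wget lines above."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


# File names taken from the wget commands above.
SD15_FILES = [
    "control_v11p_sd15_canny.pth",
    "control_v11f1p_sd15_depth.pth",
    "control_v11p_sd15_openpose.pth",
]

for name in SD15_FILES:
    print(hf_resolve_url("lllyasviel/ControlNet-v1-1", name))
```

Generating the list this way makes it easier to download only the checkpoints you need instead of all of them.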

## Python with Diffusers

### Canny Edge Control

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

# Load the ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

# Load the pipeline
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
# CPU offload moves each submodule to the GPU on demand,
# so no explicit pipe.to("cuda") is needed here
pipe.enable_model_cpu_offload()

# Prepare the control image
image = load_image("input.jpg")
canny = CannyDetector()
control_image = canny(image)

# Generate
output = pipe(
    prompt="a beautiful woman in a garden, high quality",
    negative_prompt="ugly, blurry",
    image=control_image,
    num_inference_steps=30,
    controlnet_conditioning_scale=1.0
).images[0]

output.save("canny_output.png")
```
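
Stable Diffusion works on latents at one eighth of the pixel resolution, so the width and height of the input and control images should be multiples of 8. The sketch below computes such a size while keeping the aspect ratio; `snap_to_multiple` and `fit_size` are hypothetical helper names, and the 512-pixel short side is an assumed default for SD 1.5.

```python
def snap_to_multiple(value: float, multiple: int = 8) -> int:
    """Round to the nearest multiple, never going below one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def fit_size(width: int, height: int, short_side: int = 512) -> tuple[int, int]:
    """Scale (width, height) so the shorter side is about short_side, snapped to multiples of 8."""
    scale = short_side / min(width, height)
    return snap_to_multiple(width * scale), snap_to_multiple(height * scale)


print(fit_size(1920, 1080))
```

One way to use it is to resize the source before preprocessing, for example `image = image.resize(fit_size(*image.size))`.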

### Depth Control

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import MidasDetector

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Compute the depth map from the source image
image = load_image("input.jpg")
depth_estimator = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_image = depth_estimator(image)

# Generate with depth conditioning
output = pipe(
    prompt="a futuristic city, sci-fi, detailed",
    image=depth_image,
    num_inference_steps=30
).images[0]
```

### OpenPose (Human Pose)

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import OpenposeDetector

# Extract the pose from the source image
image = load_image("input.jpg")
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = pose_detector(image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a ballerina dancing, elegant, studio lighting",
    image=pose_image,
    num_inference_steps=30
).images[0]
```

### Scribble/Sketch

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import HEDdetector

# Detect edges and convert them to a scribble-style image
image = load_image("input.jpg")
hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
scribble_image = hed(image, scribble=True)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a detailed painting of a landscape",
    image=scribble_image,
    num_inference_steps=30
).images[0]
```

## Multi-ControlNet

Combine multiple controls in a single pipeline:

```python
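import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector, MidasDetector

# Load multiple ControlNets
controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

controlnet_depth = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",
    torch_dtype=torch.float16
)

# Create the pipeline with a list of ControlNets
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[controlnet_canny, controlnet_depth],
    torch_dtype=torch.float16
).to("cuda")

# Prepare one control image per ControlNet, in the same order
image = load_image("input.jpg")
canny_image = CannyDetector()(image)
depth_image = MidasDetector.from_pretrained("lllyasviel/Annotators")(image)

# Generate with both controls
output = pipe(
    prompt="a beautiful portrait",
    image=[canny_image, depth_image],
    controlnet_conditioning_scale=[1.0, 0.8],  # per-ControlNet weights
    num_inference_steps=30
).images[0]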
```
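
With several ControlNets, the control images and conditioning scales are matched to the ControlNets by position, so mismatched list lengths are an easy mistake to make. The sketch below is a hypothetical pre-flight check (`check_multi_control_inputs` is an illustrative name, not a diffusers function) that could be called before the pipeline.

```python
from typing import Sequence


def check_multi_control_inputs(
    num_controlnets: int,
    images: Sequence[object],
    scales: Sequence[float],
) -> None:
    """Raise ValueError if the per-ControlNet lists are misaligned."""
    if len(images) != num_controlnets:
        raise ValueError(f"expected {num_controlnets} control images, got {len(images)}")
    if len(scales) != num_controlnets:
        raise ValueError(f"expected {num_controlnets} scales, got {len(scales)}")


# Two ControlNets (canny, depth) need two images and two scales, in the same order.
check_multi_control_inputs(2, ["canny_image", "depth_image"], [1.0, 0.8])
print("inputs are aligned")
```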

## SDXL ControlNet

```python
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

# Load the SDXL ControlNet
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Prepare the Canny image
image = load_image("input.jpg")
canny = CannyDetector()
control_image = canny(image, low_threshold=100, high_threshold=200)

output = pipe(
    prompt="a professional photograph, detailed, 8k",
    image=control_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=30
).images[0]
```

## IP-Adapter (Style Transfer)

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

# Load the base pipeline, then attach the IP-Adapter weights
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter_sd15.bin"
)

# How strongly the reference image influences the result
pipe.set_ip_adapter_scale(0.6)

# Style reference image
style_image = load_image("style_reference.jpg")

output = pipe(
    prompt="a cat sitting on a chair",
    ip_adapter_image=style_image,
    num_inference_steps=30
).images[0]
```

## Preprocessors

Commonly used preprocessors from `controlnet_aux`:

```python
from diffusers.utils import load_image
from controlnet_aux import (
    CannyDetector,           # Edge detection
    HEDdetector,             # Soft edges / scribble
    MidasDetector,           # Depth estimation
    OpenposeDetector,        # Human pose
    MLSDdetector,            # Straight-line detection
    LineartDetector,         # Lineart
    LineartAnimeDetector,    # Anime lineart
    NormalBaeDetector,       # Normal maps
    ContentShuffleDetector,  # Content shuffle
    ZoeDetector,             # Depth estimation (ZoeDepth)
    MediapipeFaceDetector,   # Face mesh
)

# Example usage
image = load_image("input.jpg")

canny = CannyDetector()
canny_image = canny(image, low_threshold=100, high_threshold=200)

depth = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_image = depth(image)

pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = pose(image, hand_and_face=True)
```

## Control Weights

Adjust how strongly each ControlNet influences the result:

```python

# Full control
output = pipe(..., controlnet_conditioning_scale=1.0)

# Partial control (more creative freedom)
output = pipe(..., controlnet_conditioning_scale=0.5)

# Very light guidance
output = pipe(..., controlnet_conditioning_scale=0.3)
```

### Step-wise Control

```python

# Apply control only during part of the denoising schedule
output = pipe(
    prompt="...",
    image=control_image,
    controlnet_conditioning_scale=1.0,
    control_guidance_start=0.0,  # Enabled from the first step
    control_guidance_end=0.5,    # Stops at 50% of the steps
    num_inference_steps=30
).images[0]
```
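
To see which denoising steps fall inside the guidance window, the fractions can be converted to step indices. The sketch below is an approximation under the assumption that a step is controlled only when its whole interval lies between `control_guidance_start` and `control_guidance_end`; `controlled_steps` is a hypothetical helper, and the exact gating inside diffusers may differ between versions.

```python
def controlled_steps(num_inference_steps: int, start: float = 0.0, end: float = 1.0) -> list[int]:
    """Indices of denoising steps whose whole interval lies inside [start, end]."""
    n = num_inference_steps
    return [i for i in range(n) if i / n >= start and (i + 1) / n <= end]


steps = controlled_steps(30, start=0.0, end=0.5)
print(len(steps), steps[0], steps[-1])
```

With the settings from the example above (30 steps, window 0.0 to 0.5), the ControlNet shapes the early steps that set the composition and leaves the later detail steps to the prompt alone.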

## Inpainting with ControlNet

```python
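import torch
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Placeholder paths: source image and mask (white = area to repaint)
init_image = load_image("input.jpg")
mask = load_image("mask.png")
canny_image = CannyDetector()(init_image)

output = pipe(
    prompt="a red sports car",
    image=init_image,
    mask_image=mask,
    control_image=canny_image,
    num_inference_steps=30
).images[0]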
```

## Batch Processing

```python
import os

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from controlnet_aux import CannyDetector
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

canny = CannyDetector()

input_dir = "./inputs"
output_dir = "./outputs"
os.makedirs(output_dir, exist_ok=True)

prompt = "beautiful landscape painting, detailed, artistic"

# Run every image in input_dir through Canny + ControlNet
for filename in os.listdir(input_dir):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        image = Image.open(os.path.join(input_dir, filename)).convert("RGB")
        control_image = canny(image)

        output = pipe(
            prompt=prompt,
            image=control_image,
            num_inference_steps=30
        ).images[0]

        output.save(os.path.join(output_dir, f"cn_{filename}"))
```
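
For longer batches it can help to plan the input/output pairs up front, so an interrupted run can skip files that already exist. The sketch below is one way to do that with `pathlib`; `plan_batch` and `IMAGE_SUFFIXES` are hypothetical names, and the demo uses a temporary directory instead of real images.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def plan_batch(input_dir: str, output_dir: str, prefix: str = "cn_") -> list[tuple[Path, Path]]:
    """Pair each input image with its output path, sorted by file name."""
    out = Path(output_dir)
    return [
        (src, out / f"{prefix}{src.name}")
        for src in sorted(Path(input_dir).iterdir())
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES
    ]


# Demo with empty placeholder files; non-image files are ignored.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.JPG", "a.png", "notes.txt"):
        (Path(tmp) / name).touch()
    for src, dst in plan_batch(tmp, "./outputs"):
        print(src.name, "->", dst)
```

In the generation loop, checking `dst.exists()` before calling the pipeline lets a restarted job resume where it stopped.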

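## Control Type Guide

| Control      | Best For              | Strength |
| ------------ | --------------------- | -------- |
| Canny        | Architecture, objects | 0.8-1.0  |
| Depth        | 3D scenes, perspective | 0.6-0.8 |
| Pose         | People, characters    | 0.8-1.0  |
| Scribble     | Sketches, concepts    | 0.6-0.8  |
| Lineart      | Illustrations         | 0.7-0.9  |
| Softedge     | General guidance      | 0.5-0.7  |
| Segmentation | Scene composition     | 0.6-0.8  |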
## Performance

| Setup           | GPU           | Resolution    | Time |
| --------------- | ------------- | ------------- | ---- |
| Single CN SD1.5 | (unspecified) | (unspecified) | \~3s |
| Multi CN SD1.5  | (unspecified) | (unspecified) | \~5s |
| Single CN SDXL  | RTX 4090      | 512x512       | \~8s |

## Memory Optimization

```python

# Enable memory-efficient attention (requires the xformers package)
pipe.enable_xformers_memory_efficient_attention()

# CPU offload
pipe.enable_model_cpu_offload()

# Attention slicing
pipe.enable_attention_slicing()
```
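
The options above do not all need to be enabled at once. The sketch below shows one possible policy: offload to CPU on small cards, and use attention slicing only as a fallback when xformers is unavailable. `apply_memory_optimizations` is a hypothetical helper, and the 8 GB offload threshold is an assumption borrowed from the minimum in the requirements table, not a measured limit.

```python
def apply_memory_optimizations(pipe, vram_gb: float, offload_below_gb: float = 8.0) -> list[str]:
    """Apply a simple VRAM-based policy to a diffusers pipeline; return what was enabled."""
    enabled = []
    if vram_gb < offload_below_gb:  # assumed threshold
        # Offloading manages device placement itself, so skip pipe.to("cuda")
        pipe.enable_model_cpu_offload()
        enabled.append("model_cpu_offload")
    else:
        pipe.to("cuda")
        enabled.append("cuda")
    try:
        pipe.enable_xformers_memory_efficient_attention()
        enabled.append("xformers")
    except Exception:
        # xformers is optional; fall back to attention slicing when it is unavailable
        pipe.enable_attention_slicing()
        enabled.append("attention_slicing")
    return enabled
```

Call it once right after `from_pretrained(...)`, in place of a manual `.to("cuda")`, for example `apply_memory_optimizations(pipe, vram_gb=12)`.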

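## Troubleshooting

### Weak Control Effect

* Increase `controlnet_conditioning_scale`
* Check the quality of the preprocessor output
* Use a higher-resolution control image

### Artifacts

* Lower the control strength
* Use a softer preprocessor (softedge instead of canny)
* Add the artifacts to the negative prompt

### VRAM Issues

* Enable CPU offload
* Lower the resolution
* Use only one ControlNet at a time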

## Cost Estimate

Typical rates on the CLORE.AI marketplace (as of 2024):

| GPU           | Hourly Rate | Daily Rate | 4-Hour Session |
| ------------- | ----------- | ---------- | -------------- |
| RTX 3060      | \~$0.03     | \~$0.70    | \~$0.12        |
| (unspecified) | \~$0.06     | \~$1.50    | \~$0.25        |
| (unspecified) | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB     | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB     | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*

**Save money:**

* Use the **Spot** market for flexible workloads (typically 30-50% cheaper)
* Pay with **CLORE**
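
The session figures in the table are the hourly rate multiplied by the session length. The sketch below reproduces that arithmetic and optionally applies a spot discount; `session_cost` is a hypothetical helper, and the 30% discount in the example is an assumed value from the lower end of the 30-50% range quoted above.

```python
def session_cost(hourly_rate_usd: float, hours: float, spot_discount: float = 0.0) -> float:
    """Estimated cost in USD: hourly rate x hours, optionally reduced by a spot discount (0-1)."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return round(hourly_rate_usd * hours * (1.0 - spot_discount), 2)


print(session_cost(0.10, 4))       # on-demand, 4 hours
print(session_cost(0.10, 4, 0.3))  # same session with an assumed 30% spot discount
```

Actual spot prices depend on bids and availability, so treat the result as a rough planning figure.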

## Next Steps

* Stable Diffusion WebUI
* ComfyUI workflows
* [Kohya Training](/guides/guides_v2-zh/xun-lian/kohya-training.md)

