# ControlNet

Master ControlNet for precise control over AI image generation.

{% hint style="success" %}
All examples can be run on GPU servers rented through the [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Renting on CLORE.AI

1. Visit the [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filter by GPU type, VRAM, and price
3. Choose **On-Demand** (fixed rate) or **Spot** (bid price)
4. Configure your order:
   * Select a Docker image
   * Set the ports (TCP for SSH, HTTP for web interfaces)
   * Add environment variables if needed
   * Enter the startup command
5. Select a payment method: **CLORE**, **BTC**, or **USDT/USDC**
6. Create the order and wait for deployment

### Access your server

* Find the connection details under **My Orders**
* Web interfaces: use the HTTP port URL
* SSH: `ssh -p <port> root@<proxy-address>`

## What is ControlNet?

ControlNet adds conditional control to Stable Diffusion. Common control types:

* **Canny** - Edge detection
* **Depth** - 3D depth maps
* **Pose** - Human poses
* **Scribble** - Rough sketches
* **Segmentation** - Semantic masks
* **Line Art** - Clean lines
* **IP-Adapter** - Style transfer (a separate image-prompt adapter, often used alongside ControlNet)

## Requirements

| Setup             | Minimum VRAM | Recommended GPU |
| ----------------- | ------------ | --------------- |
| Single ControlNet | 8GB          | RTX 3070        |
| Multi-ControlNet  | 12GB         | RTX 3090        |
| SDXL ControlNet   | 16GB         | RTX 4090        |
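The table can be read as a simple lookup. As a rough sketch (the helper name `supported_setups` is hypothetical, and the thresholds are just the minimum-VRAM figures from the table), the following lists which setups fit a given card:

```python
# Minimum VRAM per setup in GB, copied from the requirements table.
MIN_VRAM_GB = {
    "Single ControlNet": 8,
    "Multi-ControlNet": 12,
    "SDXL ControlNet": 16,
}


def supported_setups(vram_gb: float) -> list[str]:
    """Return the setups whose minimum VRAM fits within vram_gb."""
    return [name for name, need in MIN_VRAM_GB.items() if vram_gb >= need]


print(supported_setups(12))  # ['Single ControlNet', 'Multi-ControlNet']
```

These are minimums, so a card that only just meets a threshold may still need the memory optimizations described later in this guide.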

## Quick deployment with A1111

**Command** (assumes the WebUI is already installed in `/workspace/stable-diffusion-webui`):

```bash
cd /workspace/stable-diffusion-webui && \
cd extensions && \
git clone https://github.com/Mikubill/sd-webui-controlnet && \
cd .. && \
python launch.py --listen --enable-insecure-extension-access
```

### Download models

```bash
cd /workspace/stable-diffusion-webui/extensions/sd-webui-controlnet/models

# SD 1.5 ControlNets
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_canny.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11f1p_sd15_depth.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_openpose.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_scribble.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_lineart.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_softedge.pth
wget https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main/control_v11p_sd15_seg.pth

# SDXL ControlNets
wget https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0/resolve/main/diffusion_pytorch_model.safetensors -O controlnet-canny-sdxl.safetensors
```
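The SD 1.5 URLs above all follow one pattern, so the list can be generated rather than typed out. A minimal sketch, assuming you only want to print the URLs (for example to pipe them into `wget -i -`); `controlnet_url` is a hypothetical helper name:

```python
BASE_URL = "https://huggingface.co/lllyasviel/ControlNet-v1-1/resolve/main"

# Model file names (without extension), taken from the wget commands above.
SD15_MODELS = [
    "control_v11p_sd15_canny",
    "control_v11f1p_sd15_depth",
    "control_v11p_sd15_openpose",
    "control_v11p_sd15_scribble",
    "control_v11p_sd15_lineart",
    "control_v11p_sd15_softedge",
    "control_v11p_sd15_seg",
]


def controlnet_url(model_name: str) -> str:
    """Build the download URL for one SD 1.5 ControlNet checkpoint."""
    return f"{BASE_URL}/{model_name}.pth"


for name in SD15_MODELS:
    print(controlnet_url(name))
```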

## Python with Diffusers

### Canny edge control

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

# Load the ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

# Load the pipeline
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
)
# CPU offload moves each sub-model to the GPU on demand,
# so do not also call pipe.to("cuda")
pipe.enable_model_cpu_offload()

# Prepare the control image
image = load_image("input.jpg")
canny = CannyDetector()
control_image = canny(image)

# Generate
output = pipe(
    prompt="a beautiful woman in a garden, high quality",
    negative_prompt="ugly, blurry",
    image=control_image,
    num_inference_steps=30,
    controlnet_conditioning_scale=1.0
).images[0]

output.save("canny_output.png")
```

### Depth control

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import MidasDetector

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Compute the depth map from the input image
image = load_image("input.jpg")
depth_estimator = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_image = depth_estimator(image)

# Generate with depth guidance
output = pipe(
    prompt="a futuristic city, science fiction, detailed",
    image=depth_image,
    num_inference_steps=30
).images[0]
```

### OpenPose (human poses)

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import OpenposeDetector

# Extract the pose from the input image
image = load_image("input.jpg")
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = pose_detector(image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a ballerina dancing, elegant, studio lighting",
    image=pose_image,
    num_inference_steps=30
).images[0]
```

### Scribble/Sketch

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import HEDdetector

# Turn the detected edges into a scribble-style control image
image = load_image("input.jpg")
hed = HEDdetector.from_pretrained("lllyasviel/Annotators")
scribble_image = hed(image, scribble=True)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_scribble",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a detailed painting of a landscape",
    image=scribble_image,
    num_inference_steps=30
).images[0]
```

## Multi-ControlNet

Combine multiple controls in a single generation:

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector, MidasDetector

# Prepare one control image per ControlNet
image = load_image("input.jpg")
canny_image = CannyDetector()(image)
depth_image = MidasDetector.from_pretrained("lllyasviel/Annotators")(image)

# Load multiple ControlNets
controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

controlnet_depth = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",
    torch_dtype=torch.float16
)

# Create the pipeline with both ControlNets
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[controlnet_canny, controlnet_depth],
    torch_dtype=torch.float16
).to("cuda")

# Generate; images and scales are listed in the same order as the ControlNets
output = pipe(
    prompt="a beautiful portrait",
    image=[canny_image, depth_image],
    controlnet_conditioning_scale=[1.0, 0.8],  # Per-ControlNet weights
    num_inference_steps=30
).images[0]
```
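With several ControlNets, the control images and scales are matched to the models by position, so the lists need to line up. A small sanity check, sketched here with a hypothetical helper `check_multi_controlnet_inputs` (not part of diffusers), can catch a mismatch before a long generation starts:

```python
from typing import Sequence, Union


def check_multi_controlnet_inputs(
    num_controlnets: int,
    images: Sequence[object],
    scales: Union[float, Sequence[float]],
) -> list[float]:
    """Validate the list lengths and return one scale per ControlNet."""
    if len(images) != num_controlnets:
        raise ValueError(
            f"expected {num_controlnets} control images, got {len(images)}"
        )
    if isinstance(scales, (int, float)):
        # A single number is applied to every ControlNet.
        return [float(scales)] * num_controlnets
    if len(scales) != num_controlnets:
        raise ValueError(
            f"expected {num_controlnets} scales, got {len(scales)}"
        )
    return [float(s) for s in scales]


# Strings stand in for the real control images in this illustration.
print(check_multi_controlnet_inputs(2, ["canny", "depth"], [1.0, 0.8]))  # [1.0, 0.8]
```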

## SDXL ControlNet

```python
import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

# Load the SDXL ControlNet
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# Prepare the Canny control image
image = load_image("input.jpg")
canny = CannyDetector()
control_image = canny(image, low_threshold=100, high_threshold=200)

output = pipe(
    prompt="a professional photograph, detailed, 8k",
    image=control_image,
    controlnet_conditioning_scale=0.5,
    num_inference_steps=30
).images[0]
```

## IP-Adapter (style transfer)

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

# Load the base pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Attach the IP-Adapter weights
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter_sd15.bin"
)

# How strongly the reference image influences the result
pipe.set_ip_adapter_scale(0.6)

# Style reference image
style_image = load_image("style_reference.jpg")

output = pipe(
    prompt="a cat sitting on a chair",
    ip_adapter_image=style_image,
    num_inference_steps=30
).images[0]
```

## Preprocessors

Commonly used preprocessors from `controlnet_aux`:

```python
from diffusers.utils import load_image
from controlnet_aux import (
    CannyDetector,           # Edge detection
    HEDdetector,             # Soft edges / scribble
    MidasDetector,           # Depth estimation
    OpenposeDetector,        # Human pose
    MLSDdetector,            # Straight-line detection
    LineartDetector,         # Line art
    LineartAnimeDetector,    # Anime line art
    NormalBaeDetector,       # Normal maps
    ContentShuffleDetector,  # Content shuffle
    ZoeDetector,             # Depth estimation (ZoeDepth)
    MediapipeFaceDetector,   # Face mesh
)

# Usage example
image = load_image("input.jpg")

canny = CannyDetector()
canny_image = canny(image, low_threshold=100, high_threshold=200)

depth = MidasDetector.from_pretrained("lllyasviel/Annotators")
depth_image = depth(image)

pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_image = pose(image, hand_and_face=True)
```

## Control weights

Adjust how strongly the ControlNet influences the result:

```python
# `...` stands for the prompt, control image, and other arguments

# Full control
output = pipe(..., controlnet_conditioning_scale=1.0)

# Partial control (more creative freedom)
output = pipe(..., controlnet_conditioning_scale=0.5)

# Very light guidance
output = pipe(..., controlnet_conditioning_scale=0.3)
```

### Per-step control

```python
# Apply the ControlNet only during part of the denoising process
output = pipe(
    prompt="...",
    image=control_image,
    controlnet_conditioning_scale=1.0,
    control_guidance_start=0.0,  # Start at the first step
    control_guidance_end=0.5,    # Stop at 50% of the steps
    num_inference_steps=30
).images[0]
```
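To see which denoising steps such a window covers, the rule can be reproduced in plain Python. This sketch assumes the ControlNet is applied at step `i` only when the whole step lies inside the `[start, end]` window, which is how recent diffusers releases appear to treat these parameters; `active_steps` is a hypothetical helper, not a library function.

```python
def active_steps(num_steps: int, start: float, end: float) -> list[int]:
    """Indices of the steps where the ControlNet is applied.

    Assumption: step i is active when i / num_steps >= start
    and (i + 1) / num_steps <= end.
    """
    return [
        i
        for i in range(num_steps)
        if i / num_steps >= start and (i + 1) / num_steps <= end
    ]


steps = active_steps(30, 0.0, 0.5)
print(len(steps), steps[0], steps[-1])  # 15 0 14
```

Ending the guidance early lets the ControlNet fix the composition in the first steps while the remaining steps refine details without it.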

## Inpainting with ControlNet

```python
import torch
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

# Example file names: the image to edit and a mask (white = area to repaint)
init_image = load_image("input.jpg")
mask = load_image("mask.png")
canny_image = CannyDetector()(init_image)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

output = pipe(
    prompt="a red sports car",
    image=init_image,
    mask_image=mask,
    control_image=canny_image,
    num_inference_steps=30
).images[0]
```

## Batch processing

```python
import os
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from controlnet_aux import CannyDetector
from PIL import Image
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

canny = CannyDetector()

input_dir = "./inputs"
output_dir = "./outputs"
os.makedirs(output_dir, exist_ok=True)

prompt = "beautiful landscape painting, detailed, artistic"

# Process every image in the input directory
for filename in os.listdir(input_dir):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        # Convert to RGB so palette or alpha-channel images are handled too
        image = Image.open(os.path.join(input_dir, filename)).convert("RGB")
        control_image = canny(image)

        output = pipe(
            prompt=prompt,
            image=control_image,
            num_inference_steps=30
        ).images[0]

        output.save(os.path.join(output_dir, f"cn_{filename}"))
```

## Control type guide

| Control  | Best for                 | Recommended scale |
| -------- | ------------------------ | ----------------- |
| Canny    | Architecture, objects    | 0.8-1.0           |
| Depth    | 3D scenes, perspective   | 0.6-0.8           |
| Pose     | People, characters       | 0.8-1.0           |
| Scribble | Sketches, concepts       | 0.6-0.8           |
| Line Art | Illustrations            | 0.7-0.9           |
| Softedge | General guidance         | 0.5-0.7           |
| Seg      | Scene composition        | 0.6-0.8           |
## Performance

| Configuration   | GPU      | Resolution | Time per image |
| --------------- | -------- | ---------- | -------------- |
| Single CN SD1.5 | RTX 3090 | 512x512    | \~3s           |
| Multi CN SD1.5  | RTX 3090 | 512x512    | \~5s           |
| Single CN SDXL  | RTX 4090 | 1024x1024  | \~8s           |
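These per-image times make it easy to estimate how long a batch will take. A back-of-the-envelope sketch, assuming the approximate figures above hold for your hardware and ignoring model loading and preprocessing time (`batch_minutes` is a hypothetical helper):

```python
# Approximate seconds per image, copied from the performance table.
SECONDS_PER_IMAGE = {
    "Single CN SD1.5": 3,
    "Multi CN SD1.5": 5,
    "Single CN SDXL": 8,
}


def batch_minutes(config: str, num_images: int) -> float:
    """Estimated generation time in minutes for num_images images."""
    return SECONDS_PER_IMAGE[config] * num_images / 60


print(batch_minutes("Multi CN SD1.5", 120))  # 10.0
```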

## Memory optimization

```python
# Enable memory-efficient attention (requires the xformers package)
pipe.enable_xformers_memory_efficient_attention()

# Offload idle sub-models to the CPU
pipe.enable_model_cpu_offload()

# Compute attention in slices to lower peak VRAM
pipe.enable_attention_slicing()
```

## Troubleshooting

### Weak control effect

* Increase `controlnet_conditioning_scale`
* Check the quality of the preprocessor output
* Use a higher-resolution control image

### Artifacts

* Lower the control scale
* Use a softer preprocessor (softedge instead of canny)
* Add a negative prompt that targets the artifacts

### VRAM issues

* Enable CPU offload
* Reduce the resolution
* Use one ControlNet at a time

## Cost estimate

Typical CLORE.AI marketplace rates (as of 2024):

| GPU       | Hourly rate | Daily rate | 4-hour session |
| --------- | ----------- | ---------- | -------------- |
| RTX 3060  | \~$0.03     | \~$0.70    | \~$0.12        |
| RTX 3090  | \~$0.06     | \~$1.50    | \~$0.25        |
| RTX 4090  | \~$0.10     | \~$2.30    | \~$0.40        |
| A100 40GB | \~$0.17     | \~$4.00    | \~$0.70        |
| A100 80GB | \~$0.25     | \~$6.00    | \~$1.00        |

*Prices vary by provider and demand. Check the* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *for current rates.*
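For other session lengths, the estimate is simply the hourly rate multiplied by the number of hours. A minimal sketch using the approximate hourly rates from the table (`session_cost` is a hypothetical helper; the table's own figures are rounded, so results can differ by a few cents):

```python
# Approximate hourly rates in USD, copied from the table.
HOURLY_RATE_USD = {
    "RTX 3060": 0.03,
    "RTX 3090": 0.06,
    "RTX 4090": 0.10,
    "A100 40GB": 0.17,
    "A100 80GB": 0.25,
}


def session_cost(gpu: str, hours: float) -> float:
    """Estimated cost in USD of renting `gpu` for `hours` hours."""
    return round(HOURLY_RATE_USD[gpu] * hours, 2)


print(session_cost("RTX 3090", 4))  # 0.24
```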

**Save money:**

* Use the **Spot** market for flexible workloads (often 30-50% cheaper)
* Pay with **CLORE** tokens
* Compare prices across providers

## Next steps

* Stable Diffusion WebUI
* ComfyUI workflows
* [Kohya training](/guides/guides_v2-es/entrenamiento/kohya-training.md)

