# Segment Anything

Usa SAM de Meta para segmentación de imagenes precisa en GPU.

{% hint style="success" %}
Todos los ejemplos se pueden ejecutar en servidores GPU alquilados a través de [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Alquilar en CLORE.AI

1. Visita [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filtrar por tipo de GPU, VRAM y precio
3. Elegir **Bajo demanda** (tarifa fija) o **Spot** (precio de puja)
4. Configura tu pedido:
   * Selecciona imagen Docker
   * Establece puertos (TCP para SSH, HTTP para interfaces web)
   * Agrega variables de entorno si es necesario
   * Introduce el comando de inicio
5. Selecciona pago: **CLORE**, **BTC**, o **USDT/USDC**
6. Crea el pedido y espera el despliegue

### Accede a tu servidor

* Encuentra los detalles de conexión en **Mis Pedidos**
* Interfaces web: Usa la URL del puerto HTTP
* SSH: `ssh -p <port> root@<proxy-address>`

## ¿Qué es SAM?

El Segment Anything Model (SAM) puede:

* Segmentar cualquier objeto en imágenes
* Trabajar con indicaciones (puntos, cajas, texto)
* Generar máscaras automáticas
* Manejar cualquier tipo de imagen

## Variantes de modelo

| Modelo         | VRAM | Calidad | Velocidad |
| -------------- | ---- | ------- | --------- |
| SAM-H (enorme) | 8GB  | Mejor   | Lento     |
| SAM-L (grande) | 6GB  | Genial  | Medio     |
| SAM-B (base)   | 4GB  | Bueno   | Rápido    |
| SAM2           | 8GB+ | Mejor   | Medio     |

## Despliegue rápido

**Imagen Docker:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Puertos:**

```
22/tcp
7860/http
```

**Comando:**

```bash
pip install segment-anything gradio opencv-python && \
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth && \
python -c "
import gradio as gr
import numpy as np
from segment_anything import sam_model_registry, SamPredictor
import cv2

sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h_4b8939.pth').cuda()
predictor = SamPredictor(sam)

def segment(image, evt: gr.SelectData):
    predictor.set_image(image)
    point = np.array([[evt.index[0], evt.index[1]]])
    masks, _, _ = predictor.predict(point_coords=point, point_labels=np.array([1]))
    mask = masks[0]
    colored = np.zeros_like(image)
    colored[mask] = [255, 0, 0]
    result = cv2.addWeighted(image, 0.7, colored, 0.3, 0)
    return result

demo = gr.Interface(fn=segment, inputs=gr.Image(), outputs=gr.Image(), title='Haz clic para segmentar')
demo.launch(server_name='0.0.0.0', server_port=7860)
"
```

## Accediendo a tu servicio

Después del despliegue, encuentra tu `http_pub` URL en **Mis Pedidos**:

1. Ir a **Mis Pedidos** página
2. Haz clic en tu pedido
3. Encuentra la `http_pub` URL (por ejemplo, `abc123.clorecloud.net`)

Usa `https://TU_HTTP_PUB_URL` en lugar de `localhost` en los ejemplos abajo.

## Instalación

```bash
pip install segment-anything opencv-python
```

### Descargar modelos

```bash

# SAM-H (mejor calidad)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# SAM-L (equilibrado)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth

# SAM-B (rápido)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```

## API de Python

### Segmentación básica con puntos

```python
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np

# Cargar modelo
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")

predictor = SamPredictor(sam)

# Cargar imagen
image = cv2.imread("photo.jpg")
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Establecer imagen
predictor.set_image(image_rgb)

# Segmentar con indicación por punto
input_point = np.array([[500, 375]])  # coordenadas x, y
input_label = np.array([1])  # 1 = primer plano, 0 = fondo

masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True
)

# Obtener la mejor máscara
best_mask = masks[np.argmax(scores)]

# Guardar máscara
cv2.imwrite("mask.png", best_mask.astype(np.uint8) * 255)
```

### Indicación con caja

```python

# Segmentar con cuadro delimitador
input_box = np.array([100, 100, 400, 400])  # x1, y1, x2, y2

masks, scores, _ = predictor.predict(
    box=input_box,
    multimask_output=False
)
```

### Múltiples puntos

```python

# Múltiples puntos de primer plano/fondo
input_points = np.array([
    [500, 375],   # Punto 1
    [550, 400],   # Punto 2
    [100, 100],   # Punto de fondo
])
input_labels = np.array([1, 1, 0])  # 1=primer plano, 0=fondo

masks, scores, _ = predictor.predict(
    point_coords=input_points,
    point_labels=input_labels,
    multimask_output=True
)
```

### Combinado Caja + Punto

```python
masks, scores, _ = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=input_box,
    multimask_output=False
)
```

## Generación automática de máscaras

Generar todas las máscaras posibles:

```python
from segment_anything import SamAutomaticMaskGenerator
import cv2

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")

mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=32,
    pred_iou_thresh=0.86,
    stability_score_thresh=0.92,
    crop_n_layers=1,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=100
)

image = cv2.imread("photo.jpg")
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

masks = mask_generator.generate(image_rgb)

# Cada máscara contiene:

# - 'segmentation': máscara binaria

# - 'area': área de la máscara en píxeles

# - 'bbox': caja delimitadora

# - 'predicted_iou': puntuación de calidad

# - 'stability_score': puntuación de estabilidad

print(f"Found {len(masks)} masks")
```

### Visualizar todas las máscaras

```python
import matplotlib.pyplot as plt

def show_masks(image, masks):
    plt.figure(figsize=(20, 20))
    plt.imshow(image)

    sorted_masks = sorted(masks, key=lambda x: x['area'], reverse=True)

    for mask in sorted_masks:
        m = mask['segmentation']
        color = np.random.random(3)
        colored = np.zeros((*m.shape, 4))
        colored[m] = [*color, 0.5]
        plt.imshow(colored)

    plt.axis('off')
    plt.savefig('all_masks.png')

show_masks(image_rgb, masks)
```

## SAM 2 (última versión)

```bash
pip install sam2
```

```python
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=points,
        point_labels=labels
    )
```

## Eliminar fondo

```python
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)

def remove_background(image_path, point):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor.set_image(image_rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([point]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[np.argmax(scores)]

    # Crear imagen RGBA
    result = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
    result[:, :, 3] = best_mask.astype(np.uint8) * 255

    return result

# Haz clic en el objeto para conservarlo
result = remove_background("photo.jpg", [400, 300])
cv2.imwrite("no_background.png", result)
```

## Extraer objeto

```python
def extract_object(image_path, point):
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor.set_image(image_rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([point]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[np.argmax(scores)]

    # Obtener caja delimitadora
    rows = np.any(best_mask, axis=1)
    cols = np.any(best_mask, axis=0)
    y1, y2 = np.where(rows)[0][[0, -1]]
    x1, x2 = np.where(cols)[0][[0, -1]]

    # Recortar
    cropped = image[y1:y2+1, x1:x2+1]
    mask_cropped = best_mask[y1:y2+1, x1:x2+1]

    # Aplicar máscara
    result = cv2.cvtColor(cropped, cv2.COLOR_BGR2BGRA)
    result[:, :, 3] = mask_cropped.astype(np.uint8) * 255

    return result
```

## Procesamiento por lotes

```python
import os
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
import cv2
import json

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")

mask_generator = SamAutomaticMaskGenerator(sam)

input_dir = "./images"
output_dir = "./segmented"
os.makedirs(output_dir, exist_ok=True)

for filename in os.listdir(input_dir):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        image = cv2.imread(os.path.join(input_dir, filename))
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        masks = mask_generator.generate(image_rgb)

        # Guardar máscaras como JSON
        mask_data = []
        for i, mask in enumerate(masks):
            mask_data.append({
                'id': i,
                'area': int(mask['area']),
                'bbox': mask['bbox'],
                'score': float(mask['predicted_iou'])
            })

            # Guardar máscara individual
            cv2.imwrite(
                os.path.join(output_dir, f"{filename}_mask_{i}.png"),
                mask['segmentation'].astype(np.uint8) * 255
            )

        with open(os.path.join(output_dir, f"{filename}_masks.json"), 'w') as f:
            json.dump(mask_data, f)
```

## Servidor API

```python
from fastapi import FastAPI, UploadFile
from fastapi.responses import Response
from segment_anything import sam_model_registry, SamPredictor
import cv2
import numpy as np
import json

app = FastAPI()

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)

@app.post("/segment")
async def segment(file: UploadFile, x: int, y: int):
    contents = await file.read()
    nparr = np.frombuffer(contents, np.uint8)
    image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor.set_image(image_rgb)

    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True
    )

    best_mask = masks[np.argmax(scores)]

    _, encoded = cv2.imencode('.png', best_mask.astype(np.uint8) * 255)
    return Response(content=encoded.tobytes(), media_type="image/png")
```

## Integración con Stable Diffusion

Usa máscaras de SAM para inpainting:

```python

# Generar máscara con SAM
predictor.set_image(image)
masks, scores, _ = predictor.predict(point_coords=point, point_labels=label)
mask = masks[np.argmax(scores)]

# Usar en inpainting de SD
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.to("cuda")

result = pipe(
    prompt="un coche deportivo rojo",
    image=image,
    mask_image=mask
).images[0]
```

## Rendimiento

| Modelo | Tamaño de imagen | GPU      | Tiempo |
| ------ | ---------------- | -------- | ------ |
| SAM-H  | 1024x1024        | RTX 3090 | \~0.5s |
| SAM-L  | 1024x1024        | RTX 3090 | \~0.3s |
| SAM-B  | 1024x1024        | RTX 3090 | \~0.2s |
| SAM2   | 1024x1024        | RTX 4090 | \~0.3s |

## Optimización de memoria

```python

# Para VRAM limitada
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # Usa modelo más pequeño

# O reduce los puntos de generación automática
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=16,  # Reducir desde 32
)
```

## Solución de problemas

### CUDA: fuera de memoria

* Usar SAM-B en lugar de SAM-H
* Reducir el tamaño de la imagen antes de procesar
* Limpiar caché: `torch.cuda.empty_cache()`

### Segmentación deficiente

* Añadir más puntos (primer plano + fondo)
* Usar indicación con caja para mejor guía
* Probar multimask\_output=True y escoger la mejor

## Estimación de costos

Tarifas típicas del marketplace de CLORE.AI (a fecha de 2024):

| GPU       | Tarifa por hora | Tarifa diaria | Sesión de 4 horas |
| --------- | --------------- | ------------- | ----------------- |
| RTX 3060  | \~$0.03         | \~$0.70       | \~$0.12           |
| RTX 3090  | \~$0.06         | \~$1.50       | \~$0.25           |
| RTX 4090  | \~$0.10         | \~$2.30       | \~$0.40           |
| A100 40GB | \~$0.17         | \~$4.00       | \~$0.70           |
| A100 80GB | \~$0.25         | \~$6.00       | \~$1.00           |

*Los precios varían según el proveedor y la demanda. Consulta* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *para las tarifas actuales.*

**Ahorra dinero:**

* Usa **Spot** market para cargas de trabajo flexibles (a menudo 30-50% más barato)
* Paga con **CLORE** tokens
* Compara precios entre diferentes proveedores

## Próximos pasos

* Inpainting con Stable Diffusion
* [Guía de ControlNet](/guides/guides_v2-es/procesamiento-de-imagenes/controlnet-advanced.md)
* [Escalado Real-ESRGAN](/guides/guides_v2-es/procesamiento-de-imagenes/real-esrgan-upscaling.md)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/guides_v2-es/procesamiento-de-imagenes/segment-anything.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.