# HuggingFace Transformers

Usa la biblioteca Transformers para NLP, visión y audio en GPU.

{% hint style="success" %}
Todos los ejemplos se pueden ejecutar en servidores GPU alquilados a través de [CLORE.AI Marketplace](https://clore.ai/marketplace).
{% endhint %}

## Alquilar en CLORE.AI

1. Visita [CLORE.AI Marketplace](https://clore.ai/marketplace)
2. Filtrar por tipo de GPU, VRAM y precio
3. Elegir **Bajo demanda** (tarifa fija) o **Spot** (precio de puja)
4. Configura tu pedido:
   * Selecciona imagen Docker
   * Establece puertos (TCP para SSH, HTTP para interfaces web)
   * Agrega variables de entorno si es necesario
   * Introduce el comando de inicio
5. Selecciona pago: **CLORE**, **BTC**, o **USDT/USDC**
6. Crea el pedido y espera el despliegue

### Accede a tu servidor

* Encuentra los detalles de conexión en **Mis Pedidos**
* Interfaces web: Usa la URL del puerto HTTP
* SSH: `ssh -p <port> root@<proxy-address>`

## ¿Qué es Transformers?

Hugging Face Transformers proporciona:

* Más de 100.000 modelos preentrenados
* Carga e inferencia de modelos fácil
* Soporte para fine-tuning
* Capacidades multimodales

## Despliegue rápido

**Imagen Docker:**

```
pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
```

**Puertos:**

```
22/tcp
```

**Comando:**

```bash
pip install transformers accelerate datasets huggingface_hub
```

## Instalación

```bash
pip install transformers[torch]
pip install accelerate  # Para modelos grandes
pip install datasets    # Para datos de entrenamiento
```

## Generación de texto

### Generación básica

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = "Explica la computación cuántica en términos sencillos:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Modelos de chat

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "¿Qué es el aprendizaje automático?"}
]

outputs = pipe(
    messages,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7
)

print(outputs[0]["generated_text"][-1]["content"])
```

### Streaming

```python
from transformers import TextStreamer

streamer = TextStreamer(tokenizer)

model.generate(
    **inputs,
    max_new_tokens=200,
    streamer=streamer
)
```

## Cuantización

### Cuantización a 4 bits

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    quantization_config=bnb_config,
    device_map="auto"
)
```

### Cuantización a 8 bits

```python
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    load_in_8bit=True,
    device_map="auto"
)
```

## Embeddings

```python
from transformers import AutoModel, AutoTokenizer
import torch

model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).cuda()

def get_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to("cuda")
    with torch.no_grad():
        outputs = model(**inputs)
    # Pooling por media
    embedding = outputs.last_hidden_state.mean(dim=1)
    return embedding

emb = get_embedding("¡Hola, mundo!")
print(f"Forma del embedding: {emb.shape}")
```

## Clasificación de imágenes

```python
from transformers import pipeline
from PIL import Image

classifier = pipeline("image-classification", model="google/vit-base-patch16-224", device=0)

image = Image.open("cat.jpg")
results = classifier(image)

for result in results:
    print(f"{result['label']}: {result['score']:.4f}")
```

## Detección de objetos

```python
from transformers import pipeline
from PIL import Image

detector = pipeline("object-detection", model="facebook/detr-resnet-50", device=0)

image = Image.open("street.jpg")
results = detector(image)

for result in results:
    print(f"{result['label']}: {result['score']:.4f} en {result['box']}")
```

## Segmentación de imágenes

```python
from transformers import pipeline
from PIL import Image

segmenter = pipeline("image-segmentation", model="facebook/maskformer-swin-base-ade", device=0)

image = Image.open("scene.jpg")
results = segmenter(image)

for segment in results:
    print(f"{segment['label']}: puntuación {segment['score']:.4f}")
```

## Reconocimiento de voz

```python
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    device=0
)

result = transcriber("audio.mp3")
print(result["text"])
```

## Texto a voz

```python
from transformers import pipeline
import scipy

synthesizer = pipeline("text-to-speech", model="microsoft/speecht5_tts", device=0)

speech = synthesizer("Hola, esto es una prueba de texto a voz.")

scipy.io.wavfile.write("output.wav", rate=speech["sampling_rate"], data=speech["audio"])
```

## Fine-Tuning

### Preparar conjunto de datos

```python
from datasets import load_dataset

dataset = load_dataset("imdb")
train_dataset = dataset["train"].select(range(1000))
eval_dataset = dataset["test"].select(range(200))
```

### Entrenamiento

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    Trainer
)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_eval = eval_dataset.map(tokenize_function, batched=True)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
)

trainer.train()
```

## Tareas de Pipeline

| Tarea                     | Nombre del Pipeline            |
| ------------------------- | ------------------------------ |
| Generación de texto       | `text-generation`              |
| Rellenar máscara          | `fill-mask`                    |
| Resumen                   | `summarization`                |
| Traducción                | `translation`                  |
| Pregunta y respuesta      | `question-answering`           |
| Análisis de sentimiento   | `sentiment-analysis`           |
| Clasificación de imágenes | `image-classification`         |
| Detección de objetos      | `object-detection`             |
| Reconocimiento de voz     | `automatic-speech-recognition` |
| Texto a voz               | `text-to-speech`               |

## Multi-GPU

```python
from transformers import AutoModelForCausalLM

# Colocación automática de dispositivos
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    device_map="auto",
    torch_dtype=torch.float16
)

# Mapa de dispositivos manual
device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 0,
    # ...
    "model.layers.39": 1,
    "model.norm": 1,
    "lm_head": 1
}

model = AutoModelForCausalLM.from_pretrained(
    "model_name",
    device_map=device_map
)
```

## Optimización de memoria

```python

# Flash Attention 2
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto"
)

# Gradient checkpointing
model.gradient_checkpointing_enable()
```

## Model Hub

### Descargar modelos

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-2-7b-hf",
    local_dir="./llama-2-7b"
)
```

### Subir modelos

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="./my_model",
    repo_id="username/my-model",
    repo_type="model"
)
```

## Consejos de rendimiento

| Consejo                         | Efecto                 |
| ------------------------------- | ---------------------- |
| Usa `torch_dtype=torch.float16` | 50% menos de memoria   |
| Habilitar Flash Attention 2     | Atención 2x más rápida |
| Usar cuantización               | 75% menos de memoria   |
| Inferencia por lotes            | Mayor rendimiento      |

## Solución de problemas

## Estimación de costos

Tarifas típicas del marketplace de CLORE.AI (a fecha de 2024):

| GPU       | Tarifa por hora | Tarifa diaria | Sesión de 4 horas |
| --------- | --------------- | ------------- | ----------------- |
| RTX 3060  | \~$0.03         | \~$0.70       | \~$0.12           |
| RTX 3090  | \~$0.06         | \~$1.50       | \~$0.25           |
| RTX 4090  | \~$0.10         | \~$2.30       | \~$0.40           |
| A100 40GB | \~$0.17         | \~$4.00       | \~$0.70           |
| A100 80GB | \~$0.25         | \~$6.00       | \~$1.00           |

*Los precios varían según el proveedor y la demanda. Consulta* [*CLORE.AI Marketplace*](https://clore.ai/marketplace) *para las tarifas actuales.*

**Ahorra dinero:**

* Usa **Spot** market para cargas de trabajo flexibles (a menudo 30-50% más barato)
* Paga con **CLORE** tokens
* Compara precios entre diferentes proveedores

## Próximos pasos

* Inferencia vLLM - Servicio en producción
* [Ajustar LLMs](/guides/guides_v2-es/entrenamiento/finetune-llm.md) - Entrenamiento LoRA
* [Entrenamiento con DeepSpeed](/guides/guides_v2-es/entrenamiento/deepspeed-training.md) - Entrenamiento distribuido


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.clore.ai/guides/guides_v2-es/entrenamiento/huggingface-transformers.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
