# Training YOLO Object Detection Models

## What We're Building

A complete YOLOv8 object detection training pipeline on Clore.ai GPUs. Train custom detection, segmentation, and pose estimation models with automatic GPU provisioning, data preparation, and model export.

**Key Features:**

* Automatic GPU provisioning via Clore.ai API
* YOLOv8 detection, segmentation, and pose models
* Custom dataset training (YOLO-format labels; COCO annotations need converting first)
* Data augmentation and preprocessing
* Model export (ONNX, TensorRT, CoreML)
* Training metrics and visualization
* Multi-GPU training support

## Prerequisites

* Clore.ai account with API key ([get one here](https://clore.ai))
* Python 3.10+
* Labeled dataset in YOLO format (COCO annotations must be converted to YOLO labels before training)

```bash
pip install requests paramiko scp ultralytics roboflow
```
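
If your annotations are in COCO format, the boxes are stored as `[x_min, y_min, width, height]` in pixels, while YOLO label files hold one `class x_center y_center width height` line per object, normalized to the image size. A minimal sketch of that conversion follows; the helper names `coco_to_yolo` and `yolo_label_line` are illustrative and not part of any library.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels
    to YOLO (x_center, y_center, width, height), normalized to 0-1."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def yolo_label_line(class_id, bbox, img_w, img_h):
    """Format one line of a YOLO .txt label file."""
    cx, cy, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


print(yolo_label_line(0, [100, 50, 200, 100], img_w=640, img_h=480))
# 0 0.312500 0.208333 0.312500 0.208333
```

COCO category IDs are not guaranteed to be zero-based or contiguous, so they usually need remapping to the class indices listed under `names` in your dataset YAML.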

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                  YOLOv8 Training Pipeline                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  Dataset    │  │  Training   │  │   Export & Deploy       │  │
│  │  Roboflow/  │──│  YOLOv8     │──│   ONNX/TensorRT/CoreML  │  │
│  │  Local      │  │  Ultralytics│  │                         │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│         │                │                    │                  │
│         └────────────────┴────────────────────┘                  │
│                          │                                       │
│                 ┌────────▼────────┐                              │
│                 │  Clore.ai GPU   │                              │
│                 │  RTX 4090/A100  │                              │
│                 └─────────────────┘                              │
└─────────────────────────────────────────────────────────────────┘
```

## Step 1: Clore.ai YOLO Client

```python
# clore_yolo_client.py
import requests
import time
import secrets
from typing import Dict, Any, Optional
from dataclasses import dataclass

@dataclass
class YOLOServer:
    """GPU server for YOLO training."""
    server_id: int
    order_id: int
    ssh_host: str
    ssh_port: int
    ssh_password: str
    gpu_model: str
    gpu_count: int
    hourly_cost: float


class CloreYOLOClient:
    """Clore.ai client for YOLO training."""
    
    BASE_URL = "https://api.clore.ai"
    # `latest` is the CUDA-enabled Ultralytics image; `latest-python` is a minimal, CPU-oriented build
    YOLO_IMAGE = "ultralytics/ultralytics:latest"
    
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.headers = {"auth": api_key}
    
    def _request(self, method: str, endpoint: str, **kwargs) -> Dict[str, Any]:
        """Make API request."""
        url = f"{self.BASE_URL}{endpoint}"
        
        for attempt in range(3):
            response = requests.request(
                method, url,
                headers=self.headers,
                timeout=30,
                **kwargs
            )
            data = response.json()
            
            if data.get("code") == 5:
                time.sleep(2 ** attempt)
                continue
            
            if data.get("code") != 0:
                raise Exception(f"API Error: {data}")
            return data
        
        raise Exception("Max retries exceeded")
    
    def find_yolo_gpu(self, max_price_usd: float = 0.50) -> Optional[Dict]:
        """Find GPU suitable for YOLO training."""
        servers = self._request("GET", "/v1/marketplace")["servers"]
        
        # GPUs good for YOLO (fast training, reasonable VRAM)
        yolo_gpus = ["RTX 4090", "RTX 4080", "RTX 3090", "RTX 3080",
                     "A100", "A6000", "A5000"]
        
        candidates = []
        for server in servers:
            if server.get("rented"):
                continue
            
            gpu_array = server.get("gpu_array", [])
            if not any(any(g in gpu for g in yolo_gpus) for gpu in gpu_array):
                continue
            
            price = server.get("price", {}).get("usd", {}).get("spot")
            if not price or price > max_price_usd:
                continue
            
            candidates.append({
                "id": server["id"],
                "gpus": gpu_array,
                "gpu_count": len(gpu_array),
                "price_usd": price,
                "reliability": server.get("reliability", 0)
            })
        
        if not candidates:
            return None
        
        candidates.sort(key=lambda x: (x["price_usd"], -x["reliability"]))
        return candidates[0]
    
    def rent_yolo_server(self, server: Dict, use_spot: bool = True) -> YOLOServer:
        """Rent a server for YOLO training."""
        ssh_password = secrets.token_urlsafe(16)
        
        order_data = {
            "renting_server": server["id"],
            "type": "spot" if use_spot else "on-demand",
            "currency": "CLORE-Blockchain",
            "image": self.YOLO_IMAGE,
            "ports": {"22": "tcp", "6006": "http"},
            "env": {"NVIDIA_VISIBLE_DEVICES": "all"},
            "ssh_password": ssh_password
        }
        
        if use_spot:
            order_data["spotprice"] = server["price_usd"] * 1.15
        
        result = self._request("POST", "/v1/create_order", json=order_data)
        order_id = result["order_id"]
        
        # Wait for server
        for _ in range(120):
            orders = self._request("GET", "/v1/my_orders")["orders"]
            order = next((o for o in orders if o["order_id"] == order_id), None)
            
            if order and order.get("status") == "running":
                conn = order["connection"]["ssh"]
                parts = conn.split()
                ssh_host = parts[1].split("@")[1] if "@" in parts[1] else parts[1]
                ssh_port = int(parts[-1]) if "-p" in conn else 22
                
                return YOLOServer(
                    server_id=server["id"],
                    order_id=order_id,
                    ssh_host=ssh_host,
                    ssh_port=ssh_port,
                    ssh_password=ssh_password,
                    gpu_model=server["gpus"][0] if server["gpus"] else "Unknown",
                    gpu_count=server["gpu_count"],
                    hourly_cost=server["price_usd"]
                )
            
            time.sleep(2)
        
        raise Exception("Timeout waiting for server")
    
    def cancel_order(self, order_id: int):
        """Cancel an order."""
        self._request("POST", "/v1/cancel_order", json={"id": order_id})
```
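
The client above pulls the host and port out of the order's SSH connection string by splitting on whitespace, which assumes the string always looks like `ssh root@host -p port`. That format is inferred from the parsing code, not from the API documentation. A slightly more defensive parser could look like this sketch; `parse_ssh_command` is a hypothetical helper and the hostnames are made up.

```python
import shlex
from typing import Tuple


def parse_ssh_command(conn: str, default_port: int = 22) -> Tuple[str, str, int]:
    """Split a command like 'ssh root@host -p 2222' into (user, host, port)."""
    tokens = shlex.split(conn)
    user, host, port = "root", "", default_port
    i = 1 if tokens and tokens[0] == "ssh" else 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-p" and i + 1 < len(tokens):
            # "-p" takes the next token as the port, wherever it appears
            port = int(tokens[i + 1])
            i += 2
            continue
        if not tok.startswith("-") and not host:
            # First non-option token is the destination, with or without a user
            if "@" in tok:
                user, host = tok.split("@", 1)
            else:
                host = tok
        i += 1
    if not host:
        raise ValueError(f"No host found in SSH command: {conn!r}")
    return user, host, port


print(parse_ssh_command("ssh root@n1.example.com -p 2222"))
# ('root', 'n1.example.com', 2222)
```

Unlike indexing into `parts[1]` and `parts[-1]`, this version does not depend on the order of the arguments and raises a clear error when no host is present.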

## Step 2: YOLO Training Engine

```python
# yolo_trainer.py
import paramiko
from scp import SCPClient
import json
import time
import os
from typing import Dict, List, Optional
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    """YOLO training configuration."""
    model: str = "yolov8n.pt"  # yolov8n, yolov8s, yolov8m, yolov8l, yolov8x
    task: str = "detect"  # detect, segment, classify, pose
    epochs: int = 100
    batch_size: int = 16
    img_size: int = 640
    learning_rate: float = 0.01
    device: str = "0"
    workers: int = 8
    patience: int = 50
    project: str = "yolo_training"
    name: str = "run"


@dataclass
class TrainingResult:
    """Training results."""
    model_path: str
    metrics: Dict
    training_time_seconds: float
    epochs_completed: int
    success: bool
    error: Optional[str] = None


class RemoteYOLOTrainer:
    """Execute YOLO training on remote GPU."""
    
    def __init__(self, ssh_host: str, ssh_port: int, ssh_password: str):
        self.ssh_host = ssh_host
        self.ssh_port = ssh_port
        self.ssh_password = ssh_password
        self._ssh = None
        self._scp = None
    
    def connect(self):
        """Establish SSH connection."""
        self._ssh = paramiko.SSHClient()
        self._ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self._ssh.connect(
            self.ssh_host,
            port=self.ssh_port,
            username="root",
            password=self.ssh_password,
            timeout=30
        )
        self._scp = SCPClient(self._ssh.get_transport())
    
    def disconnect(self):
        """Close connections."""
        if self._scp:
            self._scp.close()
        if self._ssh:
            self._ssh.close()
    
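    def _exec(self, cmd: str, timeout: int = 7200) -> str:
        """Execute a command and return its stdout."""
        stdin, stdout, stderr = self._ssh.exec_command(cmd, timeout=timeout)
        # Read stdout before waiting on the exit status: waiting first can
        # hang when the command prints more than the SSH channel can buffer,
        # which verbose training logs easily do.
        output = stdout.read().decode()
        stdout.channel.recv_exit_status()
        return output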
    
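    def upload_dataset(self, local_path: str, dataset_name: str = "dataset"):
        """Upload a local directory so its contents land in /tmp/<dataset_name>."""
        remote_path = f"/tmp/{dataset_name}"
        parent = remote_path.rsplit("/", 1)[0]
        # scp nests the source inside an existing target directory, so clear the
        # target and create only its parent; scp then creates remote_path itself.
        self._exec(f"rm -rf {remote_path} && mkdir -p {parent}")
        self._scp.put(local_path, remote_path, recursive=True)
        return remote_path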
    
    def upload_file(self, local_path: str, remote_path: str):
        """Upload single file."""
        self._scp.put(local_path, remote_path)
    
    def download_file(self, remote_path: str, local_path: str):
        """Download file."""
        self._scp.get(remote_path, local_path)
    
    def download_directory(self, remote_path: str, local_path: str):
        """Download directory."""
        self._scp.get(remote_path, local_path, recursive=True)
    
    def setup_environment(self):
        """Ensure YOLOv8 is installed."""
        print("Setting up environment...")
        self._exec("pip install -q ultralytics")
        self._exec("mkdir -p /tmp/yolo_training")
    
    def verify_gpu(self) -> Dict:
        """Verify GPU availability."""
        script = '''
import torch
from ultralytics import YOLO

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"Device name: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
'''
        output = self._exec(f"python3 -c '{script}'")
        return {"output": output}
    
    def train(self, dataset_yaml: str, config: TrainingConfig) -> TrainingResult:
        """Train YOLO model."""
        
        training_script = f'''
import json
import time
from ultralytics import YOLO

start_time = time.time()
result = {{"success": False}}

try:
    # Load model
    model = YOLO("{config.model}")
    
    # Train
    results = model.train(
        data="{dataset_yaml}",
        epochs={config.epochs},
        batch={config.batch_size},
        imgsz={config.img_size},
        lr0={config.learning_rate},
        device="{config.device}",
        workers={config.workers},
        patience={config.patience},
        project="/tmp/{config.project}",
        name="{config.name}",
        exist_ok=True,
        verbose=True
    )
    
    # Get best model path
    best_model = f"/tmp/{config.project}/{config.name}/weights/best.pt"
    
    # Validate
    metrics = model.val()
    
    result = {{
        "success": True,
        "model_path": best_model,
        "metrics": {{
            "mAP50": float(metrics.box.map50) if hasattr(metrics.box, 'map50') else 0,
            "mAP50_95": float(metrics.box.map) if hasattr(metrics.box, 'map') else 0,
            "precision": float(metrics.box.mp) if hasattr(metrics.box, 'mp') else 0,
            "recall": float(metrics.box.mr) if hasattr(metrics.box, 'mr') else 0
        }},
        "epochs_completed": {config.epochs},
        "training_time": time.time() - start_time
    }}
    
except Exception as e:
    result = {{"success": False, "error": str(e)}}

print("RESULT:" + json.dumps(result))
'''
        
        # Write script
        self._exec(f"cat > /tmp/train_yolo.py << 'EOF'\n{training_script}\nEOF")
        
        # Run training
        print(f"Training {config.model} for {config.epochs} epochs...")
        output = self._exec("python3 /tmp/train_yolo.py 2>&1", timeout=86400)
        
        # Parse result
        for line in output.split("\n"):
            if line.startswith("RESULT:"):
                result_data = json.loads(line[7:])
                return TrainingResult(
                    model_path=result_data.get("model_path", ""),
                    metrics=result_data.get("metrics", {}),
                    training_time_seconds=result_data.get("training_time", 0),
                    epochs_completed=result_data.get("epochs_completed", 0),
                    success=result_data.get("success", False),
                    error=result_data.get("error")
                )
        
        return TrainingResult(
            model_path="",
            metrics={},
            training_time_seconds=0,
            epochs_completed=0,
            success=False,
            error="Failed to parse training result"
        )
    
    def export_model(self, model_path: str, format: str = "onnx") -> str:
        """Export model to different format."""
        export_script = f'''
from ultralytics import YOLO
model = YOLO("{model_path}")
path = model.export(format="{format}")
print(f"EXPORTED:{{path}}")
'''
        output = self._exec(f"python3 -c '{export_script}'")
        
        for line in output.split("\n"):
            if line.startswith("EXPORTED:"):
                return line[9:]
        
        return ""
```
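
The trainer gets structured results back over SSH by having the remote script print a single `RESULT:` line containing JSON, then scanning the captured output for that prefix. The parsing step can be tested on its own without a server. The sketch below assumes the same `RESULT:` prefix; `parse_sentinel` is a hypothetical helper and the sample log lines are invented for illustration.

```python
import json
from typing import Any, Dict, Optional


def parse_sentinel(output: str, prefix: str = "RESULT:") -> Optional[Dict[str, Any]]:
    """Return the JSON payload from the last line starting with `prefix`, or None."""
    # splitlines() also splits on carriage returns, which progress bars emit
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.startswith(prefix):
            try:
                return json.loads(line[len(prefix):])
            except json.JSONDecodeError:
                return None
    return None


sample_log = "\n".join([
    "epoch 1/2  box_loss 1.23",
    "epoch 2/2  box_loss 0.98",
    'RESULT:{"success": true, "metrics": {"mAP50": 0.71}}',
])
print(parse_sentinel(sample_log))
# {'success': True, 'metrics': {'mAP50': 0.71}}
```

Scanning from the end means a stray `RESULT:` earlier in the log cannot shadow the final one, and a truncated payload yields `None` instead of an unhandled exception.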

## Step 3: Complete YOLO Training Pipeline

```python
# yolo_pipeline.py
import os
import time
import yaml
from typing import Optional
from dataclasses import asdict

from clore_yolo_client import CloreYOLOClient, YOLOServer
from yolo_trainer import RemoteYOLOTrainer, TrainingConfig, TrainingResult


class YOLOPipeline:
    """End-to-end YOLO training pipeline on Clore.ai."""
    
    def __init__(self, api_key: str):
        self.client = CloreYOLOClient(api_key)
        self.server: Optional[YOLOServer] = None
        self.trainer: Optional[RemoteYOLOTrainer] = None
    
    def setup(self, max_price_usd: float = 0.50):
        """Provision GPU for YOLO training."""
        
        print("🔍 Finding GPU for YOLO training...")
        gpu = self.client.find_yolo_gpu(max_price_usd=max_price_usd)
        
        if not gpu:
            raise Exception(f"No GPU available under ${max_price_usd}/hr")
        
        print(f"   Found: {gpu['gpus']} @ ${gpu['price_usd']:.2f}/hr")
        
        print("🚀 Provisioning server...")
        self.server = self.client.rent_yolo_server(gpu)
        
        print(f"   Server ready: {self.server.ssh_host}:{self.server.ssh_port}")
        
        # Connect trainer
        self.trainer = RemoteYOLOTrainer(
            self.server.ssh_host,
            self.server.ssh_port,
            self.server.ssh_password
        )
        self.trainer.connect()
        self.trainer.setup_environment()
        self.trainer.verify_gpu()
        
        return self
    
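    def prepare_dataset(self,
                        images_path: str,
                        labels_path: str,
                        classes: list,
                        val_split: float = 0.2) -> str:
        """Write dataset.yaml and upload the dataset.

        images_path and labels_path must already contain train/ and val/
        subfolders, matching the paths in dataset.yaml. val_split is accepted
        but not used by this method.
        """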
        
        # Create dataset.yaml
        dataset_yaml = {
            "path": "/tmp/dataset",
            "train": "images/train",
            "val": "images/val",
            "names": {i: name for i, name in enumerate(classes)}
        }
        
        # Write YAML locally
        yaml_path = "/tmp/dataset.yaml"
        with open(yaml_path, "w") as f:
            yaml.dump(dataset_yaml, f)
        
        # Upload dataset
        print("📤 Uploading dataset...")
        self.trainer.upload_dataset(images_path, "dataset/images")
        self.trainer.upload_dataset(labels_path, "dataset/labels")
        self.trainer.upload_file(yaml_path, "/tmp/dataset.yaml")
        
        return "/tmp/dataset.yaml"
    
    def train(self, 
              dataset_yaml: str,
              model: str = "yolov8n.pt",
              epochs: int = 100,
              batch_size: int = 16,
              img_size: int = 640) -> TrainingResult:
        """Train YOLO model."""
        
        config = TrainingConfig(
            model=model,
            epochs=epochs,
            batch_size=batch_size,
            img_size=img_size
        )
        
        return self.trainer.train(dataset_yaml, config)
    
    def export(self, model_path: str, format: str = "onnx") -> str:
        """Export trained model."""
        return self.trainer.export_model(model_path, format)
    
    def download_model(self, remote_path: str, local_path: str):
        """Download trained model."""
        self.trainer.download_file(remote_path, local_path)
    
    def download_training_results(self, local_dir: str):
        """Download all training results."""
        self.trainer.download_directory("/tmp/yolo_training", local_dir)
    
    def cleanup(self):
        """Release resources."""
        if self.trainer:
            self.trainer.disconnect()
        if self.server:
            print("🧹 Releasing server...")
            self.client.cancel_order(self.server.order_id)
    
    def __enter__(self):
        return self
    
    def __exit__(self, *args):
        self.cleanup()
```
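
`prepare_dataset` writes a `dataset.yaml` that points at `images/train` and `images/val`, but it does not create that split itself, so the folders need to exist before upload. One way to decide which files go where is a seeded shuffle, sketched below. `split_train_val` is a hypothetical helper, not part of the pipeline above, and it only computes the assignment; copying files into place is left to you.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(names: Sequence[str], val_split: float = 0.2,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle file stems deterministically and split into (train, val)."""
    if not 0.0 < val_split < 1.0:
        raise ValueError("val_split must be between 0 and 1")
    shuffled = sorted(names)  # sort first so the result ignores input order
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_split))
    return shuffled[n_val:], shuffled[:n_val]


stems = [f"img_{i:03d}" for i in range(10)]
train, val = split_train_val(stems, val_split=0.2)
print(len(train), len(val))  # 8 2
```

Splitting by file stem keeps each image with its label: Ultralytics locates a label file by swapping `images` for `labels` in the image path, so `images/val/img_003.jpg` pairs with `labels/val/img_003.txt`.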

## Full Script: Production YOLO Training

```python
#!/usr/bin/env python3
"""
YOLOv8 Training on Clore.ai GPUs.

Usage:
    # Train with local dataset
    python train_yolo.py --api-key YOUR_API_KEY --data dataset.yaml --model yolov8s.pt --epochs 100
    
    # Train with Roboflow dataset
    python train_yolo.py --api-key YOUR_API_KEY --roboflow WORKSPACE/PROJECT/VERSION --model yolov8m.pt
"""

import argparse
import os
import time
import json
import secrets
import requests
import paramiko
from scp import SCPClient
from typing import Dict, Optional
from dataclasses import dataclass


@dataclass
class TrainingResult:
    model_path: str
    mAP50: float
    mAP50_95: float
    precision: float
    recall: float
    epochs: int
    time_seconds: float
    cost_usd: float
    success: bool


class CloreYOLOTrainer:
    """Complete YOLOv8 training on Clore.ai."""
    
    BASE_URL = "https://api.clore.ai"
    # `latest` is the CUDA-enabled Ultralytics image; `latest-python` is a minimal, CPU-oriented build
    IMAGE = "ultralytics/ultralytics:latest"
    
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.headers = {"auth": api_key}
        self.order_id = None
        self.ssh_host = None
        self.ssh_port = None
        self.ssh_password = None
        self.hourly_cost = 0.0
        self._ssh = None
        self._scp = None
    
    def _api(self, method: str, endpoint: str, **kwargs) -> Dict:
        url = f"{self.BASE_URL}{endpoint}"
        for attempt in range(3):
            response = requests.request(method, url, headers=self.headers, timeout=30, **kwargs)
            data = response.json()
            if data.get("code") == 5:
                time.sleep(2 ** attempt)
                continue
            if data.get("code") != 0:
                raise Exception(f"API Error: {data}")
            return data
        raise Exception("Max retries")
    
    def setup(self, max_price: float = 0.50):
        print("🔍 Finding GPU...")
        servers = self._api("GET", "/v1/marketplace")["servers"]
        
        gpus = ["RTX 4090", "RTX 4080", "RTX 3090", "RTX 3080", "A100"]
        candidates = []
        
        for s in servers:
            if s.get("rented"):
                continue
            gpu_array = s.get("gpu_array", [])
            if not any(any(g in gpu for g in gpus) for gpu in gpu_array):
                continue
            price = s.get("price", {}).get("usd", {}).get("spot")
            if price and price <= max_price:
                candidates.append({"id": s["id"], "gpus": gpu_array, "price": price})
        
        if not candidates:
            raise Exception(f"No GPU under ${max_price}/hr")
        
        gpu = min(candidates, key=lambda x: x["price"])
        print(f"   {gpu['gpus']} @ ${gpu['price']:.2f}/hr")
        
        self.ssh_password = secrets.token_urlsafe(16)
        self.hourly_cost = gpu["price"]
        
        print("🚀 Provisioning server...")
        order_data = {
            "renting_server": gpu["id"],
            "type": "spot",
            "currency": "CLORE-Blockchain",
            "image": self.IMAGE,
            "ports": {"22": "tcp"},
            "env": {"NVIDIA_VISIBLE_DEVICES": "all"},
            "ssh_password": self.ssh_password,
            "spotprice": gpu["price"] * 1.15
        }
        
        result = self._api("POST", "/v1/create_order", json=order_data)
        self.order_id = result["order_id"]
        
        print("⏳ Waiting for server...")
        for _ in range(120):
            orders = self._api("GET", "/v1/my_orders")["orders"]
            order = next((o for o in orders if o["order_id"] == self.order_id), None)
            if order and order.get("status") == "running":
                conn = order["connection"]["ssh"]
                parts = conn.split()
                self.ssh_host = parts[1].split("@")[1]
                self.ssh_port = int(parts[-1]) if "-p" in conn else 22
                break
            time.sleep(2)
        else:
            raise Exception("Timeout")
        
        # Connect SSH
        self._ssh = paramiko.SSHClient()
        self._ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self._ssh.connect(self.ssh_host, port=self.ssh_port,
                          username="root", password=self.ssh_password, timeout=30)
        self._scp = SCPClient(self._ssh.get_transport())
        
        print(f"✅ Server ready: {self.ssh_host}:{self.ssh_port}")
        
        # Setup YOLO
        print("📦 Setting up YOLOv8...")
        self._exec("pip install -q ultralytics", timeout=120)
    
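    def _exec(self, cmd: str, timeout: int = 86400) -> str:
        stdin, stdout, stderr = self._ssh.exec_command(cmd, timeout=timeout)
        # Read stdout before waiting on the exit status: waiting first can
        # hang when the command prints more than the SSH channel can buffer.
        output = stdout.read().decode()
        stdout.channel.recv_exit_status()
        return output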
    
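    def upload_dataset(self, local_path: str) -> str:
        """Upload a local dataset directory to /tmp/dataset."""
        print(f"📤 Uploading dataset from {local_path}...")
        remote_path = "/tmp/dataset"
        # scp nests the source inside an existing target directory, so remove any
        # old copy and let scp create /tmp/dataset from the local directory.
        self._exec(f"rm -rf {remote_path}")
        self._scp.put(local_path, remote_path, recursive=True)
        return remote_path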
    
    def download_roboflow(self, workspace: str, project: str, version: int, api_key: str) -> str:
        """Download dataset from Roboflow."""
        print(f"📥 Downloading from Roboflow: {workspace}/{project}/v{version}")
        
        script = f'''
from roboflow import Roboflow
rf = Roboflow(api_key="{api_key}")
project = rf.workspace("{workspace}").project("{project}")
dataset = project.version({version}).download("yolov8", location="/tmp/dataset")
print("DONE:/tmp/dataset/data.yaml")
'''
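        # The roboflow package runs on the server, so make sure it is installed there
        self._exec("pip install -q roboflow", timeout=600)
        output = self._exec(f"python3 -c '{script}'", timeout=600)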
        
        for line in output.split("\n"):
            if line.startswith("DONE:"):
                return line[5:]
        
        return "/tmp/dataset/data.yaml"
    
    def train(self, data_yaml: str, model: str = "yolov8n.pt", epochs: int = 100,
              batch: int = 16, imgsz: int = 640) -> TrainingResult:
        
        script = f'''
import json
import time
from ultralytics import YOLO

start = time.time()
result = {{"success": False}}

try:
    model = YOLO("{model}")
    results = model.train(
        data="{data_yaml}",
        epochs={epochs},
        batch={batch},
        imgsz={imgsz},
        device=0,
        project="/tmp/runs",
        name="train",
        exist_ok=True,
        verbose=True
    )
    
    metrics = model.val()
    
    result = {{
        "success": True,
        "model_path": "/tmp/runs/train/weights/best.pt",
        "mAP50": float(metrics.box.map50) if hasattr(metrics.box, 'map50') else 0,
        "mAP50_95": float(metrics.box.map) if hasattr(metrics.box, 'map') else 0,
        "precision": float(metrics.box.mp) if hasattr(metrics.box, 'mp') else 0,
        "recall": float(metrics.box.mr) if hasattr(metrics.box, 'mr') else 0,
        "epochs": {epochs},
        "time": time.time() - start
    }}
except Exception as e:
    result = {{"success": False, "error": str(e)}}

print("RESULT:" + json.dumps(result))
'''
        
        self._exec(f"cat > /tmp/train.py << 'EOF'\n{script}\nEOF")
        
        print(f"🎯 Training {model} for {epochs} epochs...")
        start = time.time()
        output = self._exec("python3 /tmp/train.py 2>&1", timeout=86400)
        elapsed = time.time() - start
        
        # Parse result
        result_data = {"success": False}
        for line in output.split("\n"):
            if line.startswith("RESULT:"):
                result_data = json.loads(line[7:])
                break
        
        cost = (elapsed / 3600) * self.hourly_cost
        
        return TrainingResult(
            model_path=result_data.get("model_path", ""),
            mAP50=result_data.get("mAP50", 0),
            mAP50_95=result_data.get("mAP50_95", 0),
            precision=result_data.get("precision", 0),
            recall=result_data.get("recall", 0),
            epochs=result_data.get("epochs", 0),
            time_seconds=elapsed,
            cost_usd=cost,
            success=result_data.get("success", False)
        )
    
    def export(self, model_path: str, format: str = "onnx") -> str:
        """Export model to different format."""
        script = f'''
from ultralytics import YOLO
model = YOLO("{model_path}")
path = model.export(format="{format}")
print(f"EXPORTED:{{path}}")
'''
        output = self._exec(f"python3 -c '{script}'", timeout=600)
        
        for line in output.split("\n"):
            if line.startswith("EXPORTED:"):
                return line[9:]
        return ""
    
    def download_model(self, remote_path: str, local_path: str):
        """Download model file."""
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        self._scp.get(remote_path, local_path)
    
    def cleanup(self):
        if self._scp:
            self._scp.close()
        if self._ssh:
            self._ssh.close()
        if self.order_id:
            print("🧹 Releasing server...")
            self._api("POST", "/v1/cancel_order", json={"id": self.order_id})
    
    def __enter__(self):
        return self
    
    def __exit__(self, *args):
        self.cleanup()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--api-key", required=True, help="Clore.ai API key")
    parser.add_argument("--data", help="Local dataset path or dataset.yaml")
    parser.add_argument("--roboflow", help="Roboflow dataset (WORKSPACE/PROJECT/VERSION)")
    parser.add_argument("--roboflow-key", help="Roboflow API key")
    parser.add_argument("--model", default="yolov8n.pt", help="Base model")
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--batch", type=int, default=16)
    parser.add_argument("--imgsz", type=int, default=640)
    parser.add_argument("--output", default="./best.pt")
    parser.add_argument("--export", choices=["onnx", "torchscript", "tflite", "coreml"])
    parser.add_argument("--max-price", type=float, default=0.50)
    args = parser.parse_args()
    
    with CloreYOLOTrainer(args.api_key) as trainer:
        trainer.setup(args.max_price)
        
        # Get dataset
        if args.roboflow:
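            if not args.roboflow_key:
                parser.error("--roboflow requires --roboflow-key")
            parts = args.roboflow.split("/")
            if len(parts) != 3:
                parser.error("--roboflow must look like WORKSPACE/PROJECT/VERSION")
            workspace, project, version = parts[0], parts[1], int(parts[2])
            data_yaml = trainer.download_roboflow(workspace, project, version, args.roboflow_key)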
        elif args.data:
            if os.path.isdir(args.data):
                trainer.upload_dataset(args.data)
                data_yaml = "/tmp/dataset/data.yaml"
            else:
                trainer._scp.put(args.data, "/tmp/data.yaml")
                data_yaml = "/tmp/data.yaml"
        else:
            # Use COCO128 for demo
            data_yaml = "coco128.yaml"
        
        # Train
        result = trainer.train(data_yaml, args.model, args.epochs, args.batch, args.imgsz)
        
        print("\n" + "="*60)
        print("📊 TRAINING COMPLETE")
        print("="*60)
        print(f"   Model: {args.model}")
        print(f"   Epochs: {result.epochs}")
        print(f"   Time: {result.time_seconds:.1f}s ({result.time_seconds/60:.1f} min)")
        print(f"   Cost: ${result.cost_usd:.4f}")
        print(f"\n📈 Metrics:")
        print(f"   mAP50: {result.mAP50:.4f}")
        print(f"   mAP50-95: {result.mAP50_95:.4f}")
        print(f"   Precision: {result.precision:.4f}")
        print(f"   Recall: {result.recall:.4f}")
        
        if result.success and result.model_path:
            # Download model
            trainer.download_model(result.model_path, args.output)
            print(f"\n✅ Model saved: {args.output}")
            
            # Export if requested
            if args.export:
                print(f"\n📦 Exporting to {args.export}...")
                exported = trainer.export(result.model_path, args.export)
                if exported:
                    export_local = args.output.replace(".pt", f".{args.export}")
                    trainer.download_model(exported, export_local)
                    print(f"   Exported: {export_local}")


if __name__ == "__main__":
    main()
```
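
The remote scripts in this guide are rendered from Python f-strings, so braces have two jobs: a single brace is filled in locally when the script text is built, and a doubled brace survives as a literal brace for the remote interpreter to evaluate later. The sketch below shows this in isolation. `build_export_script` is a hypothetical helper that mirrors the `export` method above, and `compile` is used only to check syntax, so nothing is imported or run.

```python
def build_export_script(model_path: str, fmt: str) -> str:
    """Render a small Python script as text.

    Single braces are filled in now, on the local machine.
    Double braces become single braces in the output, so the
    inner f-string is evaluated later, on the remote machine.
    """
    return f'''
from ultralytics import YOLO
model = YOLO("{model_path}")
path = model.export(format="{fmt}")
print(f"EXPORTED:{{path}}")
'''


script = build_export_script("/tmp/runs/train/weights/best.pt", "onnx")
compile(script, "<remote-script>", "exec")  # syntax check only; nothing is executed
print(script)
```

Writing `{path}` with single braces in the template would make the local f-string look up a local variable named `path` and fail with a `NameError` before anything is sent to the server.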

## Example Training Commands

```bash
# Train YOLOv8 nano on COCO128 (demo)
python train_yolo.py --api-key YOUR_KEY --model yolov8n.pt --epochs 50

# Train YOLOv8 small on custom dataset
python train_yolo.py --api-key YOUR_KEY --data ./my_dataset --model yolov8s.pt --epochs 100

# Train from Roboflow dataset
python train_yolo.py --api-key YOUR_KEY \
    --roboflow myworkspace/myproject/1 \
    --roboflow-key RF_API_KEY \
    --model yolov8m.pt --epochs 150

# Train and export to ONNX
python train_yolo.py --api-key YOUR_KEY --data dataset.yaml \
    --model yolov8l.pt --epochs 200 --export onnx
```

## Model Variants Comparison

| Model   | Parameters | mAP50-95 (COCO val) | Speed (V100) | Clore.ai Cost (100 epochs) |
| ------- | ---------- | ------------------- | ------------ | -------------------------- |
| YOLOv8n | 3.2M       | 37.3                | 1.2ms        | **\~$0.15**                |
| YOLOv8s | 11.2M      | 44.9                | 2.0ms        | **\~$0.25**                |
| YOLOv8m | 25.9M      | 50.2                | 3.5ms        | **\~$0.40**                |
| YOLOv8l | 43.7M      | 52.9                | 5.5ms        | **\~$0.60**                |
| YOLOv8x | 68.2M      | 53.9                | 8.5ms        | **\~$0.80**                |

## Cost Comparison

| Platform         | Price    | Time (100 epochs, COCO) | Cost      |
| ---------------- | -------- | ----------------------- | --------- |
| **Clore.ai**     | $0.35/hr | \~45 min                | **$0.26** |
| AWS p3.2xlarge   | $3.06/hr | \~90 min                | $4.59     |
| Google Colab Pro | $10/mo   | \~60 min                | Limited   |
| Lambda Labs      | $1.10/hr | \~45 min                | $0.83     |
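
The cost column for the hourly platforms is the hourly price multiplied by the run time in hours. The short sketch below reproduces that arithmetic from the table's own figures; `training_cost` is an illustrative helper, and `Decimal` is used so the half-cent case rounds predictably.

```python
from decimal import Decimal, ROUND_HALF_UP


def training_cost(hourly_rate_usd: str, minutes: int) -> Decimal:
    """Cost of a run billed by the hour: rate x (minutes / 60), rounded to cents."""
    cost = Decimal(hourly_rate_usd) * minutes / 60
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (platform, hourly price in USD, run time in minutes) taken from the table
rows = [
    ("Clore.ai", "0.35", 45),
    ("AWS p3.2xlarge", "3.06", 90),
    ("Lambda Labs", "1.10", 45),
]
for name, rate, minutes in rows:
    print(f"{name}: ${training_cost(rate, minutes)}")
# Clore.ai: $0.26
# AWS p3.2xlarge: $4.59
# Lambda Labs: $0.83
```

The same function gives a quick estimate for your own runs once you know the hourly price and roughly how long training takes.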

## Next Steps

* [Reinforcement Learning](https://docs.clore.ai/dev/machine-learning-and-training/reinforcement-learning)
* [Training Scheduler](https://docs.clore.ai/dev/machine-learning-and-training/training-scheduler)
* [Batch Inference](https://docs.clore.ai/dev/inference-and-deployment/batch-inference)