# Multi-Cloud GPU Orchestrator

## What We're Building

A unified GPU orchestration layer that spans Clore.ai and other cloud providers, automatically selecting the most cost-effective option for your workloads. Route jobs to the cheapest available GPUs across multiple providers with a single API.

**Key Features:**

* Unified API across Clore.ai, AWS, GCP, Azure, and Lambda Labs
* Automatic cost optimization and provider selection
* Failover and redundancy across providers
* Consistent job submission interface
* Real-time price comparison
* Workload-aware scheduling
* Provider health monitoring

## Prerequisites

* Accounts on desired cloud providers
* Python 3.10+

```bash
pip install requests boto3 google-cloud-compute azure-mgmt-compute
```

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                    Multi-Cloud Orchestrator                      │
├─────────────────────────────────────────────────────────────────┤
│                     Unified API Layer                            │
│    submit_job() | get_status() | cancel_job() | get_prices()    │
├─────────────────────────────────────────────────────────────────┤
│                    Cost Optimizer                                │
│         Compare prices → Select provider → Route job            │
├────────────┬────────────┬────────────┬────────────┬─────────────┤
│  Clore.ai  │    AWS     │    GCP     │   Azure    │ Lambda Labs │
│  Provider  │  Provider  │  Provider  │  Provider  │   Provider  │
└────────────┴────────────┴────────────┴────────────┴─────────────┘
        │            │            │            │            │
        ▼            ▼            ▼            ▼            ▼
   ┌─────────────────────────────────────────────────────────┐
   │                      GPU Resources                       │
   │   RTX 4090 | A100 | V100 | H100 | RTX 3090 | A6000      │
   └─────────────────────────────────────────────────────────┘
```

## Step 1: Provider Abstraction Layer

```python
# providers/base.py
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Dict, Any
from enum import Enum

class GPUType(Enum):
    RTX_4090 = "rtx_4090"
    RTX_4080 = "rtx_4080"
    RTX_3090 = "rtx_3090"
    RTX_3080 = "rtx_3080"
    A100_80GB = "a100_80gb"
    A100_40GB = "a100_40gb"
    A6000 = "a6000"
    V100 = "v100"
    H100 = "h100"

@dataclass
class GPUInstance:
    """Represents a GPU instance from any provider."""
    provider: str
    instance_id: str
    gpu_type: GPUType
    gpu_count: int
    price_per_hour: float
    region: str
    status: str
    is_spot: bool = False
    ssh_host: Optional[str] = None
    ssh_port: int = 22
    ssh_user: str = "root"
    metadata: Optional[Dict[str, Any]] = None

@dataclass
class ProviderCapacity:
    """Available capacity at a provider."""
    provider: str
    gpu_type: GPUType
    available_count: int
    price_per_hour: float
    is_spot: bool
    region: str

@dataclass
class JobRequest:
    """Request to launch a GPU job."""
    gpu_type: GPUType
    gpu_count: int = 1
    image: str = "nvidia/cuda:12.8.0-base-ubuntu22.04"
    command: Optional[str] = None
    env: Optional[Dict[str, str]] = None
    max_price_per_hour: Optional[float] = None
    prefer_spot: bool = True
    min_runtime_hours: float = 1.0
    preferred_regions: Optional[List[str]] = None
    
    def __post_init__(self):
        self.env = self.env or {}
        self.preferred_regions = self.preferred_regions or []

@dataclass
class JobResult:
    """Result of launching a job."""
    success: bool
    job_id: str
    provider: str
    instance: Optional[GPUInstance] = None
    error: Optional[str] = None


class CloudProvider(ABC):
    """Abstract base class for cloud providers."""
    
    @property
    @abstractmethod
    def name(self) -> str:
        """Provider name."""
        pass
    
    @abstractmethod
    def get_available_gpus(self) -> List[ProviderCapacity]:
        """Get available GPU capacity."""
        pass
    
    @abstractmethod
    def launch_instance(self, request: JobRequest) -> JobResult:
        """Launch a GPU instance."""
        pass
    
    @abstractmethod
    def terminate_instance(self, instance_id: str) -> bool:
        """Terminate an instance."""
        pass
    
    @abstractmethod
    def get_instance_status(self, instance_id: str) -> Optional[GPUInstance]:
        """Get instance status."""
        pass
    
    @abstractmethod
    def list_instances(self) -> List[GPUInstance]:
        """List all running instances."""
        pass
    
    def is_healthy(self) -> bool:
        """Check if provider API is healthy."""
        try:
            self.get_available_gpus()
            return True
        except Exception:
            return False
```
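To exercise this interface without touching a real cloud API, the abstract base class can be satisfied by an in-memory fake. The sketch below is illustrative, not part of the codebase; it repeats trimmed stand-ins for `ProviderCapacity` and `CloudProvider` so it runs standalone, and shows how the default `is_healthy()` behaves for a working and a failing provider:

```python
# Minimal sketch: a fake provider for offline testing.
# ProviderCapacity and CloudProvider are trimmed stand-ins for providers/base.py.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ProviderCapacity:
    provider: str
    gpu_type: str          # GPUType.value in the real code
    available_count: int
    price_per_hour: float
    is_spot: bool
    region: str

class CloudProvider(ABC):
    @property
    @abstractmethod
    def name(self): ...
    @abstractmethod
    def get_available_gpus(self): ...
    def is_healthy(self):
        # Default health check: the provider is healthy if listing capacity succeeds
        try:
            self.get_available_gpus()
            return True
        except Exception:
            return False

class FakeProvider(CloudProvider):
    """In-memory provider, useful for unit-testing orchestration logic."""
    def __init__(self, capacity, fail=False):
        self._capacity = capacity
        self._fail = fail
    @property
    def name(self):
        return "fake"
    def get_available_gpus(self):
        if self._fail:
            raise RuntimeError("simulated outage")
        return self._capacity

healthy = FakeProvider([ProviderCapacity("fake", "rtx_4090", 4, 0.30, True, "global")])
broken = FakeProvider([], fail=True)
print(healthy.is_healthy(), broken.is_healthy())  # True False
```

The same fake can later be registered on the orchestrator in place of a real provider when testing scheduling decisions.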

## Step 2: Clore.ai Provider

```python
# providers/clore.py
import requests
import time
import secrets
from typing import List, Optional
from .base import (
    CloudProvider, GPUInstance, ProviderCapacity, 
    JobRequest, JobResult, GPUType
)

class CloreProvider(CloudProvider):
    """Clore.ai provider implementation."""
    
    BASE_URL = "https://api.clore.ai"
    
    # GPU type mapping
    GPU_MAP = {
        "RTX 4090": GPUType.RTX_4090,
        "RTX 4080": GPUType.RTX_4080,
        "RTX 3090": GPUType.RTX_3090,
        "RTX 3080": GPUType.RTX_3080,
        "A100": GPUType.A100_40GB,
        "A100-80GB": GPUType.A100_80GB,
        "A6000": GPUType.A6000,
    }
    
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.headers = {"auth": api_key}
        self._last_request = 0
    
    @property
    def name(self) -> str:
        return "clore"
    
    def _request(self, method: str, endpoint: str, **kwargs):
        """Make rate-limited API request."""
        now = time.time()
        if now - self._last_request < 1:
            time.sleep(1 - (now - self._last_request))
        self._last_request = time.time()
        
        response = requests.request(
            method,
            f"{self.BASE_URL}{endpoint}",
            headers=self.headers,
            timeout=30,
            **kwargs
        )
        response.raise_for_status()
        data = response.json()
        
        if data.get("code") != 0:
            raise Exception(f"Clore API Error: {data}")
        
        return data
    
    def _normalize_gpu(self, gpu_name: str) -> Optional[GPUType]:
        """Normalize GPU name to GPUType."""
        for pattern, gpu_type in self.GPU_MAP.items():
            if pattern.lower() in gpu_name.lower():
                return gpu_type
        return None
    
    def get_available_gpus(self) -> List[ProviderCapacity]:
        """Get available GPU capacity from Clore.ai marketplace."""
        data = self._request("GET", "/v1/marketplace")
        
        capacity = {}
        for server in data.get("servers", []):
            if server.get("rented"):
                continue
            
            gpu_array = server.get("gpu_array", [])
            if not gpu_array:
                continue
            
            gpu_type = self._normalize_gpu(gpu_array[0])
            if not gpu_type:
                continue
            
            spot_price = server.get("price", {}).get("usd", {}).get("spot")
            if not spot_price:
                continue
            
            key = (gpu_type, True)  # Clore is effectively all spot
            if key not in capacity:
                capacity[key] = ProviderCapacity(
                    provider="clore",
                    gpu_type=gpu_type,
                    available_count=0,
                    price_per_hour=float('inf'),
                    is_spot=True,
                    region="global"
                )
            
            capacity[key].available_count += len(gpu_array)
            capacity[key].price_per_hour = min(
                capacity[key].price_per_hour,
                spot_price
            )
        
        return list(capacity.values())
    
    def launch_instance(self, request: JobRequest) -> JobResult:
        """Launch a Clore.ai instance."""
        
        # Find matching server
        data = self._request("GET", "/v1/marketplace")
        
        matching_servers = []
        for server in data.get("servers", []):
            if server.get("rented"):
                continue
            
            gpu_array = server.get("gpu_array", [])
            if not gpu_array:
                continue
            
            gpu_type = self._normalize_gpu(gpu_array[0])
            if gpu_type != request.gpu_type:
                continue
            
            if len(gpu_array) < request.gpu_count:
                continue
            
            spot_price = server.get("price", {}).get("usd", {}).get("spot")
            if spot_price is None:
                continue
            
            if request.max_price_per_hour and spot_price > request.max_price_per_hour:
                continue
            
            matching_servers.append({
                "id": server["id"],
                "price": spot_price,
                "gpus": gpu_array
            })
        
        if not matching_servers:
            return JobResult(
                success=False,
                job_id="",
                provider="clore",
                error=f"No {request.gpu_type.value} available"
            )
        
        # Select cheapest
        server = min(matching_servers, key=lambda x: x["price"])
        
        # Create order
        ssh_password = secrets.token_urlsafe(16)
        
        order_data = {
            "renting_server": server["id"],
            "type": "spot",
            "currency": "CLORE-Blockchain",
            "image": request.image,
            "ports": {"22": "tcp"},
            "env": {**request.env, "NVIDIA_VISIBLE_DEVICES": "all"},
            "ssh_password": ssh_password,
            "spotprice": server["price"] * 1.1  # bid 10% above current spot price
        }
        
        result = self._request("POST", "/v1/create_order", json=order_data)
        order_id = result["order_id"]
        
        # Poll until the order reports "running" (up to ~3 minutes)
        for _ in range(90):
            orders = self._request("GET", "/v1/my_orders").get("orders", [])
            order = next((o for o in orders if o["order_id"] == order_id), None)
            
            if order and order.get("status") == "running":
                conn = order.get("connection", {})
                ssh_str = conn.get("ssh", "")
                
                # Parse connection string, e.g. "ssh root@203.0.113.7 -p 2222"
                ssh_host = ssh_str.split("@")[1].split()[0] if "@" in ssh_str else ""
                ssh_port = 22
                if "-p" in ssh_str:
                    ssh_port = int(ssh_str.split("-p")[1].split()[0])
                
                instance = GPUInstance(
                    provider="clore",
                    instance_id=str(order_id),
                    gpu_type=request.gpu_type,
                    gpu_count=len(server["gpus"]),
                    price_per_hour=server["price"],
                    region="global",
                    status="running",
                    is_spot=True,
                    ssh_host=ssh_host,
                    ssh_port=ssh_port,
                    ssh_user="root",
                    metadata={"ssh_password": ssh_password}
                )
                
                return JobResult(
                    success=True,
                    job_id=str(order_id),
                    provider="clore",
                    instance=instance
                )
            
            time.sleep(2)
        
        return JobResult(
            success=False,
            job_id=str(order_id),
            provider="clore",
            error="Timeout waiting for instance"
        )
    
    def terminate_instance(self, instance_id: str) -> bool:
        """Terminate a Clore.ai order."""
        try:
            self._request("POST", "/v1/cancel_order", json={"id": int(instance_id)})
            return True
        except Exception:
            return False
    
    def get_instance_status(self, instance_id: str) -> Optional[GPUInstance]:
        """Get status of a Clore.ai order."""
        orders = self._request("GET", "/v1/my_orders").get("orders", [])
        order = next((o for o in orders if o["order_id"] == int(instance_id)), None)
        
        if not order:
            return None
        
        return GPUInstance(
            provider="clore",
            instance_id=instance_id,
            gpu_type=GPUType.RTX_4090,  # Would need to track this
            gpu_count=1,
            price_per_hour=order.get("price", 0) * 60,
            region="global",
            status=order.get("status", "unknown"),
            is_spot=True
        )
    
    def list_instances(self) -> List[GPUInstance]:
        """List all Clore.ai orders."""
        orders = self._request("GET", "/v1/my_orders").get("orders", [])
        
        return [
            GPUInstance(
                provider="clore",
                instance_id=str(o["order_id"]),
                gpu_type=GPUType.RTX_4090,
                gpu_count=1,
                price_per_hour=o.get("price", 0) * 60,
                region="global",
                status=o.get("status", "unknown"),
                is_spot=True
            )
            for o in orders
            if o.get("status") in ("running", "creating_order")
        ]
```
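The split-based parsing inside `launch_instance` handles the common case, but breaks if anything follows the port number. A regex variant is more forgiving; it assumes the same connection-string shape (`ssh user@host -p PORT`) and is a sketch you could swap in:

```python
import re

def parse_ssh_string(ssh_str, default_port=22):
    """Extract (host, port) from a string like 'ssh root@203.0.113.7 -p 2222'."""
    host = ""
    m = re.search(r"@([\w.\-]+)", ssh_str)
    if m:
        host = m.group(1)
    # Fall back to the default SSH port when no -p flag is present
    port = default_port
    m = re.search(r"-p\s*(\d+)", ssh_str)
    if m:
        port = int(m.group(1))
    return host, port

print(parse_ssh_string("ssh root@203.0.113.7 -p 2222"))  # ('203.0.113.7', 2222)
```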

## Step 3: AWS Provider (Example)

```python
# providers/aws.py
import boto3
from typing import List, Optional
from .base import (
    CloudProvider, GPUInstance, ProviderCapacity,
    JobRequest, JobResult, GPUType
)

class AWSProvider(CloudProvider):
    """AWS EC2 provider for GPU instances."""
    
    # AWS instance type mapping
    GPU_INSTANCES = {
        GPUType.A100_40GB: "p4d.24xlarge",
        GPUType.V100: "p3.2xlarge",
        GPUType.RTX_4090: None,  # Not available on AWS
    }
    
    INSTANCE_PRICES = {
        "p4d.24xlarge": 32.77,
        "p3.2xlarge": 3.06,
        "p3.8xlarge": 12.24,
        "g5.xlarge": 1.006,
        "g5.2xlarge": 1.212,
    }
    
    def __init__(self, region: str = "us-east-1"):
        self.region = region
        self.ec2 = boto3.client("ec2", region_name=region)
    
    @property
    def name(self) -> str:
        return "aws"
    
    def get_available_gpus(self) -> List[ProviderCapacity]:
        """Get AWS GPU capacity (simplified)."""
        # In reality, would query spot prices and capacity
        return [
            ProviderCapacity(
                provider="aws",
                gpu_type=GPUType.A100_40GB,
                available_count=100,  # AWS has plenty
                price_per_hour=32.77,
                is_spot=False,
                region=self.region
            ),
            ProviderCapacity(
                provider="aws",
                gpu_type=GPUType.V100,
                available_count=100,
                price_per_hour=3.06,
                is_spot=False,
                region=self.region
            )
        ]
    
    def launch_instance(self, request: JobRequest) -> JobResult:
        """Launch AWS EC2 GPU instance."""
        instance_type = self.GPU_INSTANCES.get(request.gpu_type)
        
        if not instance_type:
            return JobResult(
                success=False,
                job_id="",
                provider="aws",
                error=f"{request.gpu_type.value} not available on AWS"
            )
        
        # Launch instance (simplified)
        try:
            response = self.ec2.run_instances(
                ImageId="ami-0abcdef1234567890",  # placeholder; use a real GPU AMI for your region
                InstanceType=instance_type,
                MinCount=1,
                MaxCount=1,
                # ... more config
            )
            
            instance_id = response["Instances"][0]["InstanceId"]
            
            return JobResult(
                success=True,
                job_id=instance_id,
                provider="aws",
                instance=GPUInstance(
                    provider="aws",
                    instance_id=instance_id,
                    gpu_type=request.gpu_type,
                    gpu_count=request.gpu_count,
                    price_per_hour=self.INSTANCE_PRICES.get(instance_type, 0),
                    region=self.region,
                    status="pending",
                    is_spot=False
                )
            )
        except Exception as e:
            return JobResult(
                success=False,
                job_id="",
                provider="aws",
                error=str(e)
            )
    
    def terminate_instance(self, instance_id: str) -> bool:
        try:
            self.ec2.terminate_instances(InstanceIds=[instance_id])
            return True
        except Exception:
            return False
    
    def get_instance_status(self, instance_id: str) -> Optional[GPUInstance]:
        try:
            response = self.ec2.describe_instances(InstanceIds=[instance_id])
            instance = response["Reservations"][0]["Instances"][0]
            
            return GPUInstance(
                provider="aws",
                instance_id=instance_id,
                gpu_type=GPUType.V100,  # Would need to determine
                gpu_count=1,
                price_per_hour=3.06,
                region=self.region,
                status=instance["State"]["Name"],
                is_spot=False,
                ssh_host=instance.get("PublicIpAddress")
            )
        except Exception:
            return None
    
    def list_instances(self) -> List[GPUInstance]:
        # Stub: a full implementation would filter describe_instances()
        # results down to GPU instance types
        return []
```
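The hard-coded `INSTANCE_PRICES` table above is a placeholder; in practice you would query EC2's `describe_spot_price_history` and reduce the response. The helper below is a sketch of that reduction step over a hand-written sample (the sample prices are illustrative; in the real API, `SpotPrice` comes back as a string):

```python
def cheapest_spot(history):
    """Return (instance_type, price) for the lowest spot price in a
    describe_spot_price_history-style response, or None if empty."""
    entries = history.get("SpotPriceHistory", [])
    if not entries:
        return None
    # SpotPrice is a string in the API response, so convert before comparing
    best = min(entries, key=lambda e: float(e["SpotPrice"]))
    return best["InstanceType"], float(best["SpotPrice"])

sample = {
    "SpotPriceHistory": [
        {"InstanceType": "p3.2xlarge", "SpotPrice": "1.14", "AvailabilityZone": "us-east-1a"},
        {"InstanceType": "p3.2xlarge", "SpotPrice": "0.98", "AvailabilityZone": "us-east-1b"},
        {"InstanceType": "g5.xlarge", "SpotPrice": "0.42", "AvailabilityZone": "us-east-1a"},
    ]
}
print(cheapest_spot(sample))  # ('g5.xlarge', 0.42)
```

Feeding live data in would mean calling `self.ec2.describe_spot_price_history(...)` and passing the response straight to `cheapest_spot`.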

## Step 4: Multi-Cloud Orchestrator

```python
# orchestrator.py
from typing import List, Dict, Optional
from dataclasses import dataclass
import logging

from providers.base import (
    CloudProvider, GPUInstance, ProviderCapacity,
    JobRequest, JobResult, GPUType
)
from providers.clore import CloreProvider
from providers.aws import AWSProvider

logger = logging.getLogger(__name__)


@dataclass
class PriceComparison:
    """Price comparison across providers."""
    gpu_type: GPUType
    providers: List[Dict]  # [{provider, price, available, is_spot}]
    cheapest_provider: str
    cheapest_price: float


class MultiCloudOrchestrator:
    """Orchestrate GPU jobs across multiple cloud providers."""
    
    def __init__(self):
        self.providers: Dict[str, CloudProvider] = {}
        self._active_jobs: Dict[str, str] = {}  # job_id -> provider
    
    def add_provider(self, provider: CloudProvider):
        """Add a cloud provider."""
        self.providers[provider.name] = provider
        logger.info(f"Added provider: {provider.name}")
    
    def remove_provider(self, name: str):
        """Remove a cloud provider."""
        if name in self.providers:
            del self.providers[name]
    
    def get_all_capacity(self) -> List[ProviderCapacity]:
        """Get capacity from all providers."""
        all_capacity = []
        
        for name, provider in self.providers.items():
            try:
                capacity = provider.get_available_gpus()
                all_capacity.extend(capacity)
            except Exception as e:
                logger.warning(f"Failed to get capacity from {name}: {e}")
        
        return all_capacity
    
    def compare_prices(self, gpu_type: GPUType) -> PriceComparison:
        """Compare prices across all providers for a GPU type."""
        capacity = self.get_all_capacity()
        
        matching = [c for c in capacity if c.gpu_type == gpu_type]
        
        providers = []
        for c in matching:
            providers.append({
                "provider": c.provider,
                "price": c.price_per_hour,
                "available": c.available_count,
                "is_spot": c.is_spot,
                "region": c.region
            })
        
        # Sort by price
        providers.sort(key=lambda x: x["price"])
        
        return PriceComparison(
            gpu_type=gpu_type,
            providers=providers,
            cheapest_provider=providers[0]["provider"] if providers else "",
            cheapest_price=providers[0]["price"] if providers else float('inf')
        )
    
    def select_provider(self, request: JobRequest) -> Optional[str]:
        """Select best provider for a job request."""
        comparison = self.compare_prices(request.gpu_type)
        
        for p in comparison.providers:
            # Check price limit
            if request.max_price_per_hour and p["price"] > request.max_price_per_hour:
                continue
            
            # Check spot preference
            if request.prefer_spot and not p["is_spot"]:
                # Still consider if much cheaper
                if p["price"] > comparison.cheapest_price * 1.5:
                    continue
            
            # Check availability
            if p["available"] < request.gpu_count:
                continue
            
            return p["provider"]
        
        return None
    
    def submit_job(self, request: JobRequest, provider: Optional[str] = None) -> JobResult:
        """Submit a job, optionally to a specific provider."""
        
        # Select provider if not specified
        if not provider:
            provider = self.select_provider(request)
        
        if not provider:
            return JobResult(
                success=False,
                job_id="",
                provider="",
                error=f"No provider available for {request.gpu_type.value}"
            )
        
        if provider not in self.providers:
            return JobResult(
                success=False,
                job_id="",
                provider=provider,
                error=f"Provider {provider} not configured"
            )
        
        logger.info(f"Submitting job to {provider}: {request.gpu_type.value}")
        
        # Launch on selected provider
        result = self.providers[provider].launch_instance(request)
        
        if result.success:
            self._active_jobs[result.job_id] = provider
        
        return result
    
    def submit_with_failover(self, request: JobRequest) -> JobResult:
        """Submit job with automatic failover to other providers."""
        
        comparison = self.compare_prices(request.gpu_type)
        
        for p in comparison.providers:
            if request.max_price_per_hour and p["price"] > request.max_price_per_hour:
                continue
            
            provider_name = p["provider"]
            
            if provider_name not in self.providers:
                continue
            
            logger.info(f"Trying provider {provider_name}...")
            result = self.providers[provider_name].launch_instance(request)
            
            if result.success:
                self._active_jobs[result.job_id] = provider_name
                return result
            
            logger.warning(f"Provider {provider_name} failed: {result.error}")
        
        return JobResult(
            success=False,
            job_id="",
            provider="",
            error="All providers failed"
        )
    
    def terminate_job(self, job_id: str) -> bool:
        """Terminate a job."""
        provider_name = self._active_jobs.get(job_id)
        
        if not provider_name:
            logger.warning(f"Unknown job: {job_id}")
            return False
        
        provider = self.providers.get(provider_name)
        if not provider:
            return False
        
        success = provider.terminate_instance(job_id)
        
        if success:
            del self._active_jobs[job_id]
        
        return success
    
    def get_job_status(self, job_id: str) -> Optional[GPUInstance]:
        """Get status of a job."""
        provider_name = self._active_jobs.get(job_id)
        
        if not provider_name:
            return None
        
        provider = self.providers.get(provider_name)
        if not provider:
            return None
        
        return provider.get_instance_status(job_id)
    
    def list_all_jobs(self) -> List[GPUInstance]:
        """List all jobs across all providers."""
        all_jobs = []
        
        for name, provider in self.providers.items():
            try:
                jobs = provider.list_instances()
                all_jobs.extend(jobs)
            except Exception as e:
                logger.warning(f"Failed to list jobs from {name}: {e}")
        
        return all_jobs
    
    def terminate_all_jobs(self) -> int:
        """Terminate all active jobs."""
        count = 0
        
        for job_id in list(self._active_jobs.keys()):
            if self.terminate_job(job_id):
                count += 1
        
        return count
    
    def get_total_cost_per_hour(self) -> float:
        """Get total cost per hour of all active jobs."""
        jobs = self.list_all_jobs()
        return sum(j.price_per_hour for j in jobs if j.status == "running")
```
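The selection policy in `select_provider` can be seen in isolation with a few hand-written offers. This is a standalone mirror of that method, repeated here for illustration using plain dicts instead of `ProviderCapacity` objects:

```python
def pick_provider(offers, gpu_count=1, max_price=None, prefer_spot=True):
    """Standalone mirror of MultiCloudOrchestrator.select_provider.
    offers: list of {"provider", "price", "available", "is_spot"} dicts."""
    offers = sorted(offers, key=lambda o: o["price"])
    cheapest = offers[0]["price"] if offers else float("inf")
    for o in offers:
        # Price ceiling
        if max_price is not None and o["price"] > max_price:
            continue
        # Spot preference: still take on-demand if it's within 1.5x the cheapest
        if prefer_spot and not o["is_spot"] and o["price"] > cheapest * 1.5:
            continue
        # Capacity check
        if o["available"] < gpu_count:
            continue
        return o["provider"]
    return None

offers = [
    {"provider": "clore", "price": 1.20, "available": 0, "is_spot": True},
    {"provider": "lambda", "price": 1.29, "available": 8, "is_spot": False},
    {"provider": "aws", "price": 32.77, "available": 100, "is_spot": False},
]
print(pick_provider(offers))  # 'lambda'
```

Note how the sold-out spot offer is skipped and the slightly pricier on-demand offer wins because it is within 1.5x of the cheapest listed price.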

## Step 5: Complete Multi-Cloud Script

```python
#!/usr/bin/env python3
"""
Multi-Cloud GPU Orchestrator

Usage:
    python multi_cloud.py --action compare --gpu RTX_4090
    python multi_cloud.py --action submit --gpu A100_40GB --max-price 2.0
    python multi_cloud.py --action list
    python multi_cloud.py --action terminate --job-id abc123
"""

import argparse
from typing import Optional
from orchestrator import MultiCloudOrchestrator, GPUType, JobRequest
from providers.clore import CloreProvider
from providers.aws import AWSProvider


def setup_orchestrator(clore_key: Optional[str] = None) -> MultiCloudOrchestrator:
    """Set up orchestrator with all providers."""
    orch = MultiCloudOrchestrator()
    
    # Add Clore.ai
    if clore_key:
        orch.add_provider(CloreProvider(clore_key))
    
    # Add AWS (if credentials configured)
    try:
        orch.add_provider(AWSProvider())
    except Exception:
        pass
    
    return orch


def main():
    parser = argparse.ArgumentParser(description="Multi-Cloud GPU Orchestrator")
    parser.add_argument("--action", required=True, 
                       choices=["compare", "submit", "list", "terminate", "status"])
    parser.add_argument("--gpu", help="GPU type (e.g., RTX_4090, A100_40GB)")
    parser.add_argument("--max-price", type=float, help="Max price per hour")
    parser.add_argument("--provider", help="Specific provider to use")
    parser.add_argument("--job-id", help="Job ID for status/terminate")
    parser.add_argument("--clore-key", help="Clore.ai API key")
    parser.add_argument("--image", default="nvidia/cuda:12.8.0-base-ubuntu22.04")
    args = parser.parse_args()
    
    orch = setup_orchestrator(args.clore_key)
    
    if not orch.providers:
        print("❌ No providers configured!")
        return
    
    print(f"✅ Providers: {list(orch.providers.keys())}")
    print()
    
    if args.action == "compare":
        if not args.gpu:
            print("--gpu required for compare")
            return
        
        gpu_type = GPUType[args.gpu.upper()]
        comparison = orch.compare_prices(gpu_type)
        
        print(f"💰 Price Comparison: {gpu_type.value}")
        print("-" * 60)
        
        for p in comparison.providers:
            spot_label = "🟢 Spot" if p["is_spot"] else "🔵 On-Demand"
            print(f"  {p['provider']:10} ${p['price']:.3f}/hr  {p['available']:3} avail  {spot_label}")
        
        print("-" * 60)
        print(f"🏆 Cheapest: {comparison.cheapest_provider} @ ${comparison.cheapest_price:.3f}/hr")
    
    elif args.action == "submit":
        if not args.gpu:
            print("--gpu required for submit")
            return
        
        gpu_type = GPUType[args.gpu.upper()]
        
        request = JobRequest(
            gpu_type=gpu_type,
            gpu_count=1,
            image=args.image,
            max_price_per_hour=args.max_price,
            prefer_spot=True
        )
        
        if args.provider:
            result = orch.submit_job(request, provider=args.provider)
        else:
            result = orch.submit_with_failover(request)
        
        if result.success:
            print(f"✅ Job submitted successfully!")
            print(f"   Job ID: {result.job_id}")
            print(f"   Provider: {result.provider}")
            if result.instance:
                print(f"   SSH: {result.instance.ssh_user}@{result.instance.ssh_host}:{result.instance.ssh_port}")
                print(f"   Price: ${result.instance.price_per_hour:.3f}/hr")
        else:
            print(f"❌ Job failed: {result.error}")
    
    elif args.action == "list":
        jobs = orch.list_all_jobs()
        
        if not jobs:
            print("No active jobs")
            return
        
        print(f"📋 Active Jobs ({len(jobs)})")
        print("-" * 70)
        
        total_cost = 0
        for job in jobs:
            print(f"  {job.instance_id:15} {job.provider:8} {job.gpu_type.value:12} "
                  f"${job.price_per_hour:.3f}/hr  {job.status}")
            if job.status == "running":
                total_cost += job.price_per_hour
        
        print("-" * 70)
        print(f"💵 Total: ${total_cost:.2f}/hr")
    
    elif args.action == "status":
        if not args.job_id:
            print("--job-id required for status")
            return
        
        instance = orch.get_job_status(args.job_id)
        
        if instance:
            print(f"📊 Job Status: {args.job_id}")
            print(f"   Provider: {instance.provider}")
            print(f"   GPU: {instance.gpu_type.value}")
            print(f"   Status: {instance.status}")
            print(f"   Price: ${instance.price_per_hour:.3f}/hr")
            if instance.ssh_host:
                print(f"   SSH: {instance.ssh_user}@{instance.ssh_host}:{instance.ssh_port}")
        else:
            print(f"❌ Job not found: {args.job_id}")
    
    elif args.action == "terminate":
        if not args.job_id:
            # Terminate all
            count = orch.terminate_all_jobs()
            print(f"🛑 Terminated {count} jobs")
        else:
            success = orch.terminate_job(args.job_id)
            if success:
                print(f"✅ Job {args.job_id} terminated")
            else:
                print(f"❌ Failed to terminate {args.job_id}")


if __name__ == "__main__":
    main()
```

## Cost Comparison Table

| GPU Type  | Clore.ai (Spot) | AWS (On-Demand) | GCP (On-Demand) | Lambda Labs |
| --------- | --------------- | --------------- | --------------- | ----------- |
| RTX 4090  | **$0.25-0.50**  | N/A             | N/A             | $0.50       |
| RTX 3090  | **$0.20-0.35**  | N/A             | N/A             | $0.45       |
| A100 40GB | **$1.00-1.50**  | $32.77*         | $3.67           | $1.10       |
| A100 80GB | **$1.50-2.00**  | $40.00          | $4.00           | $1.29       |
| V100      | $0.80           | $3.06           | $2.48           | $0.80       |
| H100      | **$2.00-3.00**  | N/A             | $6.98           | $2.49       |

\* The AWS A100 price is for `p4d.24xlarge`, an 8x A100 instance, so the effective per-GPU rate is roughly $4.10/hr.

**Clore.ai consistently offers the best prices for consumer GPUs (RTX series)!**
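To put the hourly gap in monthly terms, a quick back-of-the-envelope calculation (the $1.25 figure is the midpoint of the Clore.ai A100 range above; 730 is the average number of hours in a month):

```python
HOURS_PER_MONTH = 730  # 8760 hours per year / 12 months

def monthly_cost(price_per_hour, utilization=1.0):
    """Projected monthly spend for one instance at the given utilization."""
    return price_per_hour * HOURS_PER_MONTH * utilization

clore_a100 = monthly_cost(1.25)    # midpoint of the $1.00-1.50 range
aws_a100 = monthly_cost(32.77)     # p4d.24xlarge on-demand
print(f"Clore: ${clore_a100:,.0f}/mo  AWS: ${aws_a100:,.0f}/mo  "
      f"difference: ${aws_a100 - clore_a100:,.0f}/mo")
```

At half utilization, pass `utilization=0.5` to scale both figures down proportionally.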

## Next Steps

* [Auto-Scaling Workers](https://docs.clore.ai/dev/inference-and-deployment/auto-scaling-workers)
* [Prometheus Monitoring](https://docs.clore.ai/dev/devops-and-automation/prometheus-monitoring)
* [Cost Optimization](https://docs.clore.ai/dev/devops-and-automation/cost-optimization)
