# CI/CD with clore-ai SDK

Integrate GPU testing and deployment into your CI/CD pipelines. This chapter covers GitHub Actions, GitLab CI, Docker, and secrets management — with full working configs.

***

## Secrets Management

Before configuring any pipeline, store your Clore API key securely.

### GitHub Actions

1. Go to your repo → **Settings → Secrets and variables → Actions**
2. Click **New repository secret**
3. Name: `CLORE_API_KEY`, Value: your API key

### GitLab CI

1. Go to your project → **Settings → CI/CD → Variables**
2. Add variable: Key = `CLORE_API_KEY`, Value = your API key
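3. Check **Mask variable**. Check **Protect variable** only if every GPU job runs on a protected branch or tag: GitLab does not expose protected variables to pipelines on unprotected refs, which usually includes merge request pipelines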

### General Rules

* **Never** hardcode API keys in source code or CI configs
* Use environment variables or secrets managers
* Rotate keys periodically
* Restrict key scope: use a dedicated API key for CI (not your main account key)
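
The rules above can be checked at the start of any CI script. The following is a minimal sketch, assuming the key is exposed as the `CLORE_API_KEY` environment variable as configured in this chapter. The helpers `require_api_key` and `mask` are hypothetical names for illustration and are not part of the clore-ai SDK. The idea is to fail fast with a clear message instead of letting a later API call fail with an authentication error.

```python
import os


def require_api_key(env=None, name="CLORE_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the clore-ai SDK.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it as a CI secret")
    return key


def mask(key, visible=4):
    """Return a log-safe form of the key that shows only the last characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

A CI script could call `require_api_key()` once before creating any order, and log `mask(key)` if it needs to show which key is in use.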

***

## GitHub Actions

### Basic: GPU Smoke Test

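Rent the cheapest available Clore GPU on every push to `main` (or on manual dispatch), verify that the instance starts, and cancel the order when the job finishes.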

```yaml
# .github/workflows/gpu-test.yml
name: GPU Smoke Test

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  CLORE_API_KEY: ${{ secrets.CLORE_API_KEY }}

jobs:
  gpu-test:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install SDK
        run: pip install clore-ai

      - name: Run GPU test
        run: |
          python << 'EOF'
          import time
          from clore_ai import CloreAI
          from clore_ai.exceptions import CloreAPIError

          client = CloreAI()
          order_id = None

          try:
              # Find cheapest GPU
              servers = client.marketplace(max_price_usd=1.0)
              servers.sort(key=lambda s: s.price_usd or float("inf"))

              if not servers:
                  print("::warning::No GPU servers available")
                  exit(0)

              best = servers[0]
              print(f"Using server {best.id}: {best.gpu_model} @ ${best.price_usd:.4f}/h")

              # Create order
              order = client.create_order(
                  server_id=best.id,
                  image="cloreai/ubuntu22.04-cuda12",
                  type="on-demand",
                  currency="bitcoin",
                  ssh_password="CITest123",
                  ports={"22": "tcp"},
              )
              order_id = order.id
              print(f"Order {order_id} created")

              # Wait for instance (poll for IP)
              for _ in range(24):  # 2 minutes
                  time.sleep(5)
                  orders = client.my_orders()
                  active = next((o for o in orders if o.id == order_id), None)
                  if active and active.pub_cluster:
                      print(f"Instance ready: {active.pub_cluster}")
                      break
              else:
                  print("::error::Instance did not start in time")
                  exit(1)

              print("✅ GPU test passed")

          except CloreAPIError as e:
              print(f"::error::Clore API error: {e}")
              exit(1)

          finally:
              if order_id:
                  try:
                      client.cancel_order(order_id, issue="CI test complete")
                      print(f"Order {order_id} cancelled")
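                  except Exception as e:
                      # Surface the failure: an uncancelled order keeps billing
                      print(f"::warning::Failed to cancel order {order_id}: {e}")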
          EOF
```
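
The workflow above passes a fixed `ssh_password` that is visible to anyone who can read the repository. One alternative is to generate a throwaway password for each run. The sketch below uses only the Python standard library; `generate_ssh_password` is a hypothetical helper name, and it assumes the API accepts alphanumeric passwords of this length (the fixed example password is alphanumeric, but any length limit is not confirmed here).

```python
import secrets
import string


def generate_ssh_password(length=24):
    """Generate a random alphanumeric password for a single CI run.

    Assumption: the order API accepts alphanumeric passwords of this length.
    """
    if length < 12:
        raise ValueError("use at least 12 characters")
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically secure random source
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

The result can be passed as `ssh_password=generate_ssh_password()` when creating the order and kept in a local variable for the later SSH step, so it never appears in the workflow file.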

### Advanced: Matrix GPU Testing

Test your code on multiple GPU types in parallel.

```yaml
# .github/workflows/gpu-matrix.yml
name: GPU Matrix Test

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  CLORE_API_KEY: ${{ secrets.CLORE_API_KEY }}

jobs:
  gpu-test:
    runs-on: ubuntu-latest
    timeout-minutes: 20

    strategy:
      fail-fast: false
      matrix:
        # One job per GPU type, each with its own price cap. Listing gpu and
        # max_price as separate arrays would expand to all 9 combinations.
        include:
          - gpu: "RTX 4090"
            max_price: 1.0
          - gpu: "RTX 3090"
            max_price: 1.5
          - gpu: "A100"
            max_price: 3.0

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          pip install clore-ai
          pip install -r requirements.txt

      - name: Run tests on ${{ matrix.gpu }}
        run: |
          python ci/run_gpu_test.py \
            --gpu "${{ matrix.gpu }}" \
            --max-price ${{ matrix.max_price }} \
            --script "pytest tests/gpu/ -v"
```

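Supporting script `ci/run_gpu_test.py`. The `--script` command runs over SSH on the rented instance, so the tests must already exist there (baked into the image or copied over first):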

```python
#!/usr/bin/env python3
"""Run a test script on a rented Clore GPU."""

import argparse
import subprocess
import sys
import time

from clore_ai import CloreAI
from clore_ai.exceptions import CloreAPIError


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", required=True)
    parser.add_argument("--max-price", type=float, default=1.0)
    parser.add_argument("--script", required=True)
    parser.add_argument("--image", default="cloreai/pytorch")
    parser.add_argument("--timeout", type=int, default=600)
    args = parser.parse_args()

    client = CloreAI()
    order_id = None

    try:
        # Find server
        servers = client.marketplace(gpu=args.gpu, max_price_usd=args.max_price)
        if not servers:
            print(f"::warning::No {args.gpu} servers available under ${args.max_price}")
            sys.exit(0)

        servers.sort(key=lambda s: s.price_usd or float("inf"))
        best = servers[0]
        print(f"Server {best.id}: {best.gpu_count}x {best.gpu_model} @ ${best.price_usd:.4f}/h")

        # Create order
        order = client.create_order(
            server_id=best.id,
            image=args.image,
            type="on-demand",
            currency="bitcoin",
            ssh_password="CIMatrix123",
            ports={"22": "tcp"},
        )
        order_id = order.id
        print(f"Order {order_id} created, waiting for SSH...")

        # Wait for the instance: poll every 5 s for up to 2 minutes
        active = None
        for _ in range(24):
            time.sleep(5)
            orders = client.my_orders()
            active = next((o for o in orders if o.id == order_id), None)
            if active and active.pub_cluster:
                break
        else:
            print("::error::Instance did not start in time")
            sys.exit(1)

        host = active.pub_cluster
        port = 22
        if active.tcp_ports and "22" in active.tcp_ports:
            port = active.tcp_ports["22"]

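        # Run the test script. Plain ssh cannot answer a password prompt in CI,
        # so this needs non-interactive authentication (e.g. sshpass or a key).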
        ssh_cmd = [
            "ssh", "-o", "StrictHostKeyChecking=no",
            "-p", str(port), f"root@{host}",
            args.script,
        ]
        result = subprocess.run(ssh_cmd, timeout=args.timeout)
        sys.exit(result.returncode)

    except CloreAPIError as e:
        print(f"::error::API error: {e}")
        sys.exit(1)
    finally:
        if order_id:
            try:
                client.cancel_order(order_id, issue="CI complete")
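            except Exception as e:
                # Surface the failure: an uncancelled order keeps billing
                print(f"::warning::Failed to cancel order {order_id}: {e}")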


if __name__ == "__main__":
    main()
```
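
The script above creates the order with a password but then calls `ssh` directly, which has no way to supply a password without a terminal. One option is to wrap the call in `sshpass`, whose `-e` flag reads the password from the `SSHPASS` environment variable. This is a sketch under the assumption that the `sshpass` binary is installed on the runner (for example via `apt-get install sshpass`); `build_ssh_command` and `run_remote` are hypothetical helper names.

```python
import os
import subprocess


def build_ssh_command(host, port, remote_command):
    """Build an argv list that runs `remote_command` over SSH via sshpass.

    Assumes `sshpass` is installed; `-e` makes it read the password from the
    SSHPASS environment variable instead of the command line.
    """
    return [
        "sshpass", "-e",
        "ssh",
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        "-o", "ConnectTimeout=15",
        "-p", str(port),
        f"root@{host}",
        remote_command,
    ]


def run_remote(host, port, remote_command, password, timeout=600):
    """Run the command on the remote host and return its exit code."""
    # Passing the password through the environment keeps it out of argv
    env = dict(os.environ, SSHPASS=password)
    result = subprocess.run(
        build_ssh_command(host, port, remote_command), env=env, timeout=timeout
    )
    return result.returncode
```

In `run_gpu_test.py`, the `subprocess.run(ssh_cmd, ...)` call could be swapped for `run_remote(host, port, args.script, password)`, where `password` is the same value that was passed as `ssh_password`.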

***

## GitLab CI

### Basic Pipeline

```yaml
# .gitlab-ci.yml
stages:
  - gpu-test

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.pip-cache"

gpu-smoke-test:
  stage: gpu-test
  image: python:3.11-slim
  timeout: 15 minutes

  before_script:
    - pip install clore-ai

  script:
    - python ci/run_gpu_test.py --gpu "RTX 4090" --max-price 1.0 --script "nvidia-smi"

  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

  variables:
    CLORE_API_KEY: $CLORE_API_KEY
```

### Parallel GPU Jobs

```yaml
# .gitlab-ci.yml
stages:
  - gpu-test

.gpu-test-template: &gpu-test
  stage: gpu-test
  image: python:3.11-slim
  timeout: 20 minutes
  before_script:
    - pip install clore-ai
    - pip install -r requirements.txt
  variables:
    CLORE_API_KEY: $CLORE_API_KEY

gpu-test-4090:
  <<: *gpu-test
  script:
    - python ci/run_gpu_test.py --gpu "RTX 4090" --max-price 1.0 --script "pytest tests/gpu/"

gpu-test-3090:
  <<: *gpu-test
  script:
    - python ci/run_gpu_test.py --gpu "RTX 3090" --max-price 1.5 --script "pytest tests/gpu/"
  allow_failure: true

gpu-test-a100:
  <<: *gpu-test
  script:
    - python ci/run_gpu_test.py --gpu "A100" --max-price 3.0 --script "pytest tests/gpu/"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```

***

## Docker

### SDK Script Container

Package your SDK automation scripts in a Docker image.

```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install SDK
RUN pip install --no-cache-dir clore-ai

# Install SSH client (for remote execution)
RUN apt-get update && apt-get install -y --no-install-recommends openssh-client \
    && rm -rf /var/lib/apt/lists/*

# Copy your scripts
COPY scripts/ ./scripts/

# Default entrypoint
ENTRYPOINT ["python"]
CMD ["scripts/main.py"]
```

### Docker Compose for Local Development

```yaml
# docker-compose.yml

services:
  gpu-manager:
    build: .
    environment:
      - CLORE_API_KEY=${CLORE_API_KEY}
    volumes:
      - ./scripts:/app/scripts
      - ./results:/app/results
    # The image's ENTRYPOINT is already `python`, so pass only the script path
    command: scripts/training_pipeline.py

  spot-bot:
    build: .
    environment:
      - CLORE_API_KEY=${CLORE_API_KEY}
    command: scripts/spot_bidder.py
    restart: unless-stopped

  health-checker:
    build: .
    environment:
      - CLORE_API_KEY=${CLORE_API_KEY}
    command: scripts/health_checker.py
    restart: unless-stopped
```

Run:

```bash
# Set your API key (keep .env out of version control, e.g. via .gitignore)
echo "CLORE_API_KEY=your_key" > .env

# Start all services
docker compose up -d

# View logs
docker compose logs -f gpu-manager
```

### Multi-Stage Build for Production

```dockerfile
# Dockerfile.prod
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir --target=/deps clore-ai -r requirements.txt

FROM python:3.11-slim
WORKDIR /app

# Copy only the installed packages
COPY --from=builder /deps /usr/local/lib/python3.11/site-packages/

# Install runtime deps only
RUN apt-get update && apt-get install -y --no-install-recommends openssh-client \
    && rm -rf /var/lib/apt/lists/*

# Non-root user
RUN useradd -m appuser
USER appuser

COPY scripts/ ./scripts/

ENTRYPOINT ["python"]
```

***

## Cleanup & Safety

### Always Cancel Orders in CI

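Every CI job must cancel its orders in a `finally` block or a post-job step. The step below cancels every order on the account, so use it only with a dedicated CI API key and not while other jobs sharing that key are still running: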

```yaml
# GitHub Actions — post-run cleanup
- name: Cleanup GPU orders
  if: always()
  run: |
    python << 'EOF'
    from clore_ai import CloreAI
    from clore_ai.exceptions import CloreAPIError

    client = CloreAI()
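    try:
        orders = client.my_orders()
    except CloreAPIError as e:
        print(f"Cleanup error: {e}")
        orders = []

    # Cancel each order separately so one failure does not skip the rest
    for o in orders:
        try:
            client.cancel_order(o.id, issue="CI cleanup")
            print(f"Cancelled order {o.id}")
        except CloreAPIError as e:
            print(f"Failed to cancel order {o.id}: {e}")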
    EOF
```
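
When several jobs share one API key, a narrower cleanup is to cancel only the orders the current job created. The sketch below records order IDs in a local file and cancels just those. The helper names and the tracking-file approach are illustrative assumptions, not SDK features; `cancel` is any callable, for example `lambda oid: client.cancel_order(oid, issue="CI cleanup")`.

```python
from pathlib import Path


def record_order(path, order_id):
    """Append an order ID to the tracking file, one ID per line."""
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(f"{order_id}\n")


def load_orders(path):
    """Return recorded order IDs (as strings) in order, without duplicates."""
    p = Path(path)
    if not p.exists():
        return []
    seen = []
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and line not in seen:
            seen.append(line)
    return seen


def cancel_recorded(path, cancel):
    """Call `cancel(order_id)` for each recorded ID; return the IDs that failed."""
    failed = []
    for order_id in load_orders(path):
        try:
            cancel(order_id)
        except Exception:
            # Keep going so one failure does not leave other orders running
            failed.append(order_id)
    return failed
```

IDs are read back as strings, so convert them (for example with `int(...)`) if the SDK expects a different type. A non-empty return value from `cancel_recorded` is a reasonable trigger for failing the job or sending an alert.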

### Budget Guard for CI

Prevent runaway CI costs:

```python
# ci/budget_guard.py
"""Check budget before allowing GPU operations."""

from clore_ai import CloreAI

MAX_ACTIVE_ORDERS = 3
MAX_HOURLY_SPEND = 5.0  # USD


def check_budget() -> bool:
    client = CloreAI()
    orders = client.my_orders()

    if len(orders) >= MAX_ACTIVE_ORDERS:
        print(f"::error::Too many active orders ({len(orders)}/{MAX_ACTIVE_ORDERS})")
        return False

    # Estimate hourly spend (assumes o.price is an hourly price in USD;
    # verify the unit against the SDK reference)
    total_hourly = sum(o.price or 0 for o in orders)
    if total_hourly >= MAX_HOURLY_SPEND:
        print(f"::error::Hourly spend too high (${total_hourly:.2f}/${MAX_HOURLY_SPEND:.2f})")
        return False

    print(f"✅ Budget OK: {len(orders)} orders, ${total_hourly:.2f}/h")
    return True


if __name__ == "__main__":
    import sys
    sys.exit(0 if check_budget() else 1)
```

Use it as a pre-step:

```yaml
- name: Budget check
  run: python ci/budget_guard.py
```
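
Alongside a runtime guard, a rough upper bound on cost per pipeline run helps when choosing price caps and timeouts. The sketch below assumes each job rents one server at its price cap for its entire timeout and that the order is cancelled when the job ends; actual billing rules may differ. `worst_case_cost` is a hypothetical helper, and the figures are the price caps and 20-minute timeout from the matrix workflow in this chapter, one job per GPU type.

```python
def worst_case_cost(jobs):
    """Upper bound in USD: each job runs at its price cap for its full timeout.

    `jobs` is an iterable of (max_price_usd_per_hour, timeout_minutes) pairs.
    """
    return sum(price * minutes / 60.0 for price, minutes in jobs)


# One job per GPU type, each with a 20-minute timeout
matrix_jobs = [(1.0, 20), (1.5, 20), (3.0, 20)]
print(f"Worst case per pipeline run: ${worst_case_cost(matrix_jobs):.2f}")
# prints: Worst case per pipeline run: $1.83
```

The bound only holds if cleanup actually runs. A job killed at its timeout may skip its `finally` block, which is one more reason to keep the `if: always()` cleanup step.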

***

## See Also

* [SDK API Reference](https://docs.clore.ai/dev/reference/python-sdk) — complete method documentation
* [SDK Quick Start](https://docs.clore.ai/dev/getting-started/python-sdk-quickstart) — getting started tutorial
* [Automation Recipes](https://docs.clore.ai/dev/advanced-use-cases/sdk-automation-recipes) — auto-scaler, spot bot, training pipeline
* [Auto-Provisioning from GitHub Actions](https://docs.clore.ai/dev/devops-and-automation/github-actions) — existing GitHub Actions guide
