CI/CD with clore-ai SDK

Integrate GPU testing and deployment into your CI/CD pipelines. This chapter covers GitHub Actions, GitLab CI, Docker, and secrets management — with full working configs.

Secrets Management

Before configuring any pipeline, store your Clore API key securely.

GitHub Actions

Go to your repo → Settings → Secrets and variables → Actions
Click New repository secret
Name: CLORE_API_KEY, Value: your API key

GitLab CI

Go to your project → Settings → CI/CD → Variables
Add variable: Key = CLORE_API_KEY, Value = your API key
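Check Mask variable. Check Protect variable only if the GPU jobs run exclusively on protected branches or tags: protected variables are not passed to pipelines for unprotected branches, including merge request pipelines from unprotected source branches.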

General Rules

Never hardcode API keys in source code or CI configs
Use environment variables or secrets managers
Rotate keys periodically
Restrict key scope: use a dedicated API key for CI (not your main account key)
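The Python examples in this chapter construct `CloreAI()` without arguments while `CLORE_API_KEY` is set in the environment, so the SDK appears to read the key from there. A script can therefore fail fast with a clear message when the secret is not exposed to the pipeline (for example, a protected GitLab variable in a pipeline for an unprotected branch). A minimal sketch; the helper names `require_api_key` and `masked` are hypothetical:

```python
import os


def require_api_key(var: str = "CLORE_API_KEY") -> str:
    """Return the API key from the environment, or exit with a clear error."""
    key = os.environ.get(var, "").strip()
    if not key:
        # SystemExit with a string prints the message to stderr and exits with status 1
        raise SystemExit(f"{var} is not set; add it as a CI secret/variable")
    return key


def masked(key: str) -> str:
    """Safe-to-log form of a key: short keys are fully hidden, long ones show the last 4 chars."""
    if len(key) <= 8:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]
```

Call `require_api_key()` before creating the client, and log only `masked(key)` if you need to confirm which key a job is using.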

GitHub Actions

Basic: GPU Smoke Test

On every push to main, rent the cheapest Clore GPU under $1/h, wait until the instance reports a public address, then cancel the order.

# .github/workflows/gpu-test.yml
name: GPU Smoke Test

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  CLORE_API_KEY: ${{ secrets.CLORE_API_KEY }}

jobs:
  gpu-test:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install SDK
        run: pip install clore-ai

      - name: Run GPU test
        run: |
          python << 'EOF'
          import secrets
          import sys
          import time
          from clore_ai import CloreAI
          from clore_ai.exceptions import CloreAPIError

          client = CloreAI()
          order_id = None

          try:
              # Find the cheapest GPU under $1/h
              servers = client.marketplace(max_price_usd=1.0)
              if not servers:
                  print("::warning::No GPU servers available")
                  sys.exit(0)

              servers.sort(key=lambda s: s.price_usd or float("inf"))
              best = servers[0]
              print(f"Using server {best.id}: {best.gpu_model} @ ${best.price_usd:.4f}/h")

              # Create order with a throwaway SSH password (never commit credentials)
              order = client.create_order(
                  server_id=best.id,
                  image="cloreai/ubuntu22.04-cuda12",
                  type="on-demand",
                  currency="bitcoin",
                  ssh_password=secrets.token_urlsafe(16),
                  ports={"22": "tcp"},
              )
              order_id = order.id
              print(f"Order {order_id} created")

              # Wait for the instance: poll every 5 s for up to 2 minutes
              for _ in range(24):
                  time.sleep(5)
                  orders = client.my_orders()
                  active = next((o for o in orders if o.id == order_id), None)
                  if active and active.pub_cluster:
                      print(f"Instance ready: {active.pub_cluster}")
                      break
              else:
                  print("::error::Instance did not start in time")
                  sys.exit(1)

              print("✅ GPU test passed")

          except CloreAPIError as e:
              print(f"::error::Clore API error: {e}")
              sys.exit(1)

          finally:
              # Always release the GPU; report cancel failures instead of hiding them
              if order_id:
                  try:
                      client.cancel_order(order_id, issue="CI test complete")
                      print(f"Order {order_id} cancelled")
                  except Exception as e:
                      print(f"::warning::Could not cancel order {order_id}: {e}")
          EOF
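A fixed `range(24)` loop ties the timeout to the sleep interval. A deadline-based helper makes the timeout explicit and returns the first non-empty result. This is a sketch; the name `wait_for` is hypothetical and not part of the clore-ai SDK:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    probe: Callable[[], Optional[T]],
    timeout: float,
    interval: float = 5.0,
) -> Optional[T]:
    """Call probe() every `interval` seconds until it returns a truthy value
    or `timeout` seconds have elapsed. Returns that value, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        # Never sleep past the deadline
        time.sleep(min(interval, remaining))
```

With the SDK, the probe would look up the order in `client.my_orders()` and return its `pub_cluster` (or None), and the caller treats a None result as "instance did not start in time".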

Advanced: Matrix GPU Testing

Test your code on multiple GPU types in parallel.

# .github/workflows/gpu-matrix.yml
name: GPU Matrix Test

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  CLORE_API_KEY: ${{ secrets.CLORE_API_KEY }}

jobs:
  gpu-test:
    runs-on: ubuntu-latest
    timeout-minutes: 20

    strategy:
      fail-fast: false
      matrix:
        # One job per GPU, each with its own price cap (3 jobs, not a 3x3 cross product)
        include:
          - gpu: "RTX 4090"
            max_price: 1.0
          - gpu: "RTX 3090"
            max_price: 1.5
          - gpu: "A100"
            max_price: 3.0

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

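      - name: Install dependencies
        run: |
          # sshpass lets ci/run_gpu_test.py log in with a password non-interactively
          sudo apt-get update && sudo apt-get install -y sshpass
          pip install clore-ai
          pip install -r requirements.txt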

      - name: Run tests on ${{ matrix.gpu }}
        run: |
          python ci/run_gpu_test.py \
            --gpu "${{ matrix.gpu }}" \
            --max-price ${{ matrix.max_price }} \
            --script "pytest tests/gpu/ -v"
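Every key listed directly under `matrix` is an independent axis, and GitHub Actions runs one job per combination of axis values. A quick way to see what two three-value axes expand to, using only `itertools.product`:

```python
from itertools import product

gpus = ["RTX 4090", "RTX 3090", "A100"]
max_prices = [1.0, 1.5, 3.0]

# Two independent axes -> Cartesian product: every GPU paired with every price cap
cross = list(product(gpus, max_prices))
print(len(cross))  # 9

# Explicit pairs (the include-only form) -> one job per GPU
pairs = list(zip(gpus, max_prices))
print(pairs)  # [('RTX 4090', 1.0), ('RTX 3090', 1.5), ('A100', 3.0)]
```

Since each job rents its own GPU, the difference between 3 and 9 jobs is a 3x difference in cost.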

Supporting script ci/run_gpu_test.py. The --script command runs on the rented instance over SSH, not on the CI runner:

#!/usr/bin/env python3
"""Run a test command on a rented Clore GPU over SSH."""

import argparse
import os
import secrets
import subprocess
import sys
import time

from clore_ai import CloreAI
from clore_ai.exceptions import CloreAPIError


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", required=True)
    parser.add_argument("--max-price", type=float, default=1.0)
    parser.add_argument("--script", required=True)
    parser.add_argument("--image", default="cloreai/pytorch")
    parser.add_argument("--timeout", type=int, default=600)
    args = parser.parse_args()

    client = CloreAI()
    order_id = None
    # Per-run SSH password: never commit credentials to the repo
    ssh_password = secrets.token_urlsafe(16)

    try:
        # Find the cheapest matching server
        servers = client.marketplace(gpu=args.gpu, max_price_usd=args.max_price)
        if not servers:
            print(f"::warning::No {args.gpu} servers available under ${args.max_price}")
            sys.exit(0)

        servers.sort(key=lambda s: s.price_usd or float("inf"))
        best = servers[0]
        print(f"Server {best.id}: {best.gpu_count}x {best.gpu_model} @ ${best.price_usd:.4f}/h")

        # Create order
        order = client.create_order(
            server_id=best.id,
            image=args.image,
            type="on-demand",
            currency="bitcoin",
            ssh_password=ssh_password,
            ports={"22": "tcp"},
        )
        order_id = order.id
        print(f"Order {order_id} created, waiting for SSH...")

        # Poll every 5 s for up to 3 minutes instead of a single fixed sleep
        active = None
        for _ in range(36):
            time.sleep(5)
            orders = client.my_orders()
            active = next((o for o in orders if o.id == order_id), None)
            if active and active.pub_cluster:
                break
        else:
            print("::error::Instance did not start in time")
            sys.exit(1)

        host = active.pub_cluster
        port = 22
        if active.tcp_ports and "22" in active.tcp_ports:
            port = active.tcp_ports["22"]

        # CI has no terminal for a password prompt, so sshpass (must be installed)
        # feeds the password from $SSHPASS; ConnectionAttempts retries once per
        # second while the container's sshd starts.
        ssh_cmd = [
            "sshpass", "-e",
            "ssh", "-o", "StrictHostKeyChecking=no",
            "-o", "ConnectionAttempts=30",
            "-p", str(port), f"root@{host}",
            args.script,
        ]
        env = {**os.environ, "SSHPASS": ssh_password}
        result = subprocess.run(ssh_cmd, env=env, timeout=args.timeout)
        sys.exit(result.returncode)

    except subprocess.TimeoutExpired:
        print(f"::error::Remote command exceeded {args.timeout}s")
        sys.exit(1)
    except CloreAPIError as e:
        print(f"::error::API error: {e}")
        sys.exit(1)
    finally:
        # Always release the GPU; report cancel failures instead of hiding them
        if order_id:
            try:
                client.cancel_order(order_id, issue="CI complete")
            except Exception as e:
                print(f"::warning::Could not cancel order {order_id}: {e}")


if __name__ == "__main__":
    main()
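The rented instance starts without your repository, so a command like `pytest tests/gpu/` only works if the image already contains the tests or they are copied over first. One option is scp. This is a sketch under stated assumptions: the helpers `build_copy_command` and `remote_test_command` are hypothetical, password login goes through `sshpass -e` (password in `$SSHPASS`), the image's SSH server accepts scp, and `/workspace` is an arbitrary destination assumed not to exist yet on the fresh instance:

```python
import shlex
from typing import List


def build_copy_command(host: str, port: int, local_dir: str, remote_dir: str = "/workspace") -> List[str]:
    """Build argv that copies local_dir to remote_dir on the instance over scp."""
    return [
        "sshpass", "-e",  # password comes from the SSHPASS environment variable
        "scp", "-r",
        "-o", "StrictHostKeyChecking=no",
        "-P", str(port),  # scp takes the port as -P (capital); ssh uses -p
        local_dir,
        f"root@{host}:{remote_dir}",
    ]


def remote_test_command(remote_dir: str, test_cmd: str) -> str:
    """Shell command to run on the instance: enter the copied checkout, then run the tests."""
    return f"cd {shlex.quote(remote_dir)} && {test_cmd}"
```

Run the copy with `subprocess.run(cmd, env={**os.environ, "SSHPASS": password}, check=True)` before the SSH step, then pass `remote_test_command("/workspace", "pytest tests/gpu/ -v")` as the remote command. Installing `requirements.txt` on the instance would be part of that same command.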

GitLab CI

Basic Pipeline

# .gitlab-ci.yml
stages:
  - gpu-test

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.pip-cache"

gpu-smoke-test:
  stage: gpu-test
  image: python:3.11-slim
  timeout: 15 minutes

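  before_script:
    # python:3.11-slim ships without an SSH client; run_gpu_test.py needs ssh (and sshpass for password login)
    - apt-get update && apt-get install -y --no-install-recommends openssh-client sshpass
    - pip install clore-ai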

  script:
    - python ci/run_gpu_test.py --gpu "RTX 4090" --max-price 1.0 --script "nvidia-smi"

  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

  variables:
    CLORE_API_KEY: $CLORE_API_KEY
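The scripts in this chapter print GitHub workflow commands such as `::error::`, which GitHub Actions turns into annotations. GitLab does not interpret them, so they appear as plain text. If one script serves both platforms, a helper can pick the format by checking `GITHUB_ACTIONS`, which GitHub Actions sets to `true` on its runners. The helper name `annotate` is hypothetical:

```python
import os


def annotate(level: str, message: str) -> str:
    """Print a GitHub annotation on GitHub Actions, a plain prefixed line elsewhere.

    level is "error", "warning", or "notice". Returns the printed line.
    """
    if os.environ.get("GITHUB_ACTIONS") == "true":
        line = f"::{level}::{message}"
    else:
        line = f"{level.upper()}: {message}"
    print(line, flush=True)
    return line
```

Replacing the hardcoded `print("::error::...")` calls with `annotate("error", ...)` keeps the GitLab job logs readable without changing behavior on GitHub.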

Parallel GPU Jobs

# .gitlab-ci.yml
stages:
  - gpu-test

.gpu-test-template: &gpu-test
  stage: gpu-test
  image: python:3.11-slim
  timeout: 20 minutes
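  before_script:
    # python:3.11-slim ships without an SSH client; run_gpu_test.py needs ssh (and sshpass for password login)
    - apt-get update && apt-get install -y --no-install-recommends openssh-client sshpass
    - pip install clore-ai
    - pip install -r requirements.txt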
  variables:
    CLORE_API_KEY: $CLORE_API_KEY

gpu-test-4090:
  <<: *gpu-test
  script:
    - python ci/run_gpu_test.py --gpu "RTX 4090" --max-price 1.0 --script "pytest tests/gpu/"

gpu-test-3090:
  <<: *gpu-test
  script:
    - python ci/run_gpu_test.py --gpu "RTX 3090" --max-price 1.5 --script "pytest tests/gpu/"
  allow_failure: true

gpu-test-a100:
  <<: *gpu-test
  script:
    - python ci/run_gpu_test.py --gpu "A100" --max-price 3.0 --script "pytest tests/gpu/"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
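Parallel jobs multiply spend. A rough upper bound per pipeline is each job's price cap times its timeout. This assumes billing pro-rated to the minute and that the rented server never costs more than the `--max-price` cap; both are assumptions, not documented Clore billing rules. It also holds only if each order is cancelled when its job ends: if GitLab kills a job at its timeout, the script's `finally` block may not run and the order keeps billing.

```python
def worst_case_cost(price_caps_usd_per_h: list[float], timeout_minutes: float) -> float:
    """Upper bound on spend if every job rents at its cap for the full timeout."""
    return sum(price_caps_usd_per_h) * timeout_minutes / 60


# RTX 4090 ($1.0/h), RTX 3090 ($1.5/h), A100 ($3.0/h), 20-minute template timeout
bound = worst_case_cost([1.0, 1.5, 3.0], timeout_minutes=20)
print(f"${bound:.2f}")  # $1.83
```

On branches other than main the A100 job does not run, so the bound drops to about $0.83.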

Docker

SDK Script Container

Package your SDK automation scripts in a Docker image.

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install SDK
RUN pip install --no-cache-dir clore-ai

# Install SSH client (for remote execution)
RUN apt-get update && apt-get install -y --no-install-recommends openssh-client \
    && rm -rf /var/lib/apt/lists/*

# Copy your scripts
COPY scripts/ ./scripts/

# Default entrypoint
ENTRYPOINT ["python"]
CMD ["scripts/main.py"]

Docker Compose for Local Development

# docker-compose.yml
# The image's ENTRYPOINT is already ["python"], so `command` lists only the script path.
services:
  gpu-manager:
    build: .
    environment:
      - CLORE_API_KEY=${CLORE_API_KEY}
    volumes:
      - ./scripts:/app/scripts
      - ./results:/app/results
    command: scripts/training_pipeline.py

  spot-bot:
    build: .
    environment:
      - CLORE_API_KEY=${CLORE_API_KEY}
    command: scripts/spot_bidder.py
    restart: unless-stopped

  health-checker:
    build: .
    environment:
      - CLORE_API_KEY=${CLORE_API_KEY}
    command: scripts/health_checker.py
    restart: unless-stopped

Run:

# Set your API key (and keep .env out of version control)
echo "CLORE_API_KEY=your_key" > .env
echo ".env" >> .gitignore

# Start all services
docker compose up -d

# View logs
docker compose logs -f gpu-manager
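The spot-bot and health-checker services are long-running loops. `docker compose stop` sends SIGTERM and, after a grace period (10 seconds by default), SIGKILL. With the exec-form `ENTRYPOINT ["python"]`, Python is PID 1 in the container, and the kernel delivers SIGTERM to a PID 1 process only if it has installed a handler. A script without one ignores SIGTERM, is killed after the grace period, and never runs its cleanup. A sketch of a stop-aware loop; the names `exit_on_sigterm` and `run_forever` are hypothetical:

```python
import signal
import time
from typing import Callable


def exit_on_sigterm() -> None:
    """Turn SIGTERM (sent by `docker stop`) into SystemExit so `finally` blocks run.

    Call from the main thread: Python only allows signal.signal() there.
    """
    def _handler(signum, frame):
        # 128 + signal number is the conventional exit status for "terminated by signal"
        raise SystemExit(128 + signum)

    signal.signal(signal.SIGTERM, _handler)


def run_forever(tick: Callable[[], None], interval: float, cleanup: Callable[[], None]) -> None:
    """Call tick() every `interval` seconds; always call cleanup() on the way out."""
    try:
        while True:
            tick()
            # time.sleep() is cut short when the SIGTERM handler raises
            time.sleep(interval)
    finally:
        cleanup()
```

In a bot script, call `exit_on_sigterm()` once at startup, then hand the loop body and a cleanup function (for example, one that cancels the orders this bot created) to `run_forever`.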

Multi-Stage Build for Production

# Dockerfile.prod
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir --target=/deps clore-ai -r requirements.txt

FROM python:3.11-slim
WORKDIR /app

# Copy only the installed packages
COPY --from=builder /deps /usr/local/lib/python3.11/site-packages/

# Install runtime deps only
RUN apt-get update && apt-get install -y --no-install-recommends openssh-client \
    && rm -rf /var/lib/apt/lists/*

# Non-root user
RUN useradd -m appuser
USER appuser

COPY scripts/ ./scripts/

ENTRYPOINT ["python"]

Cleanup & Safety

Always Cancel Orders in CI

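Every CI job must cancel its orders, either in a finally block or in a post-job step that runs even when earlier steps fail. The step below cancels every active order visible to the API key, including orders from other jobs running at the same time, so use it only with a key dedicated to CI and only when jobs do not overlap: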

# GitHub Actions — post-run cleanup
- name: Cleanup GPU orders
  if: always()
  run: |
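    python << 'EOF'
    from clore_ai import CloreAI
    from clore_ai.exceptions import CloreAPIError

    client = CloreAI()
    try:
        orders = client.my_orders()
    except CloreAPIError as e:
        print(f"::warning::Cleanup could not list orders: {e}")
        orders = []

    # Cancel each order independently so one failure does not skip the rest
    for o in orders:
        try:
            client.cancel_order(o.id, issue="CI cleanup")
            print(f"Cancelled order {o.id}")
        except CloreAPIError as e:
            print(f"::warning::Could not cancel order {o.id}: {e}")
    EOF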
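Cancelling everything on the account also kills orders that belong to parallel jobs, such as the other matrix entries. A narrower approach is to record each order ID as soon as it is created and have the cleanup step cancel only those. This is a sketch: the helpers and the ledger file name are hypothetical, order IDs are assumed to be JSON-serializable (for example, integers), and `cancel` would wrap `client.cancel_order`:

```python
import json
from pathlib import Path
from typing import Callable, List

LEDGER = Path("clore-orders.json")  # hypothetical file in the job's workspace


def load_orders(ledger: Path = LEDGER) -> List[int]:
    """Return the recorded order IDs, or an empty list if nothing was recorded."""
    if not ledger.exists():
        return []
    return json.loads(ledger.read_text())


def record_order(order_id: int, ledger: Path = LEDGER) -> None:
    """Append an order ID to the ledger right after create_order() succeeds."""
    ids = load_orders(ledger)
    ids.append(order_id)
    ledger.write_text(json.dumps(ids))


def cancel_recorded(cancel: Callable[[int], None], ledger: Path = LEDGER) -> List[int]:
    """Cancel every recorded order; return the IDs that failed to cancel."""
    failed = []
    for order_id in load_orders(ledger):
        try:
            cancel(order_id)
        except Exception as exc:  # keep going: one failure must not leak the rest
            print(f"::warning::Could not cancel order {order_id}: {exc}")
            failed.append(order_id)
    return failed
```

Because each GitHub Actions job has its own workspace, the ledger written by one matrix job is invisible to the others, so each job's cleanup touches only its own orders.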

Budget Guard for CI

Refuse to start GPU work when the account already has too many active orders or too high an hourly spend:

# ci/budget_guard.py
"""Check budget before allowing GPU operations."""

from clore_ai import CloreAI

MAX_ACTIVE_ORDERS = 3
MAX_HOURLY_SPEND = 5.0  # USD


def check_budget() -> bool:
    client = CloreAI()
    orders = client.my_orders()

    if len(orders) >= MAX_ACTIVE_ORDERS:
        print(f"::error::Too many active orders ({len(orders)}/{MAX_ACTIVE_ORDERS})")
        return False

    # Estimate hourly spend
    total_hourly = sum(o.price or 0 for o in orders)
    if total_hourly >= MAX_HOURLY_SPEND:
        print(f"::error::Hourly spend too high (${total_hourly:.2f}/${MAX_HOURLY_SPEND:.2f})")
        return False

    print(f"✅ Budget OK: {len(orders)} orders, ${total_hourly:.2f}/h")
    return True


if __name__ == "__main__":
    import sys
    sys.exit(0 if check_budget() else 1)

Run it as a step before any step that rents a GPU; a non-zero exit fails the job and skips the remaining steps:

- name: Budget check
  run: python ci/budget_guard.py
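The guard calls the API directly, which makes its thresholds hard to unit-test, and it ignores the order the job is about to create. A sketch that moves the decision into a pure function (hypothetical name `budget_ok`) and counts the new order too. Prices are assumed to be in the same unit as the limit:

```python
from typing import Iterable, Tuple


def budget_ok(
    active_prices: Iterable[float],
    new_price: float,
    max_orders: int = 3,
    max_hourly: float = 5.0,
) -> Tuple[bool, str]:
    """Decide whether one more order at `new_price` per hour fits the budget.

    active_prices are the hourly prices of the orders already running.
    """
    prices = list(active_prices)
    if len(prices) + 1 > max_orders:
        return False, f"too many orders ({len(prices) + 1}/{max_orders})"
    projected = sum(prices) + new_price
    if projected > max_hourly:
        return False, f"projected spend ${projected:.2f}/h exceeds ${max_hourly:.2f}/h"
    return True, f"ok: {len(prices) + 1} orders, ${projected:.2f}/h"
```

`check_budget()` would then collect the active order prices and the chosen server's price and delegate to this function, keeping the API calls and the policy separate.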
