Reinforcement Learning on Cloud GPUs

What We're Building

A complete reinforcement learning training pipeline using Stable-Baselines3 on Clore.ai GPUs. Train agents for games, robotics simulation, and custom environments with automatic GPU provisioning and experiment tracking.

Key Features:

  • Automatic GPU provisioning via Clore.ai API

  • Stable-Baselines3 with GPU acceleration

  • Support for PPO, SAC, DQN, A2C, and more

  • Weights & Biases integration for experiment tracking

  • Custom environment support

  • Checkpoint saving and model export

  • Multi-environment parallel training

Prerequisites

```bash
pip install requests paramiko scp stable-baselines3 gymnasium wandb
```

Architecture Overview

Step 1: Clore.ai RL Client
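The client's job is to list available GPU servers and rent one. A minimal sketch is below; note that the endpoint paths, header name, and payload fields shown are illustrative assumptions, so consult the official Clore.ai API documentation for the real schema before using this:

```python
import requests

class CloreRLClient:
    """Minimal sketch of a Clore.ai marketplace client.

    NOTE: endpoint paths, the auth header name, and payload fields
    below are assumptions -- check the Clore.ai API docs for the
    real schema.
    """
    BASE_URL = "https://api.clore.ai/v1"  # assumed base URL

    def __init__(self, api_token: str):
        self.api_token = api_token

    def _headers(self) -> dict:
        # Token-based auth; the header name here is an assumption.
        return {"auth": self.api_token, "Content-Type": "application/json"}

    def build_order_payload(self, server_id: int, docker_image: str) -> dict:
        """Build the JSON body for renting a GPU server (fields assumed)."""
        return {
            "currency": "usdt",
            "server_id": server_id,
            "image": docker_image,
            "type": "on-demand",
        }

    def list_gpus(self) -> dict:
        """Fetch available GPU servers from the marketplace (network call)."""
        resp = requests.get(f"{self.BASE_URL}/marketplace",
                            headers=self._headers(), timeout=30)
        resp.raise_for_status()
        return resp.json()

    def rent_gpu(self, server_id: int, docker_image: str) -> dict:
        """Place an order for a server, returning the API's response."""
        resp = requests.post(f"{self.BASE_URL}/create_order",
                             headers=self._headers(),
                             json=self.build_order_payload(server_id, docker_image),
                             timeout=30)
        resp.raise_for_status()
        return resp.json()
```

Keeping payload construction in its own method makes it testable without touching the network.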

Step 2: Remote RL Trainer
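The remote trainer copies the training script to the rented instance (via `scp`) and runs it over SSH (via `paramiko`). The command construction can be sketched and tested locally; the script path and flag names below are our own choices, not an SB3 CLI:

```python
import shlex

def build_remote_train_command(
    script_path: str,
    env_id: str,
    algorithm: str,
    total_timesteps: int,
    log_dir: str = "/workspace/logs",
) -> str:
    """Build the shell command to execute on the GPU host.

    In the full pipeline this string would be passed to
    paramiko.SSHClient.exec_command() after the script has been
    copied over with scp. Flag names are those of our own training
    script, not a Stable-Baselines3 CLI.
    """
    args = [
        "python3", script_path,
        "--env", env_id,
        "--algo", algorithm,
        "--timesteps", str(total_timesteps),
        "--log-dir", log_dir,
    ]
    # nohup + log redirection so training survives SSH disconnects
    return "nohup " + shlex.join(args) + " > train.log 2>&1 &"
```

`shlex.join` quotes each argument safely, so environment IDs or paths with unusual characters cannot break the remote shell command.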

Step 3: Complete RL Pipeline
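The complete pipeline chains the stages: provision a GPU, upload the code, train, download the model, and release the instance. The skeleton below injects each stage as a callable, so the control flow can be exercised without a real GPU, SSH session, or Clore.ai account; the stage names are our own:

```python
from typing import Callable

class RLPipeline:
    """Sketch of the end-to-end flow: provision -> upload -> train -> fetch.

    Each stage is injected as a callable so the control flow can run
    (and be tested) with stubs instead of real infrastructure.
    """

    def __init__(self,
                 provision: Callable[[], str],
                 upload: Callable[[str], None],
                 train: Callable[[str], None],
                 download: Callable[[str], str],
                 release: Callable[[str], None]):
        self.provision = provision
        self.upload = upload
        self.train = train
        self.download = download
        self.release = release

    def run(self) -> str:
        host = self.provision()          # rent a GPU, wait for SSH
        try:
            self.upload(host)            # scp training script + configs
            self.train(host)             # launch remote training
            return self.download(host)   # pull checkpoints / model.zip
        finally:
            self.release(host)           # always stop billing, even on failure
```

The `try`/`finally` is the important part: the rented instance is released even if training crashes, so a failed run cannot keep accruing charges.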

Full Script: Production RL Training
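A production training script along these lines is sketched below. The CLI flag names are our own; the heavy Stable-Baselines3/Gymnasium imports are kept inside `main()` so the argument parsing stays importable on machines without GPU dependencies. Treat the hyperparameters as starting points, not tuned values:

```python
import argparse

def parse_args(argv=None) -> argparse.Namespace:
    """CLI surface of the training script (flag names are our own)."""
    p = argparse.ArgumentParser(description="RL training on a cloud GPU")
    p.add_argument("--env", default="CartPole-v1")
    p.add_argument("--algo", default="PPO",
                   choices=["PPO", "SAC", "DQN", "A2C", "TD3"])
    p.add_argument("--timesteps", type=int, default=100_000)
    p.add_argument("--n-envs", type=int, default=4)
    p.add_argument("--log-dir", default="./logs")
    p.add_argument("--wandb-project", default=None)
    return p.parse_args(argv)

def main() -> None:
    args = parse_args()
    # Heavy imports deferred so the CLI parses without GPU deps installed.
    from stable_baselines3 import A2C, DQN, PPO, SAC, TD3
    from stable_baselines3.common.callbacks import CheckpointCallback
    from stable_baselines3.common.env_util import make_vec_env

    algo_cls = {"PPO": PPO, "SAC": SAC, "DQN": DQN, "A2C": A2C, "TD3": TD3}[args.algo]

    # Parallel vectorized environments for faster rollout collection.
    vec_env = make_vec_env(args.env, n_envs=args.n_envs)
    model = algo_cls("MlpPolicy", vec_env, verbose=1, device="cuda",
                     tensorboard_log=args.log_dir)

    # Periodic checkpoints so progress survives spot-instance interruptions.
    checkpoint_cb = CheckpointCallback(save_freq=10_000, save_path=args.log_dir,
                                       name_prefix=args.algo.lower())

    if args.wandb_project:
        import wandb
        wandb.init(project=args.wandb_project, config=vars(args),
                   sync_tensorboard=True)

    model.learn(total_timesteps=args.timesteps, callback=checkpoint_cb)
    model.save(f"{args.log_dir}/final_model")  # export for download via scp

if __name__ == "__main__":
    main()
```

This is the script the remote trainer would upload and launch on the rented GPU; the saved `final_model.zip` is what the pipeline downloads back.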

Supported Environments

| Environment | Type | Algorithm |
| --- | --- | --- |
| CartPole-v1 | Discrete | PPO, DQN, A2C |
| LunarLander-v2 | Discrete | PPO, DQN |
| BipedalWalker-v3 | Continuous | PPO, SAC, TD3 |
| HalfCheetah-v4 | Continuous | SAC, TD3, PPO |
| Pendulum-v1 | Continuous | SAC, TD3 |
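The table reduces to a simple rule: discrete action spaces pair with PPO/DQN/A2C, continuous ones with SAC/TD3/PPO. A small helper can encode this; it is duck-typed on Gymnasium's space classes (`Discrete` exposes `.n`, `Box` exposes `.shape`) so it works on real spaces without importing gymnasium here:

```python
def compatible_algorithms(action_space) -> list[str]:
    """Suggest SB3 algorithms for a Gymnasium action space.

    Duck-typed: Discrete spaces expose .n while Box (continuous)
    spaces do not. Check .n first, because Discrete spaces also
    define a .shape attribute.
    """
    if hasattr(action_space, "n"):      # Discrete
        return ["PPO", "DQN", "A2C"]
    if hasattr(action_space, "shape"):  # Box / continuous
        return ["SAC", "TD3", "PPO"]
    raise TypeError(f"Unsupported action space: {action_space!r}")
```

In practice you would call this as `compatible_algorithms(gym.make(env_id).action_space)` to pick a sensible default before launching a job.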

Cost Comparison

| Task | Local CPU (time) | AWS p3.2xlarge | Clore.ai RTX 4090 |
| --- | --- | --- | --- |
| CartPole (100K steps) | 5 min | $0.25 | $0.01 |
| LunarLander (500K steps) | 30 min | $1.50 | $0.08 |
| MuJoCo (1M steps) | 2 hours | $6.00 | $0.30 |
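Costs like those in the table follow from two numbers: training throughput and the hourly rental rate. The helper below makes the arithmetic explicit; the throughput value is an assumption you should measure for your own environment and algorithm, since it varies widely:

```python
def estimate_training_cost(total_timesteps: int,
                           steps_per_second: float,
                           usd_per_hour: float) -> float:
    """Estimate GPU rental cost for a training run.

    steps_per_second must be measured for your env/algorithm combo;
    simple environments like CartPole train orders of magnitude
    faster than MuJoCo tasks.
    """
    hours = total_timesteps / steps_per_second / 3600.0
    return hours * usd_per_hour
```

For example, a run that sustains 1,000 steps/s needs exactly one hour per 3.6M steps, so at $0.30/hour it costs $0.30.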

Next Steps
