Auto-Scaling Inference Workers

What We're Building

An auto-scaling system that dynamically provisions and de-provisions Clore.ai GPU workers based on queue depth. Scale your inference capacity up when demand increases and down when it drops, paying only for what you use.

Key Features:

  • Queue-based scaling (Redis/SQS/RabbitMQ)

  • Configurable scaling thresholds

  • Min/max worker limits

  • Cool-down periods to prevent thrashing

  • Cost tracking and budgeting

  • Graceful worker shutdown

  • Health monitoring
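The features above map onto a handful of tunables. The sketch below shows one way to group them; the names and default values are illustrative assumptions, not settings fixed by this guide:

```python
from dataclasses import dataclass

@dataclass
class ScalerConfig:
    """Illustrative knobs for the feature list above (names/values are assumptions)."""
    min_workers: int = 1            # floor: never scale below this
    max_workers: int = 5            # ceiling: hard cap on provisioned GPUs
    scale_up_threshold: int = 10    # queue depth that triggers scale-up
    scale_down_threshold: int = 2   # queue depth that allows scale-down
    cooldown_seconds: int = 300     # wait between scaling actions (prevents thrashing)
    hourly_budget_usd: float = 2.0  # budgeting: refuse to scale up past this spend rate

config = ScalerConfig()
```

Keeping every threshold in one dataclass makes the cool-down and budget limits easy to tune per deployment without touching the scaling loop.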

Prerequisites

  • Clore.ai account with API key

  • Python 3.10+

  • Redis or other message queue

```shell
pip install requests redis
```

Architecture Overview
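At a high level, the pieces fit together like this (a sketch; component names are illustrative):

```
clients ──▶ job queue (Redis/SQS/RabbitMQ) ──▶ GPU workers (Clore.ai)
                     │                                ▲
                     └──▶ autoscaler ── provision / terminate via Clore.ai API
```

The autoscaler polls queue depth and adjusts the worker pool; workers pull jobs from the same queue independently of the autoscaler.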

Full Script: Auto-Scaling GPU Workers

Example Worker Script
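A worker's job is simpler than the autoscaler's: pull jobs, run inference, and shut down gracefully when terminated. The sketch below is illustrative; `pop_job` is injected so it runs without a live queue, whereas a real worker would block on Redis `BLPOP` against the same queue the autoscaler watches:

```python
import json
import signal


class Worker:
    """Illustrative inference worker with graceful SIGTERM shutdown."""

    def __init__(self, pop_job, run_inference):
        self.pop_job = pop_job              # e.g. a redis BLPOP wrapper in production
        self.run_inference = run_inference  # model call; any callable taking a dict
        self.running = True
        # Graceful shutdown: finish the current job, then exit on SIGTERM
        signal.signal(signal.SIGTERM, self._stop)

    def _stop(self, signum, frame):
        self.running = False

    def serve(self):
        results = []
        while self.running:
            raw = self.pop_job()
            if raw is None:                 # queue drained
                break
            results.append(self.run_inference(json.loads(raw)))
        return results
```

Handling SIGTERM this way means the autoscaler can terminate a rental without dropping the job the worker is mid-way through.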

Scaling Behavior

| Queue Depth | Workers (0-5 range) | Action |
| --- | --- | --- |
| 0-2 | min | Scale down to min |
| 3-9 | 1-2 | Maintain |
| 10-19 | 2-3 | Scale up |
| 20-49 | 4 | Scale up |
| 50+ | 5 (max) | At capacity |
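The table above can be encoded directly. Where the table gives a range of workers, the sketch below picks the upper bound; the function name is illustrative:

```python
def desired_workers(queue_depth: int, min_workers: int = 1, max_workers: int = 5) -> int:
    """Map queue depth to a target worker count, per the scaling table."""
    if queue_depth <= 2:
        return min_workers   # scale down to min
    if queue_depth <= 9:
        return 2             # maintain 1-2 workers
    if queue_depth <= 19:
        return 3             # scale up to 2-3
    if queue_depth <= 49:
        return 4             # scale up
    return max_workers       # 50+: at capacity
```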

Cost Example

| Queue Load | Workers | Hourly Cost | Daily Cost |
| --- | --- | --- | --- |
| Low (0-10) | 1 | $0.30 | $7.20 |
| Medium (10-30) | 2-3 | $0.60-0.90 | $14-22 |
| High (30+) | 5 | $1.50 | $36.00 |
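The daily figures are just the hourly rate times 24, assuming the worker count stays constant all day. A quick check using the $0.30/hour rate from the table:

```python
HOURLY_RATE_USD = 0.30  # assumed per-worker rate, taken from the cost table

def daily_cost(workers: int) -> float:
    """Daily spend for a fixed worker count running 24 hours."""
    return workers * HOURLY_RATE_USD * 24

# 1 worker -> about $7.20/day; 5 workers -> about $36.00/day
```

In practice auto-scaling spends less than the constant-load figures, since the pool sits at the minimum whenever the queue is quiet.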

Next Steps
