YOLOv9/v10 Detection

State-of-the-art real-time object detection — train and deploy the latest YOLO models on GPU

YOLO (You Only Look Once) remains the gold standard for real-time object detection. YOLOv9 introduced Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), while YOLOv10 brought NMS-free detection via consistent dual label assignments (one-to-many heads for training, a one-to-one head for inference). Both deliver top-tier accuracy/speed tradeoffs on NVIDIA GPUs.


YOLOv9 vs YOLOv10 vs YOLOv8 — Quick Comparison

| Model | mAP50-95 | Speed (A100) | Parameters | NMS |
|---|---|---|---|---|
| YOLOv8x | 53.9 | 14.2 ms | 68.2M | Required |
| YOLOv9e | 55.6 | 16.8 ms | 57.3M | Required |
| YOLOv10x | 54.4 | 10.7 ms | 29.5M | Free |
| YOLOv10b | 53.0 | 8.8 ms | 19.1M | Free |
| YOLOv10s | 46.8 | 4.2 ms | 7.2M | Free |


Use Cases

  • Security & surveillance — real-time person/vehicle/object detection

  • Autonomous vehicles — pedestrian and obstacle detection

  • Manufacturing QC — defect detection on production lines

  • Retail analytics — customer flow and product detection

  • Medical imaging — anomaly detection in X-rays and scans

  • Sports analytics — player and ball tracking

  • Agriculture — crop disease and pest detection


Prerequisites

  • Clore.ai account with GPU rental

  • Training data (for custom model training) or use COCO pretrained weights

  • Basic Python and command line knowledge


Step 1 — Rent a GPU on Clore.ai

  1. Go to clore.ai → Marketplace

  2. Choose GPU based on your task:

    • Inference only: RTX 3080/3090 or RTX 4080 — excellent price/performance

    • Training small models: RTX 4090 24GB

    • Training large models (YOLOv9e/YOLOv10x): A100 40/80GB


For real-time inference (video streams), RTX 3090 or RTX 4090 delivers 100–500 FPS depending on the model variant. Even the smallest YOLOv10n runs at 1000+ FPS on a 4090 with TensorRT.


Step 2 — Deploy the Ultralytics Container

The official Ultralytics Docker image supports YOLOv8, YOLOv9, and YOLOv10 through a unified API:

Docker Image:

Ports:

Environment Variables:

Disk: 20GB minimum (pretrained weights + your dataset)
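The exact image and port values from the original deployment form were not preserved, but a minimal launch of the official Ultralytics image looks like the following sketch. The host port `8000` and the `datasets` mount path are arbitrary choices for the later API and training steps:

```shell
# Pull the official Ultralytics image (bundles PyTorch + YOLOv8/v9/v10 support)
docker pull ultralytics/ultralytics:latest

# Run with full GPU access; mount a local datasets directory into the container
docker run -it --gpus all \
  -v "$(pwd)/datasets:/workspace/datasets" \
  -p 8000:8000 \
  ultralytics/ultralytics:latest
```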


Step 3 — Connect and Verify
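A quick sanity check after connecting confirms the GPU and the framework are both usable. The SSH address and port placeholders below come from your Clore.ai dashboard:

```shell
# SSH into the rented instance (address/port shown in the Clore.ai dashboard)
ssh -p <PORT> root@<INSTANCE_IP>

# Confirm the GPU is visible to the driver
nvidia-smi

# Confirm Ultralytics is installed and PyTorch sees CUDA
python -c "import torch, ultralytics; print(ultralytics.__version__, torch.cuda.is_available())"
```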


Step 4 — Quick Inference with Pretrained Models

YOLOv10 Inference (NMS-free)
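A minimal sketch of single-image inference with the Ultralytics Python API, assuming a recent Ultralytics release with YOLOv10 weights available (`bus.jpg` stands in for any local image):

```python
from ultralytics import YOLO

# Load pretrained YOLOv10 weights (downloaded automatically on first use)
model = YOLO("yolov10x.pt")

# NMS-free, end-to-end inference on a single image
results = model("bus.jpg", conf=0.25)

# Print class name, confidence, and pixel-space box for each detection
for r in results:
    for box in r.boxes:
        print(model.names[int(box.cls)], float(box.conf), box.xyxy.tolist())
```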

YOLOv9 Inference
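Because Ultralytics exposes all model families through one interface, YOLOv9 inference only differs in the weights file. A sketch using the CLI instead of the Python API:

```shell
# Same unified API — just swap in YOLOv9 weights
yolo predict model=yolov9e.pt source=bus.jpg conf=0.25 device=0
```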

Real-Time Video Stream Inference
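For video, the same model is called frame by frame. This sketch assumes OpenCV is installed and that the RTSP URL (a placeholder here) is reachable; a webcam index like `0` or a video file path also works as the source:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov10s.pt")  # small variant for real-time throughput

# Source may be a webcam index, a video file, or an RTSP URL (placeholder below)
cap = cv2.VideoCapture("rtsp://camera.local/stream")

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)
    annotated = results[0].plot()  # draw boxes and labels onto the frame
    cv2.imshow("YOLOv10", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```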


Step 5 — Train a Custom Model

Prepare Your Dataset

YOLO uses a specific directory structure and label format:
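The expected layout pairs an `images/` tree with a parallel `labels/` tree (the dataset name here is a placeholder):

```
datasets/
└── my_dataset/
    ├── images/
    │   ├── train/   # training images (.jpg/.png)
    │   └── val/     # validation images
    └── labels/
        ├── train/   # one .txt label file per training image
        └── val/
```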

Each label file (same name as image, .txt extension) contains:
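One line per object, in the form `class_id x_center y_center width height`, with all four coordinates normalized to [0, 1] by the image dimensions. A small helper (illustrative, not part of Ultralytics) that converts a pixel-space box to this format:

```python
def to_yolo_label(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space box (x1, y1, x2, y2) to a YOLO label line:
    'class_id x_center y_center width height', normalized to [0, 1]."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A 100x200 box centered at (320, 240) in a 640x480 image, class 0:
print(to_yolo_label(0, 270, 140, 370, 340, 640, 480))
# → 0 0.500000 0.500000 0.156250 0.416667
```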

Create Dataset Config
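The dataset YAML ties the directory layout to class names. The file name and classes below are placeholders for your own:

```yaml
# my_dataset.yaml — paths are relative to the Ultralytics datasets directory
path: my_dataset
train: images/train
val: images/val

names:
  0: person
  1: helmet
```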

Train YOLOv10
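A training run via the Ultralytics CLI, assuming the `my_dataset.yaml` config from the previous step; hyperparameters follow the tips below for a 24GB GPU:

```shell
yolo detect train \
  model=yolov10s.pt \
  data=my_dataset.yaml \
  epochs=100 imgsz=640 batch=16 \
  device=0 amp=True \
  project=runs name=yolov10s_custom
```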

Train YOLOv9
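The CLI is identical for YOLOv9; only the starting weights change. The larger YOLOv9e typically needs a smaller batch on a 24GB card:

```shell
yolo detect train model=yolov9e.pt data=my_dataset.yaml epochs=100 imgsz=640 batch=8 device=0
```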

circle-info

Training Tips:

  • Batch size: Start with batch=16 for RTX 4090, batch=32 for A100 40GB

  • Image size: imgsz=640 is standard; use 1280 for high-resolution tasks

  • Epochs: 100 epochs is typical for fine-tuning, 300+ for training from scratch

  • AMP (Mixed Precision): amp=True is the Ultralytics default — keep it enabled for a 1.5–2x speedup



Step 6 — Export to TensorRT for Maximum Speed

Export to ONNX
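Both exports go through the same `export()` call; only the `format` argument changes. The weights path below is the hypothetical output of the training step above:

```python
from ultralytics import YOLO

model = YOLO("runs/yolov10s_custom/weights/best.pt")  # placeholder path from training

# TensorRT engine with FP16 — usually the fastest option on NVIDIA GPUs
model.export(format="engine", half=True, imgsz=640)

# Portable ONNX graph, usable from ONNX Runtime, TensorRT, or OpenVINO
model.export(format="onnx", imgsz=640, simplify=True)
```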


Step 7 — Serve as a REST API
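One common way to expose the model is a small FastAPI app; this is a sketch under the assumption that FastAPI, Pillow, and python-multipart are installed in the container, with the endpoint name and weights path as placeholders:

```python
import io

from fastapi import FastAPI, File, UploadFile
from PIL import Image
from ultralytics import YOLO

app = FastAPI()
model = YOLO("yolov10s.pt")  # swap in your trained best.pt

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    # Decode the uploaded image and run one inference pass
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    results = model(image, conf=0.25, verbose=False)
    boxes = results[0].boxes
    return {
        "detections": [
            {
                "class": model.names[int(c)],
                "confidence": round(float(conf), 3),
                "box": [round(v, 1) for v in xyxy.tolist()],
            }
            for c, conf, xyxy in zip(boxes.cls, boxes.conf, boxes.xyxy)
        ]
    }
```

Start it with `uvicorn server:app --host 0.0.0.0 --port 8000` and test with `curl -F "file=@test.jpg" http://<INSTANCE_IP>:8000/detect`.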


Step 8 — Validate and Benchmark Your Model
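Validation and benchmarking are both built into the CLI; the weights path is the hypothetical output of the training step:

```shell
# mAP50-95 and per-class metrics on the validation split
yolo val model=runs/yolov10s_custom/weights/best.pt data=my_dataset.yaml

# Speed/accuracy comparison across export formats (PyTorch, ONNX, TensorRT, ...)
yolo benchmark model=runs/yolov10s_custom/weights/best.pt data=my_dataset.yaml imgsz=640
```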


Download Results
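Trained weights and training plots live under the run directory; one way to pull them back to your local machine (SSH port and IP from the Clore.ai dashboard):

```shell
# Copy the whole run directory (best.pt, last.pt, curves, confusion matrix)
scp -P <PORT> -r root@<INSTANCE_IP>:/workspace/runs/yolov10s_custom ./results/
```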


Troubleshooting

CUDA Out of Memory During Training
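The usual fixes are a smaller batch, a smaller image size, or a smaller model variant; a sketch of each, assuming the same placeholder dataset config:

```shell
# Halve the batch size, or let Ultralytics auto-pick one with batch=-1
yolo detect train model=yolov10s.pt data=my_dataset.yaml batch=8 imgsz=640

# If still OOM: reduce resolution and switch to a smaller variant
yolo detect train model=yolov10n.pt data=my_dataset.yaml batch=8 imgsz=512
```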

Slow Training Speed
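Throughput is often limited by the data pipeline rather than the GPU. Caching images in RAM and raising the dataloader worker count usually helps, alongside keeping AMP on:

```shell
yolo detect train model=yolov10s.pt data=my_dataset.yaml \
  batch=16 cache=True workers=8 amp=True
```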

Low mAP / Poor Detection
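Beyond checking label quality and class balance, a common recovery path is training longer, at higher resolution, with a larger variant:

```shell
# Longer schedule, higher resolution, mid-size model
yolo detect train model=yolov10m.pt data=my_dataset.yaml epochs=300 imgsz=1280 batch=8
```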


Performance Reference (Clore.ai GPUs)

| Model | GPU | Batch | FPS (inference) | mAP50-95 |
|---|---|---|---|---|
| YOLOv10n | RTX 3090 | 1 | 1,200 | 38.5 |
| YOLOv10s | RTX 3090 | 1 | 780 | 46.8 |
| YOLOv10m | RTX 4090 | 1 | 950 | 51.3 |
| YOLOv10x | RTX 4090 | 1 | 380 | 54.4 |
| YOLOv9e | A100 40G | 1 | 720 | 55.6 |
| YOLOv10x TRT | RTX 4090 | 1 | 920 | 54.2 |


Additional Resources


YOLOv9 and YOLOv10 on Clore.ai GPU rentals provide an affordable path to training custom object detection models and deploying real-time inference pipelines — without the overhead of AWS SageMaker or Google Vertex AI.


Clore.ai GPU Recommendations

| Use Case | Recommended GPU | Est. Cost on Clore.ai |
|---|---|---|
| Development/Testing | RTX 3090 (24GB) | ~$0.12/gpu/hr |
| Production Inference | RTX 4090 (24GB) | ~$0.70/gpu/hr |
| Large-batch Training | A100 80GB | ~$1.20/gpu/hr |

💡 All examples in this guide can be deployed on Clore.ai GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
