# YOLOv9/v10 Detection

> **State-of-the-art real-time object detection — train and deploy the latest YOLO models on GPU**

YOLO (You Only Look Once) remains the gold standard for real-time object detection. YOLOv9 introduced Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), while YOLOv10 achieves NMS-free detection through consistent dual label assignments during training. Both deliver top-tier accuracy/speed tradeoffs on NVIDIA GPUs.

* **YOLOv9 GitHub:** [WongKinYiu/yolov9](https://github.com/WongKinYiu/yolov9) — 8K+ ⭐
* **YOLOv10 GitHub:** [THU-MIG/yolov10](https://github.com/THU-MIG/yolov10) — 10K+ ⭐
* **Ultralytics (unified):** [ultralytics/ultralytics](https://github.com/ultralytics/ultralytics) — 32K+ ⭐

***

## YOLOv9 vs YOLOv10 vs YOLOv8 — Quick Comparison

| Model    | mAP50-95 | Speed (A100) | Parameters | NMS      |
| -------- | -------- | ------------ | ---------- | -------- |
| YOLOv8x  | 53.9     | 14.2ms       | 68.2M      | Required |
| YOLOv9e  | 55.6     | 16.8ms       | 57.3M      | Required |
| YOLOv10x | 54.4     | 10.7ms       | 29.5M      | **Free** |
| YOLOv10b | 53.0     | 8.8ms        | 19.1M      | **Free** |
| YOLOv10s | 46.8     | 4.2ms        | 7.2M       | **Free** |

{% hint style="success" %}
**YOLOv10 is NMS-free** — no post-processing Non-Maximum Suppression step. This enables end-to-end deployment and is particularly beneficial for edge/embedded scenarios and TensorRT deployment.
{% endhint %}
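
You can see this directly in the per-stage timings Ultralytics attaches to every result. A minimal check (weights auto-download; the comparison pair is just an illustration):

```python
from ultralytics import YOLO

# Compare post-processing overhead: YOLOv8 (needs NMS) vs YOLOv10 (NMS-free)
for weights in ("yolov8s.pt", "yolov10s.pt"):
    model = YOLO(weights)
    result = model("https://ultralytics.com/images/bus.jpg", verbose=False)[0]
    # result.speed holds preprocess / inference / postprocess times in milliseconds
    print(weights, {k: round(v, 2) for k, v in result.speed.items()})
```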

***

## Use Cases

* **Security & surveillance** — real-time person/vehicle/object detection
* **Autonomous vehicles** — pedestrian and obstacle detection
* **Manufacturing QC** — defect detection on production lines
* **Retail analytics** — customer flow and product detection
* **Medical imaging** — anomaly detection in X-rays and scans
* **Sports analytics** — player and ball tracking
* **Agriculture** — crop disease and pest detection

***

## Prerequisites

* Clore.ai account with GPU rental
* Training data (for custom model training) or use COCO pretrained weights
* Basic Python and command line knowledge

***

## Step 1 — Rent a GPU on Clore.ai

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. Choose GPU based on your task:
   * **Inference only:** RTX 3080/3090 or RTX 4080 — excellent price/performance
   * **Training small models:** RTX 4090 24GB
   * **Training large models (YOLOv9e/YOLOv10x):** A100 40/80GB

{% hint style="info" %}
**For real-time inference** (video streams), RTX 3090 or RTX 4090 delivers 100–500 FPS depending on the model variant. Even the smallest YOLOv10n runs at 1000+ FPS on a 4090 with TensorRT.
{% endhint %}
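
To check throughput on the specific card you rented, a minimal timing sketch (model choice, warm-up, and iteration counts are arbitrary):

```python
import time

import numpy as np
from ultralytics import YOLO

model = YOLO("yolov10n.pt")
frame = np.zeros((640, 640, 3), dtype=np.uint8)  # dummy BGR frame

# Warm up so one-time CUDA initialization doesn't skew the measurement
for _ in range(20):
    model(frame, verbose=False)

n = 200
start = time.perf_counter()
for _ in range(n):
    model(frame, verbose=False)
print(f"~{n / (time.perf_counter() - start):.0f} FPS (batch 1, 640x640)")
```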

***

## Step 2 — Deploy the Ultralytics Container

The official Ultralytics Docker image supports YOLOv8, YOLOv9, and YOLOv10 through a unified API:

**Docker Image:**

```
ultralytics/ultralytics:latest
```

**Ports:**

```
22
8000
```

**Environment Variables:**

```
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
```

**Disk:** 20GB minimum (pretrained weights + your dataset)

***

## Step 3 — Connect and Verify

```bash
ssh root@<server-ip> -p <ssh-port>

# Check GPU
nvidia-smi

# Check Ultralytics installation
python3 -c "import ultralytics; ultralytics.checks()"

# Should print Ultralytics, Python, torch, and CUDA/GPU info
```

***

## Step 4 — Quick Inference with Pretrained Models

### YOLOv10 Inference (NMS-free)

```python
from ultralytics import YOLO
import cv2

# Load YOLOv10 model (auto-downloads if not present)
model = YOLO("yolov10x.pt")  # Options: n, s, m, b, l, x

# Run inference on an image
results = model("https://ultralytics.com/images/bus.jpg")

# Display results
for result in results:
    boxes = result.boxes
    print(f"Detected {len(boxes)} objects")
    for box in boxes:
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        xyxy = box.xyxy[0].tolist()
        print(f"  {model.names[cls]}: {conf:.2f} at {[int(x) for x in xyxy]}")

# Save annotated image
results[0].save("output.jpg")
```

### YOLOv9 Inference

```python
from ultralytics import YOLO

# Load YOLOv9 model
model = YOLO("yolov9e.pt")  # Options: t, s, m, c, e

# Batch inference for maximum throughput
results = model(
    source=[
        "image1.jpg",
        "image2.jpg",
        "image3.jpg",
    ],
    batch=8,        # Process 8 images in parallel
    device="cuda",
    conf=0.25,      # Confidence threshold
    iou=0.45,       # NMS IoU threshold (not needed for v10)
    imgsz=640,
    half=True       # FP16 for 2x speedup
)
```

### Real-Time Video Stream Inference

```python
from ultralytics import YOLO
import cv2

model = YOLO("yolov10s.pt")

# For webcam (device=0) or video file
cap = cv2.VideoCapture("input_video.mp4")

# Get video properties
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Output writer
out = cv2.VideoWriter(
    "output_video.mp4",
    cv2.VideoWriter_fourcc(*"mp4v"),
    fps,
    (width, height)
)

frame_count = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    
    results = model(frame, conf=0.25, verbose=False)
    annotated = results[0].plot()
    out.write(annotated)
    frame_count += 1
    
    if frame_count % 100 == 0:
        print(f"Processed {frame_count} frames")

cap.release()
out.release()
print("Done! Output saved to output_video.mp4")
```
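
For live sources (webcam, RTSP camera) it is often cleaner to let Ultralytics drive the capture loop with `stream=True`, which yields results one frame at a time instead of buffering them all. A sketch with a placeholder RTSP URL:

```python
from ultralytics import YOLO

model = YOLO("yolov10s.pt")

# stream=True returns a generator: one Results object per decoded frame
for result in model("rtsp://user:pass@camera-ip/stream", stream=True, conf=0.25, verbose=False):
    people = [b for b in result.boxes if model.names[int(b.cls[0])] == "person"]
    if people:
        print(f"{len(people)} person(s) in frame")
```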

***

## Step 5 — Train a Custom Model

### Prepare Your Dataset

YOLO uses a specific directory structure and label format:

```
dataset/
├── images/
│   ├── train/          # Training images (.jpg/.png)
│   ├── val/            # Validation images
│   └── test/           # Test images (optional)
└── labels/
    ├── train/          # Label files (.txt)
    ├── val/
    └── test/
```

Each label file (same base name as its image, `.txt` extension) contains one line per object:

```
# class_id center_x center_y width height (all normalized 0-1)
0 0.512 0.334 0.256 0.412
1 0.123 0.654 0.089 0.123
```
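
Out-of-range class ids or un-normalized (pixel) coordinates are a common silent cause of bad training, so a quick sanity check over the label files is worth running first. A sketch; the directory and class count below are assumptions you should adjust:

```python
from pathlib import Path

LABEL_DIR = Path("/workspace/dataset/labels/train")  # adjust per split
NUM_CLASSES = 3                                      # must match your dataset YAML

for label_file in sorted(LABEL_DIR.glob("*.txt")):
    for lineno, line in enumerate(label_file.read_text().splitlines(), start=1):
        parts = line.split()
        if len(parts) != 5:
            print(f"{label_file.name}:{lineno} expected 5 fields, got {len(parts)}")
            continue
        cls, *coords = parts
        if not 0 <= int(cls) < NUM_CLASSES:
            print(f"{label_file.name}:{lineno} class id {cls} out of range")
        if any(not 0.0 <= float(c) <= 1.0 for c in coords):
            print(f"{label_file.name}:{lineno} coordinates not normalized: {coords}")
```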

### Create Dataset Config

```bash
cat > /workspace/custom_dataset.yaml << 'EOF'
# Dataset configuration
path: /workspace/dataset
train: images/train
val: images/val
test: images/test

# Number of classes
nc: 3

# Class names
names:
  0: person
  1: car
  2: bicycle
EOF
```
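
If your images and labels still live in one flat folder, an 80/20 split into the structure above can be scripted. A sketch; the source directory is hypothetical:

```python
import random
import shutil
from pathlib import Path

SRC = Path("/workspace/raw_data")   # flat folder of images + matching .txt labels (assumed)
DST = Path("/workspace/dataset")
random.seed(0)

images = sorted(p for p in SRC.iterdir() if p.suffix.lower() in {".jpg", ".png"})
random.shuffle(images)
cut = int(0.8 * len(images))

for subset, items in (("train", images[:cut]), ("val", images[cut:])):
    (DST / "images" / subset).mkdir(parents=True, exist_ok=True)
    (DST / "labels" / subset).mkdir(parents=True, exist_ok=True)
    for img in items:
        shutil.copy(img, DST / "images" / subset / img.name)
        label = img.with_suffix(".txt")
        if label.exists():
            shutil.copy(label, DST / "labels" / subset / label.name)
```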

### Import from Roboflow (Recommended)

```python
# Install the Roboflow SDK first: pip install roboflow
from roboflow import Roboflow
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project")
version = project.version(1)
dataset = version.download("yolov9")

# Dataset is now at ./your-project-1/
```

### Train YOLOv10

```python
from ultralytics import YOLO

# Load pretrained YOLOv10 model (transfer learning)
model = YOLO("yolov10m.pt")  # Medium variant — good balance

results = model.train(
    data="/workspace/custom_dataset.yaml",
    epochs=100,
    imgsz=640,
    batch=16,               # Adjust for your GPU VRAM
    device="cuda",
    workers=8,
    project="/workspace/runs",
    name="yolov10_custom",
    patience=50,            # Early stopping
    save=True,
    save_period=10,         # Save checkpoint every 10 epochs
    plots=True,
    val=True,
    # Augmentation hyperparameters (train-time augmentation is applied by default)
    degrees=10.0,
    flipud=0.0,
    fliplr=0.5,
    mosaic=1.0,
    mixup=0.1,
    copy_paste=0.1,
    lr0=0.01,
    lrf=0.01,
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3.0,
    amp=True                # Automatic Mixed Precision (FP16)
)

print(f"Training complete! Best mAP: {results.results_dict['metrics/mAP50-95(B)']:.3f}")
```
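
Rented instances can get interrupted mid-run, so it helps to know that training resumes cleanly from the last checkpoint written into the run directory configured above:

```python
from ultralytics import YOLO

# Continue an interrupted run from its most recent checkpoint
model = YOLO("/workspace/runs/yolov10_custom/weights/last.pt")
model.train(resume=True)  # reuses the original arguments and epoch counter
```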

### Train YOLOv9

```python
from ultralytics import YOLO

model = YOLO("yolov9e.pt")

results = model.train(
    data="/workspace/custom_dataset.yaml",
    epochs=100,
    imgsz=640,
    batch=8,               # v9e is larger; use a smaller batch
    device="cuda",
    workers=8,
    project="/workspace/runs",
    name="yolov9_custom",
    amp=True,
    optimizer="SGD",
    momentum=0.937,
    weight_decay=0.0005
)
```

{% hint style="info" %}
**Training Tips:**

* **Batch size:** Start with `batch=16` for RTX 4090, `batch=32` for A100 40GB
* **Image size:** `imgsz=640` is standard; use 1280 for high-resolution tasks
* **Epochs:** 100 epochs is typical for fine-tuning, 300+ for training from scratch
* **AMP (Mixed Precision):** Always enable `amp=True` for 1.5–2x speedup
{% endhint %}

***

## Step 6 — Export to TensorRT for Maximum Speed

```python
from ultralytics import YOLO

# Load trained model
model = YOLO("/workspace/runs/yolov10_custom/weights/best.pt")

# Export to TensorRT (FP16 for best speed/accuracy balance)
model.export(
    format="engine",        # TensorRT engine
    device="cuda",
    half=True,              # FP16
    dynamic=False,          # Static shapes for max TRT optimization
    batch=1,                # Optimize for batch size 1 (real-time)
    imgsz=640,
    workspace=4             # TRT workspace in GB
)
# Saved next to the weights as: /workspace/runs/yolov10_custom/weights/best.engine

# Load and run the TRT engine
trt_model = YOLO("/workspace/runs/yolov10_custom/weights/best.engine")
results = trt_model("image.jpg")
```
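
To confirm the TensorRT gain on your own card, a quick side-by-side of the PyTorch weights and the exported engine (paths follow the training run above; warm-up and iteration counts are arbitrary):

```python
import time

import numpy as np
from ultralytics import YOLO

frame = np.zeros((640, 640, 3), dtype=np.uint8)
weights_dir = "/workspace/runs/yolov10_custom/weights"

for name in ("best.pt", "best.engine"):
    model = YOLO(f"{weights_dir}/{name}")
    for _ in range(20):          # warm-up
        model(frame, verbose=False)
    n = 200
    start = time.perf_counter()
    for _ in range(n):
        model(frame, verbose=False)
    print(f"{name}: ~{n / (time.perf_counter() - start):.0f} FPS")
```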

### Export to ONNX

```python
# Export to ONNX for deployment flexibility
model.export(
    format="onnx",
    opset=17,
    half=True,              # FP16 weights
    dynamic=True,           # Dynamic batch size
    simplify=True
)
```
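
To confirm the ONNX file loads and to inspect its input/output signature, onnxruntime works well (installing `onnxruntime-gpu` is assumed; the path matches the export location next to the trained weights):

```python
import onnxruntime as ort

session = ort.InferenceSession(
    "/workspace/runs/yolov10_custom/weights/best.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

for inp in session.get_inputs():
    print("input :", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)
```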

***

## Step 7 — Serve as a REST API

```bash
pip install fastapi uvicorn python-multipart

cat > /workspace/yolo_api.py << 'EOF'
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse, FileResponse
from ultralytics import YOLO
from PIL import Image
import io
import uuid

app = FastAPI(title="YOLOv10 Detection API")
model = YOLO("yolov10x.pt")

@app.get("/health")
async def health():
    return {"status": "ok", "model": "yolov10x", "device": "cuda"}

@app.post("/detect")
async def detect(
    file: UploadFile = File(...),
    conf: float = 0.25,
    iou: float = 0.45,
    return_image: bool = False
):
    # Read uploaded image
    image_data = await file.read()
    img = Image.open(io.BytesIO(image_data)).convert("RGB")
    
    # Run detection
    results = model(img, conf=conf, iou=iou, verbose=False)
    result = results[0]
    
    # Build response
    detections = []
    for box in result.boxes:
        detections.append({
            "class": model.names[int(box.cls[0])],
            "confidence": round(float(box.conf[0]), 4),
            "bbox": [round(x, 2) for x in box.xyxy[0].tolist()],
            "class_id": int(box.cls[0])
        })
    
    response = {
        "count": len(detections),
        "detections": detections,
        "image_size": list(result.orig_shape)
    }
    
    if return_image:
        output_path = f"/tmp/{uuid.uuid4()}.jpg"
        result.save(filename=output_path)
        return FileResponse(output_path, media_type="image/jpeg")
    
    return JSONResponse(response)

@app.post("/detect/batch")
async def detect_batch(files: list[UploadFile] = File(...)):
    results = []
    for file in files:
        data = await file.read()
        img = Image.open(io.BytesIO(data)).convert("RGB")
        res = model(img, verbose=False)[0]
        results.append({
            "filename": file.filename,
            "count": len(res.boxes),
            "detections": [
                {"class": model.names[int(b.cls[0])], "conf": float(b.conf[0])}
                for b in res.boxes
            ]
        })
    return JSONResponse({"results": results})

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF

python3 /workspace/yolo_api.py &

# Test the API
curl -X POST "http://localhost:8000/detect" \
    -F "file=@test_image.jpg" | python3 -m json.tool
```
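
The same endpoint is easy to call from Python when you want to post-process the JSON rather than eyeball curl output (the image name is a placeholder; `requests` must be installed):

```python
import requests

with open("test_image.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/detect",
        files={"file": ("test_image.jpg", f, "image/jpeg")},
        params={"conf": 0.3},
    )
resp.raise_for_status()

data = resp.json()
print(f"{data['count']} objects detected")
for det in data["detections"]:
    print(f"  {det['class']}: {det['confidence']} at {det['bbox']}")
```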

***

## Step 8 — Validate and Benchmark Your Model

```python
from ultralytics import YOLO

model = YOLO("yolov10x.pt")

# Validate on COCO dataset
metrics = model.val(
    data="coco.yaml",
    imgsz=640,
    batch=32,
    device="cuda",
    half=True
)

print(f"mAP50:    {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")
print(f"Precision: {metrics.box.mp:.3f}")
print(f"Recall:    {metrics.box.mr:.3f}")

# Benchmark speed
model.benchmark(
    format="engine",   # Compare multiple export formats
    imgsz=640,
    half=True,
    device="cuda"
)
```

***

## Download Results

```bash
# From your local machine:
scp -P <ssh-port> root@<server-ip>:/workspace/runs/yolov10_custom/weights/best.pt ./
scp -P <ssh-port> root@<server-ip>:/workspace/output_video.mp4 ./

# Download entire training run
rsync -avz -e "ssh -p <ssh-port>" \
    root@<server-ip>:/workspace/runs/ \
    ./yolo_training_runs/
```

***

## Troubleshooting

### CUDA Out of Memory During Training

```python
# Reduce batch size
model.train(data="data.yaml", batch=4, imgsz=640)

# Or lower the input resolution / switch to a smaller model variant
model.train(data="data.yaml", batch=8, imgsz=512)
```
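
If you would rather not tune the batch size by hand, Ultralytics can estimate it: `batch=-1` enables AutoBatch, which probes the GPU and picks a batch that fits comfortably in VRAM (the exact memory fraction it targets may vary by library version):

```python
from ultralytics import YOLO

model = YOLO("yolov10m.pt")
# batch=-1 asks AutoBatch to estimate the largest batch size that fits on the GPU
model.train(data="data.yaml", batch=-1, imgsz=640, epochs=100)
```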

### Slow Training Speed

```python
# Enable caching (loads dataset into RAM/GPU)
model.train(data="data.yaml", cache=True)  # Cache to RAM
model.train(data="data.yaml", cache="disk")  # Cache to disk

# Increase workers (careful: too many can slow down)
model.train(data="data.yaml", workers=8)
```

### Low mAP / Poor Detection

```bash
# Verify labels are correct (normalized, within 0-1)
python3 -c "
from ultralytics.data.utils import check_det_dataset
check_det_dataset('custom_dataset.yaml')
"

# Visualize training samples
python3 -c "
from ultralytics import YOLO
model = YOLO('yolov10m.pt')
model.train(data='data.yaml', epochs=1, batch=4, plots=True)
# Check train_batch*.jpg in the run directory (runs/detect/train/ by default)
"
```

***

## Performance Reference (Clore.ai GPUs)

| Model        | GPU      | Batch | FPS (inference) | mAP50-95 |
| ------------ | -------- | ----- | --------------- | -------- |
| YOLOv10n     | RTX 3090 | 1     | 1,200           | 38.5     |
| YOLOv10s     | RTX 3090 | 1     | 780             | 46.8     |
| YOLOv10m     | RTX 4090 | 1     | 950             | 51.3     |
| YOLOv10x     | RTX 4090 | 1     | 380             | 54.4     |
| YOLOv9e      | A100 40G | 1     | 720             | 55.6     |
| YOLOv10x TRT | RTX 4090 | 1     | 920             | 54.2     |

***

## Additional Resources

* [Ultralytics Documentation](https://docs.ultralytics.com/)
* [YOLOv9 Paper](https://arxiv.org/abs/2402.13616)
* [YOLOv10 Paper](https://arxiv.org/abs/2405.14458)
* [Roboflow Universe](https://universe.roboflow.com/) — 100K+ public datasets
* [Ultralytics HUB](https://hub.ultralytics.com/) — Cloud training platform
* [COCO Dataset](https://cocodataset.org/) — Standard benchmark dataset

***

*YOLOv9 and YOLOv10 on Clore.ai GPU rentals provide an affordable path to training custom object detection models and deploying real-time inference pipelines — without the overhead of AWS SageMaker or Google Vertex AI.*

***

## Clore.ai GPU Recommendations

| Use Case             | Recommended GPU | Est. Cost on Clore.ai |
| -------------------- | --------------- | --------------------- |
| Development/Testing  | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production Inference | RTX 4090 (24GB) | \~$0.70/gpu/hr        |
| Large-batch Training | A100 80GB       | \~$1.20/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.
