# YOLOv9/v10 Detection

> **State-of-the-art real-time object detection — train and deploy the latest YOLO models on GPU**

YOLO (You Only Look Once) remains the gold standard for real-time object detection. YOLOv9 introduced Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), while YOLOv10 achieves NMS-free detection through consistent dual assignments during training. Both deliver top-tier accuracy/speed tradeoffs on NVIDIA GPUs.

* **YOLOv9 GitHub:** [WongKinYiu/yolov9](https://github.com/WongKinYiu/yolov9) — 8K+ ⭐
* **YOLOv10 GitHub:** [THU-MIG/yolov10](https://github.com/THU-MIG/yolov10) — 10K+ ⭐
* **Ultralytics (unified):** [ultralytics/ultralytics](https://github.com/ultralytics/ultralytics) — 32K+ ⭐

***

## YOLOv9 vs YOLOv10 vs YOLOv8 — Quick Comparison

| Model    | mAP50-95 | Speed (A100) | Parameters | NMS      |
| -------- | -------- | ------------ | ---------- | -------- |
| YOLOv8x  | 53.9     | 14.2ms       | 68.2M      | Required |
| YOLOv9e  | 55.6     | 16.8ms       | 57.3M      | Required |
| YOLOv10x | 54.4     | 10.7ms       | 29.5M      | **Free** |
| YOLOv10b | 53.0     | 8.8ms        | 19.1M      | **Free** |
| YOLOv10s | 46.8     | 4.2ms        | 7.2M       | **Free** |

{% hint style="success" %}
**YOLOv10 is NMS-free** — no post-processing Non-Maximum Suppression step. This enables end-to-end deployment and is particularly beneficial for edge/embedded scenarios and TensorRT deployment.
{% endhint %}
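For context on what "NMS-free" removes, here is a minimal sketch of the classic greedy NMS step that YOLOv8/v9 still run as post-processing. This is illustrative pure Python, not the vectorized implementation real pipelines use:

```python
# Greedy NMS sketch: keep the highest-scoring box, suppress overlaps.
# YOLOv10's dual-assignment training makes this step unnecessary.

def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thr=0.45):
    """Return indices of boxes that survive greedy suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep

# Two near-duplicate detections of one object: only the stronger survives
boxes = [[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 300, 300]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Skipping this loop (and its `iou` threshold tuning) at inference time is what makes YOLOv10 easier to compile into a single end-to-end TensorRT engine.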

***

## Use Cases

* **Security & surveillance** — real-time person/vehicle/object detection
* **Autonomous vehicles** — pedestrian and obstacle detection
* **Manufacturing QC** — defect detection on production lines
* **Retail analytics** — customer flow and product detection
* **Medical imaging** — anomaly detection in X-rays and scans
* **Sports analytics** — player and ball tracking
* **Agriculture** — crop disease and pest detection

***

## Prerequisites

* Clore.ai account with GPU rental
* Training data (for custom training), or COCO-pretrained weights for inference-only work
* Basic Python and command line knowledge

***

## Step 1 — Rent a GPU on Clore.ai

1. Go to [clore.ai](https://clore.ai) → **Marketplace**
2. Choose GPU based on your task:
   * **Inference only:** RTX 3080/3090 or RTX 4080 — excellent price/performance
   * **Training small models:** RTX 4090 24GB
   * **Training large models (YOLOv9e/YOLOv10x):** A100 40/80GB

{% hint style="info" %}
**For real-time inference** (video streams), RTX 3090 or RTX 4090 delivers 100–500 FPS depending on the model variant. Even the smallest YOLOv10n runs at 1000+ FPS on a 4090 with TensorRT.
{% endhint %}

***

## Step 2 — Deploy the Ultralytics Container

The official Ultralytics Docker image supports YOLOv8, YOLOv9, and YOLOv10 through a unified API:

**Docker Image:**

```
ultralytics/ultralytics:latest
```

**Ports:**

```
22
8000
```

**Environment Variables:**

```
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
```

**Disk:** 20GB minimum (pretrained weights + your dataset)

***

## Step 3 — Connect and Verify

```bash
ssh root@<server-ip> -p <ssh-port>

# Check GPU
nvidia-smi

# Check Ultralytics installation
python3 -c "import ultralytics; ultralytics.checks()"

# Should show GPU info, CUDA version, and model availability
```

***

## Step 4 — Quick Inference with Pretrained Models

### YOLOv10 Inference (NMS-free)

```python
from ultralytics import YOLO
import cv2

# Load YOLOv10 model (auto-downloads if not present)
model = YOLO("yolov10x.pt")  # Options: n, s, m, b, l, x

# Run inference on an image
results = model("https://ultralytics.com/images/bus.jpg")

# Display results
for result in results:
    boxes = result.boxes
    print(f"Detected {len(boxes)} objects")
    for box in boxes:
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        xyxy = box.xyxy[0].tolist()
        print(f"  {model.names[cls]}: {conf:.2f} at {[int(x) for x in xyxy]}")

# Save annotated image
results[0].save("output.jpg")
```

### YOLOv9 Inference

```python
from ultralytics import YOLO

# Load YOLOv9 model
model = YOLO("yolov9e.pt")  # Options: t, s, m, c, e

# Batch inference for maximum throughput
results = model(
    source=[
        "image1.jpg",
        "image2.jpg",
        "image3.jpg",
    ],
    batch=8,        # Process 8 images in parallel
    device="cuda",
    conf=0.25,      # Confidence threshold
    iou=0.45,       # NMS IoU threshold (not needed for v10)
    imgsz=640,
    half=True       # FP16 for 2x speedup
)
```

### Real-Time Video Stream Inference

```python
from ultralytics import YOLO
import cv2

model = YOLO("yolov10s.pt")

# For webcam (device=0) or video file
cap = cv2.VideoCapture("input_video.mp4")

# Get video properties
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Output writer
out = cv2.VideoWriter(
    "output_video.mp4",
    cv2.VideoWriter_fourcc(*"mp4v"),
    fps,
    (width, height)
)

frame_count = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    
    results = model(frame, conf=0.25, verbose=False)
    annotated = results[0].plot()
    out.write(annotated)
    frame_count += 1
    
    if frame_count % 100 == 0:
        print(f"Processed {frame_count} frames")

cap.release()
out.release()
print("Done! Output saved to output_video.mp4")
```

***

## Step 5 — Train a Custom Model

### Prepare Your Dataset

YOLO uses a specific directory structure and label format:

```
dataset/
├── images/
│   ├── train/          # Training images (.jpg/.png)
│   ├── val/            # Validation images
│   └── test/           # Test images (optional)
└── labels/
    ├── train/          # Label files (.txt)
    ├── val/
    └── test/
```

Each label file (same name as image, `.txt` extension) contains:

```
# class_id center_x center_y width height (all normalized 0-1)
0 0.512 0.334 0.256 0.412
1 0.123 0.654 0.089 0.123
```
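If your annotations are in pixel coordinates (as most labeling tools export them), converting to this format is a one-liner per box. A minimal helper, assuming `[x1, y1, x2, y2]` pixel boxes:

```python
# Convert a pixel-space box [x1, y1, x2, y2] to a YOLO label line:
# class_id center_x center_y width height, all normalized to 0-1.

def to_yolo_label(class_id, x1, y1, x2, y2, img_w, img_h):
    cx = (x1 + x2) / 2 / img_w   # box center, normalized by image width
    cy = (y1 + y2) / 2 / img_h   # box center, normalized by image height
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# A 100x100 box at (50, 50) in a 640x480 image:
print(to_yolo_label(0, 50, 50, 150, 150, 640, 480))
# 0 0.156250 0.208333 0.156250 0.208333
```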

### Create Dataset Config

```bash
cat > /workspace/custom_dataset.yaml << 'EOF'
# Dataset configuration
path: /workspace/dataset
train: images/train
val: images/val
test: images/test

# Number of classes
nc: 3

# Class names
names:
  0: person
  1: car
  2: bicycle
EOF
```
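A quick sanity check on the config before launching a long training run can save an hour of wasted GPU rental. This sketch assumes PyYAML is available (it ships as an Ultralytics dependency) and uses the paths from `custom_dataset.yaml` above:

```python
# Sanity-check a dataset config: nc must match the names list, and the
# split directories referenced by the config should actually exist.
import os
import yaml

def check_config(path):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    assert cfg["nc"] == len(cfg["names"]), (
        f"nc={cfg['nc']} but {len(cfg['names'])} class names defined"
    )
    for split in ("train", "val"):
        split_dir = os.path.join(cfg["path"], cfg[split])
        if not os.path.isdir(split_dir):
            print(f"WARNING: missing {split_dir}")
    return cfg

# cfg = check_config("/workspace/custom_dataset.yaml")
```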

### Import from Roboflow (Recommended)

```python
# Install the SDK first: pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project")
version = project.version(1)
dataset = version.download("yolov9")

# Dataset is now at ./your-project-1/
```

### Train YOLOv10

```python
from ultralytics import YOLO

# Load pretrained YOLOv10 model (transfer learning)
model = YOLO("yolov10m.pt")  # Medium variant — good balance

results = model.train(
    data="/workspace/custom_dataset.yaml",
    epochs=100,
    imgsz=640,
    batch=16,               # Adjust for your GPU VRAM
    device="cuda",
    workers=8,
    project="/workspace/runs",
    name="yolov10_custom",
    patience=50,            # Early stopping
    save=True,
    save_period=10,         # Save checkpoint every 10 epochs
    plots=True,
    val=True,
    augment=True,           # Data augmentation
    degrees=10.0,
    flipud=0.0,
    fliplr=0.5,
    mosaic=1.0,
    mixup=0.1,
    copy_paste=0.1,
    lr0=0.01,
    lrf=0.01,
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3.0,
    amp=True                # Automatic Mixed Precision (FP16)
)

print(f"Training complete! Best mAP: {results.results_dict['metrics/mAP50-95(B)']:.3f}")
```

### Train YOLOv9

```python
from ultralytics import YOLO

model = YOLO("yolov9e.pt")

results = model.train(
    data="/workspace/custom_dataset.yaml",
    epochs=100,
    imgsz=640,
    batch=8,               # v9e is larger, so use a smaller batch size
    device="cuda",
    workers=8,
    project="/workspace/runs",
    name="yolov9_custom",
    amp=True,
    optimizer="SGD",
    momentum=0.937,
    weight_decay=0.0005
)
```

{% hint style="info" %}
**Training Tips:**

* **Batch size:** Start with `batch=16` for RTX 4090, `batch=32` for A100 40GB
* **Image size:** `imgsz=640` is standard; use 1280 for high-resolution tasks
* **Epochs:** 100 epochs is typical for fine-tuning, 300+ for training from scratch
* **AMP (Mixed Precision):** Always enable `amp=True` for 1.5–2x speedup
{% endhint %}

***

## Step 6 — Export to TensorRT for Maximum Speed

```python
from ultralytics import YOLO

# Load trained model
model = YOLO("/workspace/runs/yolov10_custom/weights/best.pt")

# Export to TensorRT (FP16 for best speed/accuracy balance)
model.export(
    format="engine",        # TensorRT engine
    device="cuda",
    half=True,              # FP16
    dynamic=False,          # Static shapes for max TRT optimization
    batch=1,                # Optimize for batch size 1 (real-time)
    imgsz=640,
    workspace=4             # TRT workspace in GB
)
# Saved next to the weights as best.engine

# Load and run the TRT engine
trt_model = YOLO("/workspace/runs/yolov10_custom/weights/best.engine")
results = trt_model("image.jpg")
```

### Export to ONNX

```python
# Export to ONNX for deployment flexibility
model.export(
    format="onnx",
    opset=17,
    half=False,             # half=True conflicts with dynamic=True in ONNX export
    dynamic=True,           # Dynamic batch size
    simplify=True
)
```

***

## Step 7 — Serve as a REST API

```bash
pip install fastapi uvicorn python-multipart

cat > /workspace/yolo_api.py << 'EOF'
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse, FileResponse
from ultralytics import YOLO
from PIL import Image
import io
import uuid
import os

app = FastAPI(title="YOLOv10 Detection API")
model = YOLO("yolov10x.pt")

@app.get("/health")
async def health():
    return {"status": "ok", "model": "yolov10x", "device": "cuda"}

@app.post("/detect")
async def detect(
    file: UploadFile = File(...),
    conf: float = 0.25,
    iou: float = 0.45,
    return_image: bool = False
):
    # Read uploaded image
    image_data = await file.read()
    img = Image.open(io.BytesIO(image_data)).convert("RGB")
    
    # Run detection
    results = model(img, conf=conf, iou=iou, verbose=False)
    result = results[0]
    
    # Build response
    detections = []
    for box in result.boxes:
        detections.append({
            "class": model.names[int(box.cls[0])],
            "confidence": round(float(box.conf[0]), 4),
            "bbox": [round(x, 2) for x in box.xyxy[0].tolist()],
            "class_id": int(box.cls[0])
        })
    
    response = {
        "count": len(detections),
        "detections": detections,
        "image_size": list(result.orig_shape)
    }
    
    if return_image:
        output_path = f"/tmp/{uuid.uuid4()}.jpg"
        result.save(filename=output_path)
        return FileResponse(output_path, media_type="image/jpeg")
    
    return JSONResponse(response)

@app.post("/detect/batch")
async def detect_batch(files: list[UploadFile] = File(...)):
    results = []
    for file in files:
        data = await file.read()
        img = Image.open(io.BytesIO(data)).convert("RGB")
        res = model(img, verbose=False)[0]
        results.append({
            "filename": file.filename,
            "count": len(res.boxes),
            "detections": [
                {"class": model.names[int(b.cls[0])], "conf": float(b.conf[0])}
                for b in res.boxes
            ]
        })
    return JSONResponse({"results": results})

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF

python3 /workspace/yolo_api.py &

# Test the API
curl -X POST "http://localhost:8000/detect" \
    -F "file=@test_image.jpg" | python3 -m json.tool
```

***

## Step 8 — Validate and Benchmark Your Model

```python
from ultralytics import YOLO

model = YOLO("yolov10x.pt")

# Validate on COCO dataset
metrics = model.val(
    data="coco.yaml",
    imgsz=640,
    batch=32,
    device="cuda",
    half=True
)

print(f"mAP50:    {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")
print(f"Precision: {metrics.box.mp:.3f}")
print(f"Recall:    {metrics.box.mr:.3f}")

# Benchmark speed
model.benchmark(
    format="engine",   # Compare multiple export formats
    imgsz=640,
    half=True,
    device="cuda"
)
```

***

## Download Results

```bash
# From your local machine:
scp -P <ssh-port> root@<server-ip>:/workspace/runs/yolov10_custom/weights/best.pt ./
scp -P <ssh-port> root@<server-ip>:/workspace/output_video.mp4 ./

# Download entire training run
rsync -avz -e "ssh -p <ssh-port>" \
    root@<server-ip>:/workspace/runs/ \
    ./yolo_training_runs/
```

***

## Troubleshooting

### CUDA Out of Memory During Training

```python
# Reduce batch size
model.train(data="data.yaml", batch=4, imgsz=640)

# Or disable dataset caching to free RAM
model.train(data="data.yaml", batch=8, imgsz=640, cache=False)
```

### Slow Training Speed

```python
# Enable caching (loads dataset into RAM/GPU)
model.train(data="data.yaml", cache=True)  # Cache to RAM
model.train(data="data.yaml", cache="disk")  # Cache to disk

# Increase workers (careful: too many can slow down)
model.train(data="data.yaml", workers=8)
```

### Low mAP / Poor Detection

```bash
# Verify labels are correct (normalized, within 0-1)
python3 -c "
from ultralytics.data.utils import check_det_dataset
check_det_dataset('custom_dataset.yaml')
"

# Visualize training samples
python3 -c "
from ultralytics import YOLO
model = YOLO('yolov10m.pt')
model.train(data='data.yaml', epochs=1, batch=4, plots=True)
# Check runs/detect/train/train_batch*.jpg for augmented samples
"
```

***

## Performance Reference (Clore.ai GPUs)

| Model        | GPU      | Batch | FPS (inference) | mAP50-95 |
| ------------ | -------- | ----- | --------------- | -------- |
| YOLOv10n     | RTX 3090 | 1     | 1,200           | 38.5     |
| YOLOv10s     | RTX 3090 | 1     | 780             | 46.8     |
| YOLOv10m     | RTX 4090 | 1     | 950             | 51.3     |
| YOLOv10x     | RTX 4090 | 1     | 380             | 54.4     |
| YOLOv9e      | A100 40G | 1     | 720             | 55.6     |
| YOLOv10x TRT | RTX 4090 | 1     | 920             | 54.2     |
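A quick back-of-envelope calculation turns these inference FPS figures into wall-clock processing time for recorded footage. The numbers above are indicative; measure on your own rental before committing to a batch job:

```python
# How long does it take to run detection over every frame of a video,
# given the video's native FPS and the model's inference FPS?

def processing_time_s(video_seconds, video_fps, inference_fps):
    """Wall-clock seconds to process all frames sequentially."""
    total_frames = video_seconds * video_fps
    return total_frames / inference_fps

# 1 hour of 30 FPS footage through YOLOv10x TRT on a 4090 (~920 FPS):
seconds = processing_time_s(3600, 30, 920)
print(f"{seconds / 60:.1f} minutes")  # roughly 2 minutes
```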

***

## Additional Resources

* [Ultralytics Documentation](https://docs.ultralytics.com/)
* [YOLOv9 Paper](https://arxiv.org/abs/2402.13616)
* [YOLOv10 Paper](https://arxiv.org/abs/2405.14458)
* [Roboflow Universe](https://universe.roboflow.com/) — 100K+ public datasets
* [Ultralytics HUB](https://hub.ultralytics.com/) — Cloud training platform
* [COCO Dataset](https://cocodataset.org/) — Standard benchmark dataset

***

*YOLOv9 and YOLOv10 on Clore.ai GPU rentals provide an affordable path to training custom object detection models and deploying real-time inference pipelines — without the overhead of AWS SageMaker or Google Vertex AI.*

***

## Clore.ai GPU Recommendations

| Use Case             | Recommended GPU | Est. Cost on Clore.ai |
| -------------------- | --------------- | --------------------- |
| Development/Testing  | RTX 3090 (24GB) | \~$0.12/gpu/hr        |
| Production Inference | RTX 4090 (24GB) | \~$0.70/gpu/hr        |
| Large-batch Training | A100 80GB       | \~$1.20/gpu/hr        |

> 💡 All examples in this guide can be deployed on [Clore.ai](https://clore.ai/marketplace) GPU servers. Browse available GPUs and rent by the hour — no commitments, full root access.

