ONNX Runtime GPU
Why ONNX Runtime?
| Feature | ONNX Runtime | TorchScript | TensorFlow Serving |
| --- | --- | --- | --- |
Supported Execution Providers
| Provider | Hardware | Use Case |
| --- | --- | --- |
Prerequisites
Step 1 — Rent a GPU on Clore.ai
Step 2 — Deploy Your Container
Step 3 — Install ONNX Runtime with GPU Support
Step 4 — Export Your Model to ONNX
PyTorch Model Export
HuggingFace Transformers Export
Export with ORT Optimization
Step 5 — Run Inference with ONNX Runtime
Basic GPU Inference
Batch Inference for Throughput
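Per-call overhead (Python dispatch, host-to-device transfer, kernel launch) is amortized by grouping requests into batches. The helper below is a hypothetical sketch of that pattern; `run_fn` stands in for a thin wrapper around `session.run`:

```python
import numpy as np

def run_batched(run_fn, samples, batch_size=32):
    """Group individual samples into batches, run each batch once,
    and return per-sample outputs in the original order.

    run_fn: callable taking a (batch, ...) array and returning a
            (batch, ...) array, e.g. a wrapper around session.run.
    """
    outputs = []
    for start in range(0, len(samples), batch_size):
        batch = np.stack(samples[start:start + batch_size])
        outputs.extend(run_fn(batch))  # one inference call per batch
    return outputs

# Example with a stand-in for the real inference call:
samples = [np.full((4,), float(i), dtype=np.float32) for i in range(10)]
results = run_batched(lambda b: b * 2.0, samples, batch_size=4)
```

Larger batches raise throughput until GPU memory or latency targets become the constraint; the sweet spot depends on the model and card, so benchmark a few sizes.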
Step 6 — TensorRT Execution Provider (Maximum Performance)
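The TensorRT execution provider is enabled by putting it first in the provider list, optionally with per-provider options. Only the configuration is shown below, since actually creating the session requires an `onnxruntime-gpu` build with the TensorRT libraries installed; the option names (`trt_fp16_enable`, `trt_engine_cache_enable`, `trt_engine_cache_path`) are real TensorRT EP settings, while the cache path is a placeholder:

```python
# Provider priority list: TensorRT compiles the subgraphs it supports,
# anything it cannot handle falls back to CUDA, then CPU.
providers = [
    (
        "TensorrtExecutionProvider",
        {
            "trt_fp16_enable": True,          # allow FP16 kernels
            "trt_engine_cache_enable": True,  # cache compiled engines on disk
            "trt_engine_cache_path": "./trt_cache",
        },
    ),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

# On a TensorRT-capable host:
# session = ort.InferenceSession("model.onnx", providers=providers)
```

Engine caching matters in practice: the first session creation can take minutes while TensorRT compiles engines, but subsequent starts reuse the cache.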
Step 7 — INT8 Quantization for Maximum Speed
Step 8 — Build an Inference API
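In production you would typically wrap the session in a framework like FastAPI; to keep this sketch dependency-free it uses only the standard library, and a stub `predict` function stands in for the `session.run` call:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(values):
    # Stub for the model: replace with a session.run call on a GPU host.
    return [v * 2.0 for v in values]

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = predict(payload["inputs"])
        body = json.dumps({"outputs": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass

server = HTTPServer(("127.0.0.1", 0), InferenceHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
print("listening on port", server.server_address[1])
```

One design note: `InferenceSession` is thread-safe for concurrent `run` calls, so a single session can serve many requests; avoid creating a new session per request.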
Step 9 — Monitor GPU Usage
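`nvidia-smi` can emit machine-readable stats with `--query-gpu` and `--format=csv,noheader,nounits`. The parser below is a small sketch; the sample string lets it run anywhere, while on the GPU instance you would feed it live `subprocess` output:

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_gpu_stats(csv_text):
    """Parse one 'util, mem_used, mem_total' line per GPU (units stripped)."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(f) for f in line.split(","))
        stats.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats

# On the GPU host:
# stats = parse_gpu_stats(subprocess.check_output(QUERY, text=True))
# Sample output used here so the sketch runs anywhere:
sample = "87, 10240, 24576\n"
stats = parse_gpu_stats(sample)
```

Polling this in a loop (or via `nvidia-smi -l 1`) quickly shows whether inference is GPU-bound or starved by preprocessing on the CPU.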
Performance Benchmarks
| Model | GPU | Provider | Throughput (inf/sec) |
| --- | --- | --- | --- |
Troubleshooting
CUDA Provider Not Available
TensorRT Compilation Errors
Shape Mismatch Errors
Advanced: Multi-Model Pipeline
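A multi-model pipeline chains sessions so each stage's output feeds the next, keeping intermediate tensors in one process instead of shuttling them between services. The class below is a hypothetical sketch with stub stages; in practice each stage would wrap an `ort.InferenceSession` (for example detector, then cropper, then classifier):

```python
import numpy as np

class Pipeline:
    """Chain model stages so each stage's output feeds the next.

    Stages are callables; in practice each would wrap an
    ort.InferenceSession created with the same providers list.
    """

    def __init__(self, *stages):
        self.stages = stages

    def __call__(self, x):
        for stage in self.stages:
            x = stage(x)
        return x

# Stub stages stand in for real sessions:
preprocess = lambda x: x.astype(np.float32) / 255.0   # uint8 image -> [0, 1]
embed = lambda x: x.mean(axis=-1, keepdims=True)      # toy feature reducer
pipeline = Pipeline(preprocess, embed)

out = pipeline(np.full((2, 4), 255, dtype=np.uint8))
```

For heavier pipelines, ONNX Runtime's `IOBinding` API can keep intermediate tensors on the GPU between stages, avoiding repeated host/device copies.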
Additional Resources
Clore.ai GPU Recommendations
| Use Case | Recommended GPU | Est. Cost on Clore.ai |
| --- | --- | --- |