# Overview

GPU-accelerated DevOps tools and inference engines for production machine learning workloads.

Modern DevOps increasingly relies on GPU acceleration for ML model serving, real-time inference, and high-performance computing tasks. This category covers production-ready tools that leverage GPU compute for faster model inference and optimized deployment pipelines.

Deploy inference engines and runtime environments on Clore.ai marketplace GPUs to serve ML models at scale with low latency and high throughput.

## Available Guides

| Guide                                                                    | Use Case                       | Difficulty |
| ------------------------------------------------------------------------ | ------------------------------ | ---------- |
| [ONNX Runtime GPU](https://docs.clore.ai/guides/gpu-devops/onnx-runtime) | Cross-platform model inference | Medium     |
| [TensorRT-LLM](https://docs.clore.ai/guides/gpu-devops/tensorrt-llm)     | Optimized LLM serving          | Advanced   |

## GPU Recommendations

| Workload           | Minimum GPU | Recommended |
| ------------------ | ----------- | ----------- |
| ONNX Inference     | GTX 1660    | RTX 3070+   |
| TensorRT-LLM       | RTX 3090    | A100 40GB   |
| Production Serving | RTX 4090    | H100        |

## Performance Tips

* Use TensorRT to optimize models for NVIDIA GPUs
* Enable mixed precision (FP16) for faster inference where the accuracy trade-off is acceptable
* Batch requests for higher throughput
* Monitor GPU utilization and memory usage
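
The batching and monitoring tips can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the helper names `batched`, `parse_gpu_stats`, and `query_gpu_stats` are hypothetical, the dictionary keys are made up for this example, and the parser assumes every field `nvidia-smi` returns is a plain integer (some GPUs report `[N/A]` for certain fields, which this sketch does not handle).

```python
import subprocess
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Real nvidia-smi query flags; the choice of fields is an assumption for this sketch.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def batched(requests: Iterable[T], max_batch_size: int) -> Iterator[List[T]]:
    """Group incoming requests into lists of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[T] = []
    for request in requests:
        batch.append(request)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def parse_gpu_stats(csv_text: str) -> List[Dict[str, int]]:
    """Parse the CSV text produced by NVIDIA_SMI_CMD, one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        # Assumes three integer fields per line: util %, used MiB, total MiB.
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append(
            {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return stats


def query_gpu_stats() -> List[Dict[str, int]]:
    """Run nvidia-smi on the host and return parsed stats (requires an NVIDIA driver)."""
    result = subprocess.run(
        NVIDIA_SMI_CMD, capture_output=True, text=True, check=True
    )
    return parse_gpu_stats(result.stdout)
```

In a serving loop, each list yielded by `batched(incoming, max_batch_size=8)` would be passed to the model as a single inference call, and `query_gpu_stats()` could be polled periodically to check whether the chosen batch size leaves memory headroom. The batch size of 8 is only a placeholder; the right value depends on the model and the GPU.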

## Related Guides

* [Language Models](https://docs.clore.ai/guides/language-models/language-models)
* [MLOps](https://docs.clore.ai/guides/mlops-and-deployment/mlops)
