A Deployment Scheme of YOLOv5 with Inference Optimizations Based on the Triton Inference Server

Jiacong Fang, Qiong Liu, Jingzheng Li
DOI: https://doi.org/10.1109/icccbda51879.2021.9442557
2021-01-01
Abstract: Object detection constitutes a large part of computer vision applications. You Only Look Once (YOLO) v5 is a salient object detection algorithm that provides high accuracy and real-time performance. This paper illustrates a deployment scheme of YOLOv5 with inference optimizations on Nvidia graphics cards using an open-source deep-learning deployment framework named Triton Inference Server. Moreover, we developed a non-maximum suppression (NMS) operator with dynamic-batch-size support in TensorRT to accelerate inference. The experimental results show that both throughput and latency are improved significantly through our deployment scheme.
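For readers unfamiliar with the NMS step the abstract mentions, the following is a minimal CPU reference sketch in plain Python. It is not the authors' TensorRT operator: it only illustrates what greedy NMS computes and how the same routine can be applied to a batch whose size is not fixed in advance. The function names (`iou`, `nms`, `batched_nms`), the `(x1, y1, x2, y2)` box layout, and the IoU threshold of 0.45 are assumptions made for this illustration and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 >= x1, y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS for one image; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def batched_nms(batch_boxes: Sequence[Sequence[Box]],
                batch_scores: Sequence[Sequence[float]],
                iou_threshold: float = 0.45) -> List[List[int]]:
    """Apply NMS independently to each image; the batch size is arbitrary."""
    return [nms(b, s, iou_threshold)
            for b, s in zip(batch_boxes, batch_scores)]


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    # A batch of two images; the second has no detections.
    print(batched_nms([boxes, []], [scores, []]))
```

A GPU operator would run the per-image work in parallel rather than in a Python loop, but the per-image result is defined the same way: each image in the batch is suppressed independently of the others.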