Guided Design for Efficient On-device Object Detection Model

Tao Sheng, Yang Liu
DOI: https://doi.org/10.1201/9781003162810-10
2022-01-12
Abstract: The low-power computer vision (LPCV) challenge is an annual competition for the best technologies in image classification and object detection, measured by both efficiency (execution time and energy consumption) and accuracy (precision/recall). Our Amazon team has won three awards in LPCV challenges: 1st prize in the interactive object detection challenge in 2018 and 2019, and 2nd prize in the interactive image classification challenge in 2018. This paper shares our award-winning methods, which can be summarized in four major steps. First, an 8-bit quantization-friendly model is one of the keys to achieving short execution time while maintaining high accuracy on edge devices. Second, network architecture optimization: we optimized the network architecture to meet the 100 ms latency requirement on a Pixel 2 phone. Third, dataset filtering: after closely analyzing the training curves, we removed images with small objects from the training dataset, which significantly improved overall accuracy. Fourth, non-maximum suppression optimization. By combining these steps with other training techniques, for example a cosine learning function and transfer learning, our final solutions won top prizes among a large number of solutions submitted worldwide.
Take-aways: Discusses the methods involved in the winning solutions over the years. Explains the impact of each method (quantization, architecture search, hyperparameter tuning). Reduces the resolution to improve performance.
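The dataset-filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how "small" was defined or how the filter was applied, so everything here is an assumption for illustration: the sample layout (a dict with `id`, `width`, `height`, `boxes`), the function names, the rule that an image is dropped if any of its boxes is small, and the 1% relative-area threshold.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def relative_area(box: Box, image_width: int, image_height: int) -> float:
    """Fraction of the image area covered by a bounding box."""
    x_min, y_min, x_max, y_max = box
    box_area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    return box_area / float(image_width * image_height)


def filter_small_object_images(samples: List[Dict],
                               min_rel_area: float = 0.01) -> List[Dict]:
    """Keep only images whose boxes all cover at least min_rel_area of the image.

    The 0.01 default is an illustrative value, not one taken from the paper.
    Images without any boxes are kept unchanged.
    """
    kept = []
    for sample in samples:
        areas = [relative_area(b, sample["width"], sample["height"])
                 for b in sample["boxes"]]
        if all(a >= min_rel_area for a in areas):
            kept.append(sample)
    return kept
```

In practice the threshold would presumably be chosen by inspecting validation accuracy and the training curves, as the abstract suggests the authors did.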
Competitions encourage diligent development of technologies. The goal of the Low-Power Image Recognition Challenge (LPIRC) is to achieve the best accuracy within the wall-time constraint, with accuracy and execution time evaluated using TensorFlow models. Although no power or energy constraint is explicitly measured for this track, latency correlates reasonably well with energy consumption. Through sustained effort, our Amazon team won two awards among all submitted solutions in 2018: first place in the interactive object detection challenge and second place in the interactive image classification challenge. Quantization is crucial for deep learning inference on edge devices, which have very limited budgets for power and memory. The fastest way to improve the latency of a deep learning network is to reduce the input image resolution.
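To make the quantization point concrete, the following is a minimal sketch of asymmetric 8-bit (uint8) quantization of a list of floating-point values. It is a generic affine scheme written for illustration, not the authors' training or deployment pipeline; the function names are hypothetical and the per-tensor min/max calibration is an assumption.

```python
from typing import List, Tuple


def quant_params(x_min: float, x_max: float,
                 num_bits: int = 8) -> Tuple[float, int]:
    """Return (scale, zero_point) mapping [x_min, x_max] onto unsigned integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is represented exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:
        return 1.0, 0
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int,
             num_bits: int = 8) -> List[int]:
    """Map floats to integers in [0, 2**num_bits - 1], clamping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q_values: List[int], scale: float,
               zero_point: int) -> List[float]:
    """Recover approximate float values from their integer codes."""
    return [scale * (q - zero_point) for q in q_values]
```

A round trip through `quantize` and `dequantize` shows the precision cost: each value is recovered to within about one quantization step (`scale`), which is why a model whose weight and activation ranges are narrow and well behaved tends to lose less accuracy at 8 bits.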