FPGA-Based Hardware Accelerator Design and Implementation of Oil Palm Detection

袁鸣,甘霖,柴志雷
DOI: https://doi.org/10.3778/j.issn.1673-9418.1912029
2021-01-01
Abstract: To address the low accuracy and low detection efficiency of deep-learning-based oil palm detection in high-resolution imagery, this paper proposes an effective and reliable solution that combines algorithm optimization with acceleration on a heterogeneous hardware platform. Taking the YOLOv3 object detection algorithm as an example, it expands feature selection and adds multi-scale feature fusion to improve detection accuracy for high-resolution oil palm images. In addition, many inference applications require high-performance models under strict power consumption limits. To meet this requirement, the paper designs a high-efficiency SIMD-based convolution engine that uses 8-bit integer quantized weights and reuses computational units. The input image is also reshaped, vectorized, and transmitted to the input module in the form of a write queue, which greatly improves bus bandwidth utilization and accelerates the input module. Experimental results show that the improved model reaches an accuracy of 97.84% and achieves a performance of 1.4 TOPS on an Intel Arria 10 FPGA platform. Compared with the i9-9980XE CPU, this is 7.51 times the performance and 33.02 times the energy efficiency. It is also 1.2 times more efficient than Nvidia's dedicated P40 accelerator.
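The 8-bit integer weight quantization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the quantization scheme, so the symmetric per-tensor scaling used here is an assumption, and the function names `quantize_int8` and `int8_dot` are hypothetical. The sketch only shows why integer multiply-accumulate units can stand in for floating-point ones; it is not the paper's convolution engine.

```python
# Illustrative sketch only: symmetric per-tensor int8 quantization (an assumed
# scheme; the abstract does not say which one the paper uses).

def quantize_int8(values):
    """Map floats to integers in [-127, 127] with one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(q_a, scale_a, q_b, scale_b):
    """Dot product with integer multiply-accumulate, rescaled to float once."""
    acc = sum(x * y for x, y in zip(q_a, q_b))  # integer accumulator
    return acc * scale_a * scale_b


# Hypothetical example data: one row of weights and one input patch.
weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 2.0, -0.5, 0.0]

q_w, s_w = quantize_int8(weights)
q_x, s_x = quantize_int8(activations)

approx = int8_dot(q_w, s_w, q_x, s_x)
exact = sum(w * x for w, x in zip(weights, activations))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Because every multiplication inside `int8_dot` operates on small integers, a hardware engine can pack several such operations into one wide arithmetic unit, which is the general motivation for pairing 8-bit quantization with SIMD-style computation.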