Algorithm/Hardware Co-Design for Real-Time On-Satellite CNN based Ship Detection in SAR Imagery
Geng Yang, Jie Lei, Weiying Xie, Zhenman Fang, Yunsong Li, Jiaxuan Wang, Xin Zhang
DOI: https://doi.org/10.1109/TGRS.2022.3161499
IF: 8.2
2022-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Recently, the convolutional neural network (CNN) based approach for on-satellite ship detection in synthetic aperture radar (SAR) images has received increasing attention, since it does not rely on the predefined imagery features and distributions required by conventional detection methods. To achieve high detection accuracy, most existing CNN-based methods leverage complex off-the-shelf CNN models for optical imagery. Unfortunately, this usually leads to a high computational cost, which makes real-time processing difficult on the resource-constrained devices deployed in the harsh satellite environment. In this paper, we propose OSCAR-RT, the first end-to-end algorithm/hardware co-design framework for On-Satellite CNN based SAR ship detection, which simultaneously produces an accurate and hardware-friendly CNN model and an ultra-efficient FPGA-based hardware accelerator that can be deployed on satellites. With real-time on-satellite processing speed in mind, we start from a state-of-the-art compact CNN model for optical imagery. To eliminate the sharp decrease in detection accuracy on SAR imagery, we analyze the discrepancy between the SAR domain and the optical domain, and propose to adapt the model by adjusting the output feature size to better detect the relatively smaller objects in SAR imagery. To improve the detection speed, we propose a fully-pipelined inter-layer streaming accelerator architecture, where all the layers of the CNN model can be processed concurrently using on-chip FPGA resources. To achieve this architecture, we first propose a hardware-guided, progressive, and structural pruning strategy, which is guided by our modeled hardware metrics and applies state-of-the-art coarse-grained and fine-grained filter pruning, as well as mixed-precision quantization techniques.
Moreover, to improve the reusability and portability of the hardware accelerator design, we develop a library of highly optimized CNN components in high-level synthesis, together with their performance and resource models. Finally, we map the pruned CNN model onto these hardware library components in a fully-pipelined inter-layer streaming fashion, adjusting their parallelism factors to balance the execution time of each layer and to fit within the resource constraint. Experimental results using the adapted MobileNetV1, MobileNetV2, and SqueezeNet models on the widely used SAR ship detection dataset (SSDD) demonstrate the effectiveness of OSCAR-RT: for the MobileNetV1 model, it achieves an average precision of 94% and a detection speed of 652 frames per second on the Xilinx VC709 FPGA evaluation board, while consuming about 5.8 W of power.
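Note: the idea of adjusting per-layer parallelism factors so that every stage of a streaming pipeline takes roughly the same time can be illustrated with a small sketch. This is not the OSCAR-RT implementation. It assumes a simple greedy heuristic, a single scalar resource budget, and a cost model in which a layer's cycles per frame equal its multiply-accumulate (MAC) count divided by its parallelism factor. The names `balance_parallelism`, `bottleneck_cycles`, `macs_per_layer`, and `mac_budget` are hypothetical.

```python
import heapq
from math import ceil


def balance_parallelism(macs_per_layer, mac_budget):
    """Greedily assign one parallelism factor per layer (illustrative only).

    Assumptions (not from the paper): each layer's latency is
    ceil(MACs / factor) cycles, and the total of all factors must not
    exceed a single scalar budget of MAC units.
    """
    n = len(macs_per_layer)
    if mac_budget < n:
        raise ValueError("budget must allow at least one MAC unit per layer")

    factors = [1] * n
    # Max-heap on per-frame cycles (negated, since heapq is a min-heap).
    heap = [(-m, i) for i, m in enumerate(macs_per_layer)]
    heapq.heapify(heap)

    remaining = mac_budget - n
    while remaining > 0:
        # Give one more MAC unit to the current slowest layer.
        _, i = heapq.heappop(heap)
        factors[i] += 1
        remaining -= 1
        heapq.heappush(heap, (-ceil(macs_per_layer[i] / factors[i]), i))
    return factors


def bottleneck_cycles(macs_per_layer, factors):
    """Cycles per frame of the slowest stage, which bounds pipeline throughput."""
    return max(ceil(m / p) for m, p in zip(macs_per_layer, factors))


if __name__ == "__main__":
    macs = [400, 100, 200, 100]  # made-up per-layer workloads
    factors = balance_parallelism(macs, mac_budget=8)
    print(factors, bottleneck_cycles(macs, factors))
```

In a fully pipelined design the slowest layer sets the frame rate, so spending resources on any layer other than the current bottleneck yields no speedup. That is why this sketch always hands the next unit to the slowest stage. A real flow would replace the scalar budget with per-resource models (for example DSP, BRAM, and LUT estimates) for each library component.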
Computer Science