Optimizing FPGA-based Convolutional Encoder-Decoder Architecture for Semantic Segmentation

Mengqi Yu, Hongzhi Huang, Hong Liu, Shuyi He, Fei Qiao, Li Luo, Fugui Xie, Xin-Jun Liu, Huazhong Yang
DOI: https://doi.org/10.1109/cyber46603.2019.9066759
2019-01-01
Abstract: Convolutional neural networks (CNNs) for visual semantic segmentation have recently attracted considerable attention because of their strong support for many significant tasks, such as autonomous driving, semantic SLAM (simultaneous localization and mapping), and remote-sensing surveying and mapping. Such applications generally have to run on smart terminals, which requires a hardware platform with both high energy efficiency and real-time performance. However, CNNs for semantic segmentation usually contain symmetrical encoders and decoders, corresponding to a down-sampling process (e.g., pooling, convolution) and an up-sampling process (e.g., unpooling, deconvolution). All of these processes are compute- and storage-intensive, which limits their applicability in resource-constrained embedded systems. In this paper, we propose an FPGA-based accelerator programmed in OpenCL and evaluate its performance on the CamVid dataset. With 8-bit quantization, global accuracy drops by only 2.04%. Additionally, when running on an Arria-10 GX1150 device, the system achieves 48.89 GOPS and 2.4x the real-time performance of a CPU.
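The abstract names three operations but does not describe how they work: encoder down-sampling, decoder up-sampling, and 8-bit quantization. The following is a minimal sketch of all three in plain Python. It illustrates only the arithmetic and is not the paper's OpenCL kernels. Two choices are assumptions, because the abstract does not specify them. First, unpooling is shown in the index-based style, where the encoder records each max position and the decoder scatters values back to it. Second, quantization is shown as symmetric per-tensor linear scaling to signed 8-bit integers. All function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Indices = List[List[Tuple[int, int]]]


def max_pool_2x2(x: Matrix) -> Tuple[Matrix, Indices]:
    """Encoder down-sampling: 2x2 max pooling with stride 2.

    Also records, for each output cell, the (row, col) of the maximum in the
    input so a decoder can later undo the pooling (assumed unpooling style).
    """
    rows, cols = len(x) // 2, len(x[0]) // 2
    pooled: Matrix = [[0.0] * cols for _ in range(rows)]
    indices: Indices = [[(0, 0)] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [(2 * i + di, 2 * j + dj) for di in (0, 1) for dj in (0, 1)]
            r, c = max(window, key=lambda p: x[p[0]][p[1]])
            pooled[i][j] = x[r][c]
            indices[i][j] = (r, c)
    return pooled, indices


def max_unpool_2x2(pooled: Matrix, indices: Indices,
                   out_rows: int, out_cols: int) -> Matrix:
    """Decoder up-sampling: put each value back at its recorded position.

    All other positions stay zero, giving a sparse up-sampled feature map.
    """
    out: Matrix = [[0.0] * out_cols for _ in range(out_rows)]
    for i, row in enumerate(pooled):
        for j, value in enumerate(row):
            r, c = indices[i][j]
            out[r][c] = value
    return out


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor linear quantization to signed 8-bit integers.

    Assumed scheme: scale = max|v| / 127, q = clamp(round(v / scale), -127, 127).
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map 8-bit codes back to real values; error is at most scale / 2."""
    return [v * scale for v in q]


if __name__ == "__main__":
    x = [[1, 5, 2, 0], [3, 4, 8, 1], [0, 2, 6, 7], [9, 1, 3, 2]]
    pooled, idx = max_pool_2x2(x)
    print(pooled)                                # [[5, 8], [9, 7]]
    print(max_unpool_2x2(pooled, idx, 4, 4))     # maxima restored, rest 0.0
    q, s = quantize_int8([0.5, -1.0, 0.25])
    print(q, dequantize(q, s))
```

The bounded rounding error (at most half a quantization step per weight) is one reason an 8-bit model can stay close to full-precision accuracy. The 2.04% figure itself comes from the paper's evaluation, not from this sketch.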