Local and Global Context Attentive Fusion Network for Traffic Scene Parsing

WANG Zeyu, BU Shuhui, HUANG Wei, ZHENG Yuanpan, WU Qinggang, ZHANG Xu
DOI: https://doi.org/10.11772/j.issn.1001-9081.2022020245
2023-01-01
Journal of Computer Applications
Abstract: To address the problem of adaptively aggregating local and global contextual information in traffic scene parsing, a Local and Global Context Attentive Fusion Network (LGCAFN) with a three-module architecture was proposed. The front-end feature extraction module consisted of an improved 101-layer Residual Network (ResNet-101) based on the Cascaded Atrous Spatial Pyramid Pooling (CASPP) unit, and was able to extract multi-scale local features of objects more effectively. The mid-end structural learning module was composed of eight Long Short-Term Memory (LSTM) branches, and was able to infer the spatial structural features of the scene regions adjacent to an object in eight different directions more accurately. In the back-end feature fusion module, a three-stage fusion method based on the attention mechanism was adopted to adaptively aggregate useful contextual information and suppress noisy contextual information, so that the generated multimodal fusion features represented the semantic information of objects more comprehensively and accurately. Experimental results on the Cityscapes standard and extended datasets demonstrate that, compared with existing state-of-the-art methods such as Inverse Transformation Network (ITN) and Object Contextual Representation Network (OCRN), LGCAFN achieves the best mean Intersection over Union (mIoU), reaching 84.0% and 86.3% respectively, which shows that LGCAFN can parse traffic scenes accurately and contributes to realizing autonomous driving of vehicles.
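The abstract reports its results as mean Intersection over Union (mIoU). As a reading aid, the following is a minimal sketch of how mIoU is commonly computed from per-pixel labels; it is not the authors' evaluation code. The function name `mean_iou`, the flattened-list input format, and the ignore label value of 255 are assumptions made for illustration.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_label: int = 255) -> float:
    """Mean IoU over flattened per-pixel class labels (illustrative sketch).

    ignore_label is an assumed convention for unlabeled pixels.
    """
    # confusion[t][p] counts pixels whose ground truth is t and prediction is p
    confusion: List[List[int]] = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # pixels without a valid ground-truth label are skipped
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                                        # true positives
        fn = sum(confusion[c]) - tp                                 # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # wrongly predicted as c
        union = tp + fp + fn
        if union > 0:  # classes absent from both prediction and target are skipped
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes; the last pixel carries the ignore label.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/2, 2/3, and 1, so the mean is roughly 0.722. Benchmark evaluations typically accumulate the confusion matrix over the entire test set before computing IoU, rather than averaging per-image scores.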