UIF-BEV: an Underlying Information Fusion Framework for Bird's-Eye-View Semantic Segmentation

Yilong Ren, Lening Wang, Minda Li, Han Jiang, Chunmian Lin, Haiyang Yu, Zhiyong Cui
DOI: https://doi.org/10.1109/tiv.2024.3395272
IF: 8.2
2024-01-01
IEEE Transactions on Intelligent Vehicles
Abstract: Semantic segmentation based on bird's-eye-view (BEV) is crucial for autonomous driving. However, current methods for voxel-uplifting-based depth estimation often result in flattened ground, and transformer-based methods lack model interpretability, resulting in information loss and false transformations during image and multi-camera fusion. To tackle this issue, we propose UIF-BEV, an end-to-end framework that fuses underlying information for BEV semantic segmentation. In UIF-BEV, we construct a fusion encoder to combine the camera's underlying information and vehicle motion features across continuous frames, enabling multi-view conversion and image fusion. Additionally, we propose directional attention and tracking attention modules to enhance recognition accuracy and perception prediction for moving vehicles with varying speeds, taking into account their unsynchronized perspectives and timing. To generate segmentation results, we design a bi-directional overlapping attention decoding block that fuses multiple features. Experimental results on the nuScenes dataset demonstrate the effectiveness of UIF-BEV. It significantly improves the stitching of image edges and cross-views in semantic segmentation, while also reducing deformation errors caused by image transformations. Furthermore, UIF-BEV outperforms all benchmarks. Ablation experiments confirm the efficacy of each component in the framework. UIF-BEV presents a promising solution for real-time BEV map reconstruction and holds potential for various applications in the field of computer vision and autonomous driving. Our code is publicly available at https://github.com/LeningWang/UIF-BEV.