SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation

Junyan Ye, Qiyan Luo, Jinhua Yu, Huaping Zhong, Zhimeng Zheng, Conghui He, Weijia Li
2024-04-03
Abstract: This paper aims at achieving fine-grained building attribute segmentation in a cross-view scenario, i.e., using satellite and street-view image pairs. The main challenge lies in overcoming the significant perspective differences between street views and satellite views. In this work, we introduce SG-BEV, a novel approach for satellite-guided BEV fusion for cross-view semantic segmentation. To overcome the limitations of existing cross-view projection methods in capturing complete building facade features, we incorporate the Bird's Eye View (BEV) method to establish a spatially explicit mapping of street-view features. Moreover, we leverage the advantages of multiple perspectives by introducing a novel satellite-guided reprojection module, which mitigates the uneven feature distribution associated with traditional BEV methods. Our method demonstrates significant improvements on four cross-view datasets collected from multiple cities, including New York, San Francisco, and Boston. On average across these datasets, our method increases mIOU by 10.13% and 5.21% over state-of-the-art satellite-based and cross-view methods, respectively. The code and datasets of this work will be released at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses cross-view fine-grained building attribute segmentation, i.e., segmenting building attributes from paired satellite and street-view images. The main challenge is the large viewpoint difference between street views and satellite views, which makes it difficult to accurately map building facade features extracted from street-view images onto satellite images. To meet this challenge, the authors propose SG-BEV (Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation), a satellite-guided bird's-eye-view (BEV) fusion method for cross-view semantic segmentation. The BEV method establishes a spatially explicit mapping of street-view features, and a satellite-guided reprojection module corrects the uneven feature distribution of traditional BEV methods, yielding finer building attribute segmentation. The main contributions of the paper are:
- The first application of the BEV paradigm to cross-view fine-grained building attribute segmentation, achieving a complete and continuous mapping of street-view features to the top-down view.
- A Satellite-Guided Reprojection (SGR) module that addresses the concentration of BEV features at building edges.
- Evaluations on four cross-view datasets from different cities, showing average mIOU gains of 10.13% and 5.21% over existing satellite-based and cross-view methods, respectively.
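To make the idea of a "spatially explicit mapping of street-view features" concrete, here is a heavily simplified, generic lift-and-splat sketch in the style of depth-distribution BEV methods. It is not the SG-BEV implementation; the paper's actual projection may differ. Assumptions, all hypothetical: each equirectangular panorama column carries one scalar feature, the camera sits at the grid center, and each column has a predicted probability distribution over a fixed set of metric depth bins. The function name `splat_columns_to_bev` and its parameters are illustrative only.

```python
import math
from typing import List


def splat_columns_to_bev(
    column_feats: List[float],       # hypothetical: one scalar feature per panorama column
    depth_probs: List[List[float]],  # per column: probability over depth bins (sums to 1)
    depth_bins: List[float],         # metric distance (m) represented by each bin
    grid_size: int,                  # BEV grid is grid_size x grid_size cells
    cell_m: float,                   # edge length of one BEV cell in meters
) -> List[List[float]]:
    """Lift each column along its viewing ray, weight by depth probability,
    and sum-pool ("splat") the results into a top-down grid centered on the camera."""
    bev = [[0.0] * grid_size for _ in range(grid_size)]
    n_cols = len(column_feats)
    center = grid_size / 2.0
    for i, (feat, probs) in enumerate(zip(column_feats, depth_probs)):
        # Azimuth of this column in a 360-degree equirectangular panorama.
        theta = 2.0 * math.pi * i / n_cols
        for p, d in zip(probs, depth_bins):
            # Ground-plane position of this depth bin along the ray, in cell units.
            x = center + d * math.cos(theta) / cell_m
            y = center + d * math.sin(theta) / cell_m
            col, row = math.floor(x), math.floor(y)
            if 0 <= row < grid_size and 0 <= col < grid_size:
                # Accumulate the depth-weighted feature; points outside the grid are dropped.
                bev[row][col] += p * feat
    return bev


if __name__ == "__main__":
    # Four columns, each fully confident the surface is 1.5 m away.
    grid = splat_columns_to_bev([1.0] * 4, [[1.0]] * 4, [1.5], grid_size=4, cell_m=1.0)
    for r in grid:
        print(r)
```

In this toy setting, a column whose depth distribution is sharply peaked at a facade deposits all of its feature mass into the single cell on the facade line. This illustrates why plain BEV projection tends to concentrate street-view features at building edges, which is the imbalance the SGR module is described as addressing.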
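The reported gains are in mIOU (mean intersection-over-union). The following minimal sketch uses the standard per-class definition: IoU = |pred ∩ target| / |pred ∪ target| for each class, averaged over classes. It assumes that classes absent from both prediction and ground truth are skipped. The source does not state how SG-BEV's evaluation handles absent classes or ignore labels, and the function name is illustrative.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over per-pixel class labels.

    Assumption: a class that appears in neither pred nor target is skipped
    rather than counted as IoU 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2; class 1: IoU = 2/3; mean = 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

A dataset-level mIOU usually accumulates intersections and unions over all images before dividing, rather than averaging per-image scores. Flattened label lists, as above, give exactly that behavior.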