Depth Estimation of Multi-Modal Scene Based on Multi-Scale Modulation

Anjie Wang, Zhijun Fang, Xiaoyan Jiang, Yongbin Gao, Gaofeng Cao, Siwei Ma
DOI: https://doi.org/10.1109/icip49359.2023.10222066
2023-01-01
Abstract: Because multimodal information is complementary, making effective use of a scene's multimodal information has become an increasingly important research topic. This paper proposes a novel multi-scale global learning strategy that takes both echo and visual data as inputs to estimate scene depth. The framework builds a multi-scale feature extraction method that uses pyramid pooling modules to aggregate contextual information from different regions, improving its ability to capture global information. A recurrent multi-scale feature modulation module is then introduced to generate semantically richer and more accurate spatial representations at each iterative update. In addition, a multi-scale fusion method is constructed to fuse the echo and visual modalities. Extensive experiments on the Replica dataset demonstrate the superior performance of the proposed method.
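To make the pyramid pooling idea in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It average-pools each feature channel onto several coarse grids, upsamples each pooled map back to the input size, and concatenates the results with the original channels. The function names, the nearest-neighbour upsampling, and the bin sizes `(1, 2, 3, 6)` are assumptions made for illustration; the abstract does not specify them, and the learned convolutions a real module would include are omitted.

```python
def adaptive_avg_pool(channel, bins):
    """Average-pool one H x W channel (list of lists) down to bins x bins."""
    h, w = len(channel), len(channel[0])
    pooled = []
    for i in range(bins):
        # Row window: floor(i*h/bins) up to ceil((i+1)*h/bins), never empty.
        r0, r1 = (i * h) // bins, ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, ((j + 1) * w + bins - 1) // bins
            cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Resize a bins x bins map back to H x W by nearest-neighbour lookup."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pool(features, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input channels with their pooled-and-upsampled copies.

    features: list of C channels, each an H x W list of lists.
    Returns C * (1 + len(bin_sizes)) channels of size H x W.
    bin_sizes is a hypothetical default, not taken from the paper.
    """
    h, w = len(features[0]), len(features[0][0])
    out = list(features)
    for bins in bin_sizes:
        for channel in features:
            out.append(upsample_nearest(adaptive_avg_pool(channel, bins), h, w))
    return out
```

With a single 6 x 6 input channel, `pyramid_pool` returns five channels: the original, one global average (1 x 1 grid), and progressively finer regional averages. The coarse grids are what carry context from distant regions into every spatial location.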