Multi-Dimensional Attention on Cost Volume for Stereo Matching

Zhou Jiale, Wenqin Huang, Qingmin Liao, Zongqing Lu, Xiaoqian Liu
DOI: https://doi.org/10.1109/ijcnn60899.2024.10651437
2024-01-01
Abstract: Stereo matching is a fundamental research topic in computer vision, and careful processing of the cost volume plays a vital role in stereo matching solutions. Previous convolutional networks have deep-layer structures but can aggregate only local regions, which leads to suboptimal matching in areas such as edges and weakly textured regions. Given the global perception capability of the attention mechanism, we propose, for the first time, global attention modules that operate directly on the cost volume for cost aggregation. Our attention module, named Multi-Dimensional Attention (MDA), includes two submodules: Cross-Disparity Attention (CDA) and Intra-Disparity Attention (IDA). CDA performs cost aggregation across different disparities, while IDA is further divided into Channel-Wise Attention (CWA) and Disparity-Wise Attention (DWA), which focus on structural similarity and disparity variations within a fixed disparity. For evaluation, we conduct experiments on four publicly available datasets (KITTI 2012, KITTI 2015, Scene Flow and Middlebury), and the results show that our method achieves state-of-the-art (SoTA) performance in stereo matching tasks.
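The abstract describes attention applied along different axes of the cost volume but gives no equations. The block below is a minimal sketch, not the authors' implementation, of plain scaled dot-product self-attention applied in two of those directions: across disparity levels at each pixel (in the spirit of CDA) and across channels within one fixed disparity slice, using all pixels as each channel's feature (in the spirit of CWA). It assumes a cost volume stored as volume[d][c][n] with D disparities, C channels and N = H × W flattened pixels, identity query/key/value projections, and no learned weights or residual connections; all function names are hypothetical. DWA is omitted because the abstract does not specify its formulation.

```python
import math
from typing import List

# Assumed layout (not from the paper): volume[d][c][n] with D disparity
# levels, C feature channels and N = H * W flattened pixels.
Volume = List[List[List[float]]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Each token attends to every token (including itself); its output is the
    softmax-weighted sum of all tokens. Real models would use learned projections.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)])
    return out


def cross_disparity_attention(volume: Volume) -> Volume:
    """Hypothetical CDA-style step: attend across disparities at each pixel.

    The token for disparity d at pixel n is the C-dim vector volume[d][:][n].
    """
    D, C, N = len(volume), len(volume[0]), len(volume[0][0])
    out = [[[0.0] * N for _ in range(C)] for _ in range(D)]
    for n in range(N):
        tokens = [[volume[d][c][n] for c in range(C)] for d in range(D)]
        for d, vec in enumerate(self_attention(tokens)):
            for c in range(C):
                out[d][c][n] = vec[c]
    return out


def channel_wise_attention(volume: Volume) -> Volume:
    """Hypothetical CWA-style step: attend across channels per fixed disparity.

    The token for channel c at disparity d is its whole N-pixel map volume[d][c],
    so each channel's attention weights depend on every pixel (global context).
    """
    return [self_attention(volume[d]) for d in range(len(volume))]


if __name__ == "__main__":
    # Toy cost volume: D=4 disparities, C=3 channels, H=W=2 (N=4 pixels).
    D, C, N = 4, 3, 4
    vol = [[[0.1 * (d + c + n) for n in range(N)] for c in range(C)] for d in range(D)]
    aggregated = channel_wise_attention(cross_disparity_attention(vol))
    print(len(aggregated), len(aggregated[0]), len(aggregated[0][0]))  # 4 3 4
```

Because every output entry is a softmax-weighted average of input tokens, each channel in channel_wise_attention draws on all N pixels of its disparity slice, which is the kind of global aggregation the abstract contrasts with local convolutional aggregation. A trained model would add learnable projections and, typically, a residual path around each attention step.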