Learning confidence measure with transformer in stereo matching

Jini Yang, Minjung Yoo, Jaehoon Cho, Sunok Kim
DOI: https://doi.org/10.1016/j.patcog.2024.110876
IF: 8
2024-09-03
Pattern Recognition
Abstract: We introduce ConFormer, a novel approach to stereo confidence estimation that leverages the Transformer architecture. Recent confidence estimation methods commonly adopt convolutional neural networks (CNNs) and learn confidence features with limited receptive fields, which limits their capability to model global context. Benefiting from the global understanding and long-range dependencies captured by the attention mechanism, we learn confidence features that take global relationships into account through Transformer networks. Specifically, in the disparity feature extraction module, we use a global pooling transformer to extract global confidence features that encode global interactions through self-attention. To complement these with local information and capture fine details, we additionally incorporate local prior features into the pooling transformer with an injection scheme. We further extract color confidence features using Transformer blocks to model the global interactions of the color image. The confidence features from the disparity and the color image are then fused by weighted attention in the fusion network. Experimental results demonstrate that ConFormer outperforms state-of-the-art CNN-based methods on various benchmarks.
computer science, artificial intelligence, engineering, electrical & electronic
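
The fusion step mentioned in the abstract (combining disparity and color confidence features by weighted attention) can be illustrated with a toy sketch. This is not the ConFormer implementation: the abstract does not specify how the weights are computed, so the scoring rule (a dot product with a shared vector `score_vec`) and the names `fuse_pixel` and `fuse_maps` are assumptions made only for illustration. The sketch shows the general idea of per-pixel softmax weights that sum to one and blend two feature vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pixel(disp_feat, color_feat, score_vec):
    """Blend two per-pixel feature vectors with softmax attention weights.

    Assumption: each branch is scored by a dot product with a shared
    vector `score_vec` (a stand-in for learned parameters).
    """
    s_d = sum(a * b for a, b in zip(disp_feat, score_vec))
    s_c = sum(a * b for a, b in zip(color_feat, score_vec))
    w_d, w_c = softmax([s_d, s_c])
    fused = [w_d * d + w_c * c for d, c in zip(disp_feat, color_feat)]
    return fused, (w_d, w_c)


def fuse_maps(disp_map, color_map, score_vec):
    """Apply fuse_pixel to every pixel of two flattened feature maps."""
    return [fuse_pixel(d, c, score_vec)[0] for d, c in zip(disp_map, color_map)]


if __name__ == "__main__":
    # Two pixels, two channels per pixel (hypothetical values).
    disp_map = [[1.0, 0.0], [2.0, 0.0]]
    color_map = [[0.0, 1.0], [0.0, 0.0]]
    print(fuse_maps(disp_map, color_map, score_vec=[1.0, 1.0]))
```

In this sketch, a branch whose features score higher receives a larger weight at that pixel, so the fused feature leans toward whichever source the scoring rule favors there. A real fusion network would learn the scoring parameters jointly with the rest of the model.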