A transformer-CNN parallel network for image guided depth completion

Tao Li, Xiucheng Dong, Jie Lin, Yonghong Peng
DOI: https://doi.org/10.1016/j.patcog.2024.110305
IF: 8
2024-02-04
Pattern Recognition
Abstract: Image-guided depth completion aims to predict a dense depth map from sparse depth measurements and the corresponding single color image. However, most state-of-the-art methods rely on either a convolutional neural network (CNN) or a transformer alone. In this paper, we propose a transformer-CNN parallel network (TCPNet) that integrates the strength of CNNs in local detail recovery with that of transformers in long-range semantic modeling. Specifically, our CNN branch adopts dense connections to strengthen feature propagation. A common transformer computes self-attention over all tokens in the window, whether or not they are relevant, which inevitably introduces interference and noise. To improve the accuracy of self-attention, we propose a correlation-based transformer that allows only nearest-neighbor tokens to participate in the self-attention computation. We also design a multi-scale conditional random field (CRF) module that performs multi-scale high-dimensional filtering for depth refinement. Comprehensive experimental results on KITTI and NYUv2 demonstrate that our method outperforms state-of-the-art methods.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
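
The abstract's idea of restricting self-attention to nearest-neighbor tokens can be illustrated with a small sketch. This is not the paper's implementation: it assumes "nearest neighbor" means the k keys with the highest scaled dot-product score for each query, and the function name `topk_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the parameter `k` are hypothetical names introduced here for illustration only.

```python
import numpy as np

def topk_self_attention(x, w_q, w_k, w_v, k):
    """Self-attention where each query attends only to its k highest-scoring keys.

    Assumption: 'nearest neighbors' are chosen by scaled dot-product score.
    x: (n_tokens, dim); w_q, w_k, w_v: (dim, dim_head); 1 <= k <= n_tokens.
    """
    q, key, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ key.T / np.sqrt(q.shape[-1])          # (n_tokens, n_tokens)

    # Column indices of the k largest scores in every row.
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]

    # Additive mask: 0 for kept tokens, -inf for all others.
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)

    # Numerically stable softmax over the kept tokens only.
    masked = scores + mask
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

With `k` equal to the number of tokens the sketch reduces to ordinary softmax attention; smaller `k` assigns exactly zero weight to low-scoring tokens, which is the filtering effect the abstract describes.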