Abstract: Depth completion aims to predict dense depth maps from the sparse depth measurements of a depth sensor. Currently, Convolutional Neural Network (CNN) based models are the most popular methods applied to depth completion tasks. However, despite their excellent high-end performance, they suffer from a limited representation area. To overcome this drawback of CNNs, a more effective and powerful method has been presented: the Transformer, a sequence-to-sequence model built on adaptive self-attention. However, the computational cost of the standard Transformer's key-query dot-product grows quadratically with the input resolution, which makes it poorly suited to depth completion tasks. In this work, we propose a different window-based Transformer architecture for depth completion tasks, named Sparse-to-Dense Transformer (SDformer). The network consists of an input module that extracts and concatenates depth map and RGB image features, a U-shaped encoder-decoder Transformer that extracts deep features, and a refinement module. Specifically, we first concatenate the depth map features with the RGB image features in the input module. Then, instead of calculating self-attention over the whole feature maps, we apply different window sizes to extract the long-range depth dependencies. Finally, we refine the predicted features from the input module and the U-shaped encoder-decoder Transformer module to obtain enriched depth features, and employ a convolution layer to produce the dense depth map. In practice, SDformer obtains state-of-the-art results against CNN-based depth completion models, with a lower computational load and fewer parameters, on the NYU Depth V2 and KITTI DC datasets.
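The cost argument behind the window-based design can be made concrete with a small counting sketch. It is not the SDformer implementation: the function names, the single fixed window size, and the 64x64 feature map are illustrative assumptions, and the abstract itself does not state which window sizes are used. The sketch only counts key-query dot products for global attention versus attention restricted to non-overlapping windows, and shows how a feature grid would be split into such windows.

```python
def global_attention_pairs(height, width):
    """Key-query dot products when every token attends to every token."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height, width, window):
    """Key-query dot products when attention stays inside non-overlapping
    window x window tiles. Assumes height and width are multiples of window."""
    if height % window or width % window:
        raise ValueError("height and width must be multiples of window")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


def partition_windows(feature_map, window):
    """Split a 2D grid (a list of equal-length rows) into non-overlapping
    window x window tiles, in row-major tile order."""
    height, width = len(feature_map), len(feature_map[0])
    if height % window or width % window:
        raise ValueError("height and width must be multiples of window")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            rows = feature_map[top:top + window]
            tiles.append([row[left:left + window] for row in rows])
    return tiles


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    h, w, win = 64, 64, 8
    full = global_attention_pairs(h, w)
    local = window_attention_pairs(h, w, win)
    print(f"global: {full}, windowed: {local}, ratio: {full // local}")
```

For the assumed 64x64 feature map and 8x8 window, global attention needs 16,777,216 dot products and windowed attention needs 262,144, a 64-fold reduction. With a fixed window, doubling the resolution in each dimension quadruples the windowed count but multiplies the global count by sixteen, which is the quadratic growth the abstract refers to.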
