Scale Propagation Network for Generalizable Depth Completion

Haotian Wang, Meng Yang, Xinhu Zheng, Gang Hua
2024-10-24
Abstract: Depth completion, inferring dense depth maps from sparse measurements, is crucial for robust 3D perception. Although deep-learning-based methods have made tremendous progress on this problem, these models cannot generalize well to scenes that are unobserved in training, posing a fundamental limitation that has yet to be overcome. A careful analysis of existing deep neural network architectures for depth completion, which largely borrow from successful backbones for image analysis tasks, reveals that a key design bottleneck resides in the conventional normalization layers. These normalization layers are designed, on the one hand, to make training more stable and, on the other hand, to build more visual invariance across scene scales. However, in depth completion, the scale is exactly what we want to estimate robustly in order to generalize better to unseen scenes. To mitigate this, we propose a novel scale propagation normalization (SP-Norm) method that propagates scales from input to output while preserving the normalization operator for easy convergence. More specifically, we rescale the input using features learned by a single-layer perceptron from the normalized input, rather than directly normalizing the input as conventional normalization layers do. We then develop a new network architecture based on SP-Norm and the ConvNeXt V2 backbone. We explore the composition of various basic blocks and architectures to achieve superior performance and efficient inference for generalizable depth completion. Extensive experiments are conducted on six unseen datasets with various types of sparse depth maps, i.e., randomly sampled 0.1%/1%/10% valid pixels, 4/8/16/32/64-line LiDAR points, and holes from Structured-Light. Our model consistently achieves the best accuracy with faster speed and lower memory when compared to state-of-the-art methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the poor generalization of depth completion models across scenes. Although existing deep-learning-based depth completion models perform well on a single dataset, they generalize poorly to unseen scenes. The paper attributes this limitation mainly to the design of the normalization layers in existing architectures: these layers cannot effectively propagate scale information from input to output, which degrades performance when the scene changes.

To solve this problem, the authors propose a new method, **Scale Propagation Normalization (SP-Norm)**, which propagates scale information through the network while retaining the normalization operation so that training still converges easily. They also develop a new network architecture based on SP-Norm and the ConvNeXt V2 backbone to achieve better generalization and more efficient inference.

### Main contributions of the paper

1. **Analysis and proposed solution**: The authors analyze the limitations of existing normalization layers (such as Batch Normalization and Instance Normalization) in the depth completion task and propose SP-Norm to overcome them.
2. **New network architecture**: A new network architecture is built on SP-Norm and the ConvNeXt V2 backbone. Refining the basic blocks and the overall structure yields better performance and faster inference.
3. **Experimental verification**: Extensive experiments on six unseen datasets evaluate the proposed model on various types of sparse depth maps, including randomly sampled 0.1%/1%/10% valid pixels, 4/8/16/32/64-line LiDAR points, and holes produced by structured light.
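The randomly sampled sparse inputs mentioned above can be illustrated with a minimal sketch. It assumes that missing depth is encoded as 0 and that each valid pixel is kept independently with probability `ratio`; the function name `sample_sparse_depth` and both conventions are illustrative assumptions, not the authors' data pipeline.

```python
import numpy as np


def sample_sparse_depth(dense, ratio, rng):
    """Keep roughly a fraction `ratio` of the valid pixels of a dense depth map.

    Assumptions (not from the paper): invalid depth is encoded as 0, and
    each valid pixel is kept independently with probability `ratio`.
    """
    valid = dense > 0                        # pixels that carry a measurement
    keep = rng.random(dense.shape) < ratio   # independent Bernoulli draw per pixel
    mask = valid & keep
    sparse = np.where(mask, dense, 0.0)      # zero out everything that was dropped
    return sparse, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.uniform(0.5, 10.0, size=(480, 640))  # synthetic depth in metres
    for ratio in (0.001, 0.01, 0.1):                 # 0.1% / 1% / 10% valid pixels
        sparse, mask = sample_sparse_depth(dense, ratio, rng)
        print(f"ratio={ratio}: kept {mask.mean():.4%} of pixels")
```

LiDAR-line and structured-light hole patterns need sensor-specific masks and are not covered by this sketch.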
### Key formulas and concepts

- **Definition of SP-Norm**:

  \[ z_{\text{sp}}^i=\left(\sum_{j = 1}^{n}w_{ij}\hat{d}_j + b_i\right)d_i \]

  where \(w_{ij}\) and \(b_i\) are the learnable parameters of the single-layer perceptron (SLP), \(n\) is the number of channels of the input data, \(j\) is the summation index running over those channels, and \(\hat{d}_j\) is the normalized input data.
- **SP properties**:
  - When the input \(d_i\) is scaled to \(s d_i\), the output \(z_i\) should be scaled proportionally to \(s z_i\), that is, \(z\propto d\).
  - Mathematically, this is expressed as \(E(z)\propto E(d)\) and \(D(z)\propto D(d)\), where \(E(\cdot)\) and \(D(\cdot)\) denote the mean and variance functions respectively.
  - SP-Norm satisfies this property because normalization removes the scale of \(d\): the bracketed factor, which depends only on \(\hat{d}\), is unchanged when \(d\) is multiplied by \(s > 0\) (up to the small stabilizing constant in the normalizer), so the output scales linearly with the remaining factor \(d_i\).

By introducing SP-Norm, the authors address cross-scene generalization in the depth completion task and report significantly improved performance on unseen scenes.
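The SP-Norm formula and the scale propagation property can be checked numerically with a small NumPy sketch. It assumes the normalized input \(\hat{d}\) comes from a channel-wise layer normalization without affine parameters, applied independently at each pixel; that choice, the function names, and the random weights are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def layer_norm(d, eps=1e-6):
    """Normalize over the channel axis (last axis), without affine parameters."""
    mean = d.mean(axis=-1, keepdims=True)
    var = d.var(axis=-1, keepdims=True)
    return (d - mean) / np.sqrt(var + eps)


def sp_norm(d, w, b, eps=1e-6):
    """z_i = (sum_j w_ij * d_hat_j + b_i) * d_i, evaluated at every pixel.

    d: (..., n) features, w: (n, n) SLP weights, b: (n,) SLP bias.
    Assumption: d_hat is the channel-wise layer-normalized input.
    """
    d_hat = layer_norm(d, eps)
    gate = d_hat @ w.T + b   # SLP on the normalized input, shape (..., n)
    return gate * d          # rescale the raw input, so its scale is propagated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = rng.normal(size=(4, 5, 8))   # H x W x n feature map
    w = rng.normal(size=(8, 8))
    b = rng.normal(size=8)
    s = 3.0
    # A tiny eps keeps the check tight; the property is exact only as eps -> 0.
    z = sp_norm(d, w, b, eps=1e-12)
    z_scaled = sp_norm(s * d, w, b, eps=1e-12)
    print("SP-Norm output scales with input:", np.allclose(z_scaled, s * z))
    print("Plain normalization discards scale:",
          np.allclose(layer_norm(s * d, eps=1e-12), layer_norm(d, eps=1e-12)))
```

The second check shows the contrast the summary draws: a conventional normalization layer maps `d` and `s * d` to the same output, whereas the SP-Norm sketch carries the factor `s` through to the output.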