Abstract: The soft-argmax operation is widely adopted in neural network-based stereo matching methods to enable differentiable regression of disparity. However, networks trained with soft-argmax are prone to producing multimodal distributions because there is no explicit constraint on the shape of the probability distribution. Previous methods leverage the Laplacian distribution and cross-entropy for training, but they fail to effectively improve accuracy and even compromise the efficiency of the network. In this paper, we conduct a detailed analysis of the previous distribution-based methods and propose a novel supervision method for stereo matching, Sampling-Gaussian. We sample from the Gaussian distribution for supervision. Moreover, we interpret the training as minimizing the distance in vector space and propose a combined loss of L1 loss and cosine similarity loss. Additionally, we leverage bilinear interpolation to upsample the cost volume. Our method can be directly applied to any soft-argmax-based stereo matching method without a reduction in efficiency. We have conducted comprehensive experiments to demonstrate the superior performance of our Sampling-Gaussian. The experimental results show that we achieve better accuracy on five baseline methods and two datasets. Our method is easy to implement, and the code is available online.

What problem does this paper attempt to address?

The main problem this paper addresses is that existing stereo matching methods based on the soft-argmax operation tend to learn multimodal probability distributions during training, which makes the predicted disparity inaccurate. Specifically:

1. **Multimodal problem**: Although the soft-argmax operation makes disparity regression differentiable, it places no explicit constraint on the shape of the probability distribution. The network is therefore prone to learning multimodal distributions, and the predicted disparity deviates from the center of the dominant mode.
2. **Insufficient supervision signal**: Previous methods have attempted to train with a Laplacian distribution and cross-entropy loss, but they have failed to effectively improve accuracy and may reduce network efficiency.
3. **Boundary effect and interpolation problem**: The disparity range set in previous methods (such as [0, 192]) and trilinear interpolation cause disparity deviation at the boundary and prevent the network from fitting the target distribution.

To solve these problems, the authors propose a new supervision method, Sampling-Gaussian. Its main improvements are as follows:

- **Expand the disparity range**: Expand the disparity range from the original [0, dmax) to [-dext, dmax + dext) to avoid the boundary effect.
- **Bilinear interpolation**: Use bilinear interpolation instead of trilinear interpolation to better fit the Gaussian distribution.
- **Combined loss function**: Combine the L1 loss and the cosine similarity loss, so that training accounts for both the difference in values and the similarity in direction between the predicted and target distributions.

Through these improvements, the Sampling-Gaussian method improves accuracy on five baseline methods across two datasets while maintaining efficient inference.

### Summary of mathematical formulas

1. **Soft-argmax operation**:
\[ d=\sum_{i}i\cdot\text{softmax}(z_{i})=\sum_{i}i\cdot\frac{e^{z_{i}}}{\sum_{j}e^{z_{j}}} \]
2. **Smooth L1 loss**:
\[ \text{smooth}_{L_{1}}(d,\hat{d}) = \begin{cases} 0.5(d - \hat{d})^{2}&\text{if }|d - \hat{d}|<1\\ |d - \hat{d}|- 0.5&\text{otherwise} \end{cases} \]
3. **Gaussian distribution sampling function**:
\[ q(x)=\frac{e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}}{\sum_{x'}e^{-\frac{(x' - \mu)^{2}}{2\sigma^{2}}}} \]
4. **Combined loss function**:
\[ L(p,q)=L_{1}(p,q)+\lambda\cdot L_{\cos}(p,q) \]
where
\[ L_{1}(p,q)=\frac{1}{n}\sum_{i}|p(i)-q(i)| \]
\[ L_{\cos}(p,q)=-\frac{\sum_{i}p(i)q(i)}{\sqrt{\sum_{i}p(i)^{2}}\sqrt{\sum_{i}q(i)^{2}}} \]

Through these improvements, the Sampling-Gaussian method can supervise the network more effectively and improve the accuracy of stereo matching.
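The formulas above can be checked numerically with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, `sigma = 1.0`, `lam = 1.0`, and the extension width `D_EXT = 16` are hypothetical choices, the Gaussian is assumed to be centered on the ground-truth disparity, and the soft-argmax weights each probability by its candidate disparity value rather than by a raw index so that the extended range can include negative candidates.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / np.sum(e)


def soft_argmax(logits, candidates):
    """Expected disparity: sum of candidate * softmax(logit)."""
    return float(np.sum(candidates * softmax(logits)))


def gaussian_logits(mu, candidates, sigma=1.0):
    """Exponent of the Gaussian, -(x - mu)^2 / (2 sigma^2), at each candidate."""
    return -((candidates - mu) ** 2) / (2.0 * sigma ** 2)


def sampled_gaussian(mu, candidates, sigma=1.0):
    """Discrete target q(x): Gaussian sampled at the candidates, normalized to sum to 1."""
    return softmax(gaussian_logits(mu, candidates, sigma))


def combined_loss(p, q, lam=1.0):
    """L1 term plus lam times the negative cosine similarity, as in the summary."""
    l1 = np.mean(np.abs(p - q))
    cos = np.sum(p * q) / (np.linalg.norm(p) * np.linalg.norm(q))
    return float(l1 - lam * cos)


# Hypothetical settings: dmax = 192, extended by 16 candidates on each side.
D_MAX, D_EXT = 192, 16
original = np.arange(0, D_MAX, dtype=np.float64)                # [0, dmax)
extended = np.arange(-D_EXT, D_MAX + D_EXT, dtype=np.float64)   # [-dext, dmax + dext)

# Multimodal problem: two equal modes at 20 and 100 regress to about 60,
# a disparity that neither mode supports.
bimodal_logits = np.logaddexp(gaussian_logits(20.0, extended),
                              gaussian_logits(100.0, extended))
bimodal_estimate = soft_argmax(bimodal_logits, extended)

# Boundary effect: a Gaussian centered at disparity 0 loses its left half on
# [0, dmax), so the expectation is pushed upward (roughly 0.5 for sigma = 1).
truncated_estimate = soft_argmax(gaussian_logits(0.0, original), original)
# On the extended range both halves are present and the estimate stays near 0.
extended_estimate = soft_argmax(gaussian_logits(0.0, extended), extended)

# Combined loss: lowest when the prediction matches the unimodal target.
target = sampled_gaussian(100.0, extended)
loss_match = combined_loss(target, target)
loss_bimodal = combined_loss(softmax(bimodal_logits), target)
```

With the sign convention used in the summary, a perfect match gives an L1 term of 0 and a cosine term of -lam, so the bimodal prediction scores strictly worse than the matching one even though one of its modes sits on the target.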

The Sampling-Gaussian for stereo matching
