The Sampling-Gaussian for stereo matching

Baiyu Pan, Jichao Jiao, Bowen Yao, Jianxin Pang, Jun Cheng
2024-10-09
Abstract: The soft-argmax operation is widely adopted in neural network-based stereo matching methods to enable differentiable regression of disparity. However, a network trained with soft-argmax is prone to producing multimodal distributions because nothing explicitly constrains the shape of the probability distribution. Previous methods leverage the Laplacian distribution and cross-entropy for training, but they fail to effectively improve accuracy and even compromise the efficiency of the network. In this paper, we conduct a detailed analysis of the previous distribution-based methods and propose a novel supervision method for stereo matching, Sampling-Gaussian. We sample from the Gaussian distribution for supervision. Moreover, we interpret the training as minimizing the distance in vector space and propose a combined loss of L1 loss and cosine similarity loss. Additionally, we leverage bilinear interpolation to upsample the cost volume. Our method can be directly applied to any soft-argmax-based stereo matching method without a reduction in efficiency. We have conducted comprehensive experiments to demonstrate the superior performance of our Sampling-Gaussian. The experimental results show that we achieve better accuracy on five baseline methods and two datasets. Our method is easy to implement, and the code is available online.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the tendency of soft-argmax-based stereo matching networks to learn multimodal probability distributions during training, which makes the predicted disparity inaccurate. Specifically:

1. **Multimodal problem**: Although the soft-argmax operation makes disparity regression differentiable, nothing explicitly constrains the shape of the probability distribution. The network is therefore prone to learning multimodal distributions, and the predicted disparity deviates from the center of the dominant mode.
2. **Insufficient supervision signal**: Previous methods trained with a Laplacian distribution and cross-entropy loss, but they failed to improve accuracy effectively and may reduce network efficiency.
3. **Boundary effect and interpolation problem**: The disparity range set in previous methods (such as [0, 192]) and trilinear interpolation cause disparity deviation at the boundary and prevent the network from fitting the target distribution.

To solve these problems, the authors propose a new supervision method, Sampling-Gaussian. Its main improvements are:

- **Expanded disparity range**: The disparity range is expanded from the original \([0, d_{max})\) to \([-d_{ext}, d_{max} + d_{ext})\) to avoid the boundary effect.
- **Bilinear interpolation**: Bilinear interpolation replaces trilinear interpolation to better fit the Gaussian distribution.
- **Combined loss function**: The L1 loss is combined with a cosine similarity loss, so that both the difference in values and the similarity in direction are taken into account.

With these improvements, Sampling-Gaussian improves performance on multiple baseline methods while maintaining an efficient inference speed.

### Summary of mathematical formulas

1. **Soft-argmax operation**:
\[ d=\sum_{i}i\cdot\text{softmax}(z_{i})=\sum_{i}i\cdot\frac{e^{z_{i}}}{\sum_{j}e^{z_{j}}} \]
2. **Smooth L1 loss**:
\[ \text{smooth}_{L_1}(d,\hat{d}) = \begin{cases} 0.5(d - \hat{d})^{2}&\text{if }|d - \hat{d}|<1\\ |d - \hat{d}|- 0.5&\text{otherwise} \end{cases} \]
3. **Gaussian distribution sampling function**:
\[ q(x)=\frac{e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}}{\sum_{x'}e^{-\frac{(x' - \mu)^{2}}{2\sigma^{2}}}} \]
4. **Combined loss function**:
\[ L(p,q)=L_{1}(p,q)+\lambda\cdot L_{\cos}(p,q) \]
where
\[ L_{1}(p,q)=\frac{1}{n}\sum_{i}|p(i)-q(i)| \]
\[ L_{\cos}(p,q)=-\frac{\sum_{i}p(i)q(i)}{\sqrt{\sum_{i}p(i)^{2}}\sqrt{\sum_{i}q(i)^{2}}} \]

Through these improvements, the Sampling-Gaussian method can supervise the network more effectively and improve the accuracy of stereo matching.
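
The formulas above can be tried out numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names, `sigma = 1.0`, `lam = 1.0`, and `d_ext = 8` are illustrative assumptions, and the Gaussian is assumed to be centered at the ground-truth disparity. It shows how soft-argmax on a bimodal distribution lands between the two modes, how a Gaussian target sampled on the unextended range is biased near the boundary, and how the combined loss compares a prediction with a target.

```python
import numpy as np

def soft_argmax(logits, candidates):
    """Expected disparity under softmax(logits) over the candidate disparities."""
    z = logits - logits.max()                  # stabilise the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return float((candidates * p).sum()), p

def sampling_gaussian(mu, candidates, sigma=1.0):
    """Discrete target q: Gaussian centred at mu, sampled at the candidates
    and normalised so that it sums to 1 (formula 3)."""
    w = np.exp(-((candidates - mu) ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def combined_loss(p, q, lam=1.0):
    """L1(p, q) + lam * L_cos(p, q), with L_cos the negative cosine similarity
    (formula 4). lam = 1.0 is an assumed weight."""
    l1 = np.abs(p - q).mean()
    l_cos = -(p * q).sum() / (np.linalg.norm(p) * np.linalg.norm(q))
    return float(l1 + lam * l_cos)

d_max, d_ext = 192, 8                          # d_ext = 8 is an assumed value
base = np.arange(0, d_max, dtype=np.float64)                   # [0, d_max)
extended = np.arange(-d_ext, d_max + d_ext, dtype=np.float64)  # [-d_ext, d_max + d_ext)

# Multimodal problem: two equally strong modes at 40 and 120.
logits = -0.5 * np.minimum((extended - 40.0) ** 2, (extended - 120.0) ** 2)
d_bimodal, p_bimodal = soft_argmax(logits, extended)
# d_bimodal falls midway between the modes, far from either peak.

# Boundary effect: a target centred at disparity 0.5.
q_base = sampling_gaussian(0.5, base)
q_ext = sampling_gaussian(0.5, extended)
mean_base = float((base * q_base).sum())       # left tail is cut off, so the mean is biased upward
mean_ext = float((extended * q_ext).sum())     # both tails are kept, so the mean stays at 0.5

# Combined loss: a perfect match versus the bimodal prediction.
target = sampling_gaussian(40.0, extended)
loss_match = combined_loss(target, target)     # L1 = 0, cosine term = -1
loss_bimodal = combined_loss(p_bimodal, target)

print(d_bimodal, mean_base, mean_ext, loss_match, loss_bimodal)
```

Comparing `mean_base` with `mean_ext` illustrates why extending the range matters: on [0, d_max) the sampled target cannot have its mean at a disparity close to 0, so a network that fits the target perfectly would still regress a shifted value.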