SAda-Net: A Self-Supervised Adaptive Stereo Estimation CNN For Remote Sensing Image Data

Dominik Hirner, Friedrich Fraundorfer
2024-10-17
Abstract: Stereo estimation has made many advancements in recent years with the introduction of deep-learning. However, the traditional supervised approach to deep-learning requires the creation of accurate and plentiful ground-truth data, which is expensive to create and not available in many situations. This is especially true for remote sensing applications, where there is an excess of available data without proper ground truth. To tackle this problem, we propose a self-supervised CNN with self-improving adaptive abilities. In the first iteration, the created disparity map is inaccurate and noisy. Leveraging the left-right consistency check, we get a sparse but more accurate disparity map which is used as an initial pseudo ground-truth. This pseudo ground-truth is then adapted and updated after every epoch in the training step of the network. We use the sum of inconsistent points in order to track the network convergence. The code for our method is publicly available at: https://github.com/thedodo/SAda-Net
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the lack of accurate and sufficient ground-truth data for stereo estimation on remote-sensing image data, which makes traditional supervised learning methods difficult to apply. For satellite and aerial images in particular, obtaining high-quality ground-truth data is expensive, time-consuming, and in many cases infeasible. To solve this problem, the authors propose a self-supervised adaptive convolutional neural network (CNN) named SAda-Net. The method requires no additional ground-truth data and proceeds as follows:

1. **Initial Pseudo-ground-truth Data Generation**: Use the left-right consistency check to remove inconsistent points and generate a sparse but more accurate disparity map as the initial pseudo-ground-truth data.
2. **Pseudo-ground-truth Data Update**: After each training epoch, update the pseudo-ground-truth data, and use the number of inconsistent points in the current disparity map to track network convergence.
3. **Network Structure Design**: Design a lightweight CNN structure with approximately 495,000 trainable parameters, which is suitable for most commercial hardware and can produce good results.

Through these steps, SAda-Net can be trained in a self-supervised manner without ground-truth data, which addresses the main obstacle to stereo estimation on remote-sensing image data.

### Formula Summary

- **Left-Right Consistency Check Formula**: \[ |D_L(x, y) - D_R(x - d, y)| > 1.1 \] where \(D_L\) and \(D_R\) are the disparity maps of the left and right images respectively, and \(d\) is the predicted disparity value at \((x, y)\). A point for which this inequality holds is treated as inconsistent and removed.
- **Loss Function (Hinge Loss)**: \[ \text{loss} = \max(0, 0.2 + s^- - s^+) \] where \(s^+\) is the similarity between matching image patches, and \(s^-\) is the similarity between non-matching image patches.
- **Sub-pixel Enhancement Formula**: \[ d_{\text{subpx}} = \begin{cases} d_{\text{Int}} - 0.5 + \arctan \left(\frac{l_d}{r_d}\right), & \text{if } l_d \leq r_d \\ d_{\text{Int}} - 0.5 + \arctan \left(\frac{r_d}{l_d}\right), & \text{otherwise} \end{cases} \] where \(d_{\text{Int}}\) is the selected integer disparity value, and \(l_d = c_{d-1} - c_d\) and \(r_d = c_{d+1} - c_d\) are the differences between the cost \(c_d\) at the selected disparity and the costs at the two adjacent disparity values.

Through these innovations, SAda-Net not only reduces the dependence on high-quality ground-truth data, but also demonstrates its effectiveness in multiple practical scenarios.
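
The left-right consistency check and the hinge loss summarized above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation (that is in the linked repository). The function names, the use of `None` to mark removed points, the rounding of `x - d` to the nearest column, and the choice to count out-of-image matches as inconsistent are all assumptions made for illustration. Only the threshold 1.1 and the margin 0.2 come from the formulas.

```python
from typing import List, Optional, Tuple

DisparityMap = List[List[float]]
SparseMap = List[List[Optional[float]]]


def left_right_consistency(
    disp_left: DisparityMap,
    disp_right: DisparityMap,
    threshold: float = 1.1,
) -> Tuple[SparseMap, int]:
    """Keep only left disparities that agree with the right disparity map.

    Returns the sparse map (inconsistent points set to None) and the number
    of inconsistent points, which can serve as a convergence indicator.
    """
    width = len(disp_left[0])
    sparse: SparseMap = []
    n_inconsistent = 0
    for y, row in enumerate(disp_left):
        sparse_row: List[Optional[float]] = []
        for x, d in enumerate(row):
            # Column of the corresponding pixel in the right image
            # (assumption: round to the nearest integer column).
            xr = int(round(x - d))
            # Assumption: a match that falls outside the image is inconsistent.
            in_bounds = 0 <= xr < width
            if in_bounds and abs(d - disp_right[y][xr]) <= threshold:
                sparse_row.append(d)
            else:
                sparse_row.append(None)
                n_inconsistent += 1
        sparse.append(sparse_row)
    return sparse, n_inconsistent


def hinge_loss(s_pos: float, s_neg: float, margin: float = 0.2) -> float:
    """Hinge loss max(0, margin + s_neg - s_pos) for one patch triplet."""
    return max(0.0, margin + s_neg - s_pos)


# Toy single-row example: only the last pixel disagrees by more than 1.1.
left = [[0.0, 1.0, 1.0, 3.0]]
right = [[0.0, 1.0, 4.0, 1.0]]
sparse_map, n_bad = left_right_consistency(left, right)
print(sparse_map, n_bad)  # -> [[0.0, 1.0, 1.0, None]] 1
```

In a self-training loop of the kind described in the abstract, `sparse_map` would play the role of the pseudo ground-truth for the next epoch and `n_bad` would be logged per epoch. How the published code performs that update is not specified here.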