Abstract: Stereo estimation has made many advancements in recent years with the introduction of deep learning. However, the traditional supervised approach to deep learning requires accurate and plentiful ground-truth data, which is expensive to create and not available in many situations. This is especially true for remote sensing applications, where there is an excess of available data without proper ground truth. To tackle this problem, we propose a self-supervised CNN with self-improving adaptive abilities. In the first iteration, the created disparity map is inaccurate and noisy. Leveraging the left-right consistency check, we get a sparse but more accurate disparity map which is used as an initial pseudo ground-truth. This pseudo ground-truth is then adapted and updated after every epoch in the training step of the network. We use the sum of inconsistent points to track the network convergence. The code for our method is publicly available at: https://github.com/thedodo/SAda-Net

What problem does this paper attempt to address?

The paper addresses the lack of accurate and sufficient ground-truth data for stereo estimation on remote-sensing imagery, which makes traditional supervised learning difficult to apply. For satellite and aerial images in particular, obtaining high-quality ground-truth data is expensive and time-consuming, and in many cases infeasible. To solve this problem, the authors propose a self-supervised adaptive convolutional neural network (CNN) named SAda-Net. The method requires no additional ground-truth data and works as follows:

1. **Initial Pseudo-ground-truth Data Generation**: Use the left-right consistency check to remove inconsistent points and generate a sparse but more accurate disparity map as the initial pseudo-ground-truth data.
2. **Pseudo-ground-truth Data Update**: After each training epoch, update the pseudo-ground-truth data from the current disparity map, and use the number of inconsistent points in that map to track network convergence.
3. **Network Structure Design**: Use a lightweight CNN with approximately 495,000 trainable parameters, which is suitable for most commercial hardware and can produce good results.

With these steps, SAda-Net can be trained in a self-supervised way without ground-truth data, which addresses the main obstacle to stereo estimation on remote-sensing image data.

### Formula Summary

- **Left-Right Consistency Check Formula**: a point is treated as inconsistent when
  \[ |D_L(x, y) - D_R(x - d, y)| > 1.1 \]
  where \(D_L\) and \(D_R\) are the disparity maps of the left and right images respectively, and \(d\) is the predicted disparity value.
- **Loss Function (Hinge Loss)**:
  \[ \text{loss} = \max(0, 0.2 + s^- - s^+) \]
  where \(s^+\) is the similarity between matching image patches, and \(s^-\) is the similarity between non-matching image patches.
- **Sub-pixel Enhancement Formula**:
  \[ d_{\text{subpx}} = \begin{cases} d_{\text{Int}} - 0.5 + \arctan \left(\frac{l_d}{r_d}\right), & \text{if } l_d \leq r_d \\ d_{\text{Int}} - 0.5 + \arctan \left(\frac{r_d}{l_d}\right), & \text{otherwise} \end{cases} \]
  where \(d_{\text{Int}}\) is the selected integer disparity value, and \(l_d = c_{d-1} - c_d\) and \(r_d = c_{d+1} - c_d\) are the cost differences to the left and right neighbors respectively.

With these contributions, SAda-Net reduces the dependence on high-quality ground-truth data, and the authors report that it is effective in multiple practical scenarios.
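The left-right consistency check and the hinge loss summarized above can be illustrated with a minimal NumPy sketch. This is not taken from the SAda-Net repository: the function names `left_right_consistency` and `hinge_loss` are hypothetical, and rounding the disparity to the nearest column, treating out-of-image matches as inconsistent, and marking removed points with NaN are all assumptions made for illustration. Only the threshold 1.1 and the margin 0.2 come from the formulas above.

```python
import numpy as np


def left_right_consistency(disp_left, disp_right, threshold=1.1):
    """Return (sparse disparity map, number of inconsistent points).

    Assumptions (not from the paper): disparities are rounded to the
    nearest integer column, matches that fall outside the image count
    as inconsistent, and removed points are marked with NaN.
    """
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Column in the right image that each left pixel maps to: x - d.
    x_right = xs - np.rint(disp_left).astype(int)
    in_bounds = (x_right >= 0) & (x_right < w)
    x_clipped = np.clip(x_right, 0, w - 1)
    # |D_L(x, y) - D_R(x - d, y)| for every pixel.
    diff = np.abs(disp_left - disp_right[ys, x_clipped])
    inconsistent = ~in_bounds | (diff > threshold)
    sparse = disp_left.astype(float).copy()
    sparse[inconsistent] = np.nan
    return sparse, int(inconsistent.sum())


def hinge_loss(s_pos, s_neg, margin=0.2):
    """max(0, margin + s^- - s^+), applied element-wise."""
    return np.maximum(0.0, margin + np.asarray(s_neg) - np.asarray(s_pos))


# Tiny example: a constant disparity of 1 with one corrupted right-image value.
disp_left = np.full((2, 5), 1.0)
disp_right = np.full((2, 5), 1.0)
disp_right[0, 2] = 4.0
sparse, n_inconsistent = left_right_consistency(disp_left, disp_right)
print(n_inconsistent)
print(hinge_loss(0.9, 0.3), hinge_loss(0.5, 0.6))
```

In a self-supervised loop of the kind the summary describes, the returned sparse map would play the role of the pseudo ground-truth, and the returned count is the sort of quantity that could be logged per epoch to follow convergence.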

SAda-Net: A Self-Supervised Adaptive Stereo Estimation CNN For Remote Sensing Image Data

Self-Supervised Multiscale Adversarial Regression Network for Stereo Disparity Estimation

Faster Self-adaptive Deep Stereo

FCDSN-DC: An Accurate and Lightweight Convolutional Neural Network for Stereo Estimation with Depth Completion

Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo

RSCNN: A CNN-Based Method to Enhance Low-Light Remote-Sensing Images

Stereo Matching Method for Remote Sensing Images Based on Attention and Scale Fusion

Depth Edge and Structure Optimization-Based End-to-End Self-Supervised Stereo Matching

S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery

AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach

Stereo Matching by Self-supervision of Multiscopic Vision

MSDC-Net: Multi-Scale Dense and Contextual Networks for Automated Disparity Map for Stereo Matching

Unsupervised Stereo Matching Network For VHR Remote Sensing Images Based On Error Prediction

MULTI-SCALE CASCADE DISPARITY REFINEMENT STEREO NETWORK

Self-adapting confidence estimation for stereo

DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points

Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising

Bidirectional Semi-supervised Dual-branch CNN for Robust 3D Reconstruction of Stereo Endoscopic Images via Adaptive Cross and Parallel Supervisions

EAI-Stereo: Error Aware Iterative Network for Stereo Matching

A CNN Based Approach for the Point-Light Photometric Stereo Problem

SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning