Abstract: Deep learning approaches have contributed significantly to recent progress in stereo matching. These deep stereo matching methods are usually based on supervised training, which requires a large amount of high-quality ground-truth depth map annotations that are expensive to collect. Furthermore, only a limited quantity of stereo vision training data is currently available, obtained either with active sensors (LiDAR and ToF cameras) or through computer graphics simulations, and it does not meet the requirements of deep supervised training. Here, we propose a novel deep stereo approach called the "self-supervised multiscale adversarial regression network (SMAR-Net)," which relaxes the need for ground-truth depth maps during training. Specifically, we design a two-stage network. The first stage is a disparity regressor, in which a regression network estimates disparity values from stacked stereo image pairs. The stereo image stacking method is a novel contribution: the stack not only contains the spatial appearance of the stereo images but also implies matching correspondences at different disparity values. In the second stage, a synthetic left image is generated based on the left–right consistency assumption. Our network is trained by minimizing a hybrid loss function composed of a content loss and an adversarial loss. The content loss minimizes the average warping error between the synthetic images and the real ones. In contrast to the generative adversarial loss, our proposed adversarial loss penalizes mismatches using multiscale features. This constrains the synthetic image and the real image to be pixelwise identical instead of merely belonging to the same distribution. Furthermore, using multiscale feature extraction in both the content loss and the adversarial loss further improves the adaptability of SMAR-Net in ill-posed regions. Experiments on multiple benchmark datasets show that SMAR-Net outperforms the current state-of-the-art self-supervised methods and achieves results comparable to those of supervised methods. The source code can be accessed at: https://github.com/Dawnstar8411/SMAR-Net.
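
The second-stage idea described in the abstract (synthesizing a left image from the right image and a disparity map, then scoring it by an average warping error) can be illustrated with a minimal NumPy sketch. This is not the SMAR-Net implementation. The function names `warp_right_to_left` and `warping_error`, the single-channel image format, the linear interpolation, the border clamping, the use of a mean absolute difference, and the sign convention (left pixel x matches right pixel x - d in a rectified pair) are all assumptions made for illustration.

```python
import numpy as np


def warp_right_to_left(right, disparity):
    """Synthesize a left view from a right image (hypothetical helper).

    Assumes a rectified pair of single-channel images of shape (H, W),
    where left pixel (y, x) corresponds to right pixel (y, x - d).
    """
    h, w = disparity.shape
    # Source column in the right image for every left pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    # Clamp to the image border instead of masking out-of-view pixels.
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation along the horizontal (epipolar) direction.
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]


def warping_error(left, right, disparity):
    """Mean absolute difference between the real and synthetic left image.

    An assumed stand-in for the 'average warping error' in the abstract.
    """
    synthetic_left = warp_right_to_left(right, disparity)
    return float(np.mean(np.abs(left - synthetic_left)))
```

With a correct disparity map the synthetic left image reproduces the real one and the error approaches zero; a wrong disparity map leaves a residual that a self-supervised loss of this kind can minimize without ground-truth depth.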

End-to-end Learning of Self-Rectification and Self-Supervised Disparity Prediction for Stereo Vision

A Simple Rectification Method for Linear Multi-Baseline Stereovision System

Rectification of images for parallel multiple-baseline stereo vision

Unconstrained Self-Calibration of Stereo Camera on Visually Impaired Assistance Devices.

Self‐supervised binocular depth estimation algorithm with self‐rectification for autonomous driving

Stereo Calibration and Rectification for Omnidirectional Multi-Camera Systems

A robust stereo feature-aided semi-direct SLAM system

Faster Self-adaptive Deep Stereo.

Stereo Matching by Self-supervision of Multiscopic Vision.

Dive Deeper into Rectifying Homography for Stereo Camera Online Self-Calibration

Self-Supervised Multiscale Adversarial Regression Network for Stereo Disparity Estimation

Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo

Parallax attention stereo matching network based on the improved group-wise correlation stereo network

A unified and efficient semi-supervised learning framework for stereo matching

Improving Stereo Matching by Incorporating Geometry Prior into Convnet

Bidirectional Semi-supervised Dual-branch CNN for Robust 3D Reconstruction of Stereo Endoscopic Images via Adaptive Cross and Parallel Supervisions

Brain Cholesterol XVIII: Effect of Methylphenidate (Ritalin) on [U-14C] Glucose and [2-3H] Acetate Incorporation

Depth Edge and Structure Optimization-Based End-to-End Self-Supervised Stereo Matching

Semi-Stereo: A Universal Stereo Matching Framework for Imperfect Data Via Semi-supervised Learning

PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo Matching

DSR: Direct Self-rectification for Uncalibrated Dual-lens Cameras