Abstract: We introduce the Temporally Incremental Disparity Estimation Network (TIDE-Net), a learning-based technique for disparity computation in mono-camera structured light systems. In our hardware setting, a static pattern is projected onto a dynamic scene and captured by a monocular camera. Unlike most earlier disparity estimation methods, which operate frame by frame, our network acquires disparity maps in a temporally incremental way. Specifically, we exploit the deformation of the projected pattern (named pattern flow) across the captured image sequence to model temporal information. Notably, the proposed pattern flow formulation reflects the disparity changes along the epipolar line and is therefore a special form of optical flow. Tailored to pattern flow, TIDE-Net is proposed and implemented as a recurrent architecture. For each incoming frame, our model fuses correlation volumes (from the current frame) with the disparity (from the previous frame) warped by pattern flow. From the fused features, the final stage of TIDE-Net estimates the residual disparity rather than the full disparity that many previous methods estimate. Interestingly, this design brings clear empirical advantages in efficiency and generalization ability. Using only synthetic data for training, our extensive evaluation (w.r.t. both accuracy and efficiency metrics) shows performance superior to several SOTA models on unseen real data. The code is available at <a class="link-external link-https" href="https://github.com/CodePointer/TIDENet" rel="external noopener nofollow">https://github.com/CodePointer/TIDENet</a>.
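The incremental update described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a rectified setup in which epipolar lines are image rows, a disparity convention of camera column minus pattern column, and a pattern flow given as a per-pixel horizontal displacement from the previous frame to the current one. Under those assumptions the pattern is static, so a pattern feature that moves by `flow_x` pixels in the image changes its disparity by the same amount. The function names, the backward-warping scheme, and the residual input are hypothetical; in TIDE-Net the residual is predicted by the network from the fused features.

```python
import numpy as np


def warp_disparity_by_pattern_flow(prev_disp, flow_x):
    """Backward-warp the previous disparity map along image rows.

    Assumption: flow_x[y, x] is the horizontal displacement such that the
    pattern feature seen at column x in the current frame was at column
    x - flow_x[y, x] in the previous frame.
    """
    h, w = prev_disp.shape
    # Source column in the previous frame for every current pixel.
    src = np.arange(w, dtype=np.float64)[None, :] - flow_x
    src = np.clip(src, 0.0, w - 1.0)
    x0 = np.floor(src).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    alpha = src - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation along the row (the assumed epipolar line).
    sampled = (1.0 - alpha) * prev_disp[rows, x0] + alpha * prev_disp[rows, x1]
    # Static pattern: the disparity of a feature changes by its flow.
    return sampled + flow_x


def incremental_update(prev_disp, flow_x, residual):
    """Current disparity = warped previous disparity + residual."""
    return warp_disparity_by_pattern_flow(prev_disp, flow_x) + residual


prev_disp = np.full((4, 8), 10.0)   # previous-frame disparity (toy values)
flow_x = np.full((4, 8), 2.0)       # toy pattern flow in pixels
residual = np.full((4, 8), 0.5)     # stand-in for the network's output
curr_disp = incremental_update(prev_disp, flow_x, residual)
```

The warped map serves only as an initial estimate, so the remaining term to predict is small, which is consistent with the abstract's claim that estimating a residual is cheaper than estimating the full disparity.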

What problem does this paper attempt to address?

The paper primarily addresses real-time, efficient, and accurate disparity estimation in dynamic scenes using a structured light system with a monocular camera. Specifically, the research team proposes a learning-based technique named TIDE-Net (Temporally Incremental Disparity Estimation Network) for computing disparity maps in such a system.

### Research Background and Challenges

- **Challenges in Dynamic Scene Acquisition**: While object scanning in static scenes can utilize the rich information provided by multiple patterns, for dynamic scenes, single-pattern structured light systems are more suitable due to the difficulty of feature matching. This results in sparse and unstable input sequences.
- **Training Data Limitations**: For dynamic scenes, it is difficult to obtain densely corresponding ground truth; even if it can be obtained, this data tends to be biased towards specific hardware designs (such as patterns and devices). Therefore, training with synthetic data and testing on real-world scenes is a more practical choice. In this case, the generalization ability of the model is particularly important.

### Main Contributions

1. **Incremental Disparity Estimation Framework**: TIDE-Net exploits the locality and sequential characteristics of dynamic scene images and reduces the parameter count by focusing on the nonlinear incremental part, while maintaining accuracy and efficiency.
2. **Pattern Flow Algorithm**: A new algorithm is proposed to estimate the pattern flow, which is the correspondence between the projected patterns of adjacent frames in the structured light system. The estimated pattern flow is fed into TIDE-Net.
3. **Experimental Validation**: Although TIDE-Net is trained only with synthetic data and evaluated on real data without adaptation, it surpasses several state-of-the-art methods in both accuracy and computational cost, demonstrating efficient, domain-invariant generalization.

### Method Overview

- **Problem Definition**: The paper addresses the problem of estimating 3D depth from consecutive frames captured by a static pattern projector and a monocular camera.
- **Pattern Flow**: This is the deformation of the projected pattern in the observed images, representing the correspondence between the projected patterns of adjacent frames in the structured light system.
- **TIDE Network**: Combines incremental update techniques and pre-warping of pattern flow to achieve disparity estimation through a deep neural network model.

### Experimental Results

- **Qualitative Results**: Demonstrated performance on real-world non-rigid motion data, especially in occluded parts, providing better predictions using historical information.
- **Quantitative Results**: Evaluated on various datasets, including datasets with the same configuration as the training data, real-world rigid motion data, and synthetic indoor scenes. The experimental results show that TIDE-Net performs excellently in terms of accuracy and efficiency.
- **Efficiency Comparison**: Compared to other methods, TIDE-Net processes faster on GPU, proving its efficiency.

### Conclusion

In summary, the paper proposes TIDE-Net, which effectively addresses the challenges of disparity estimation in dynamic scenes. By introducing the concept of pattern flow and adopting an incremental disparity estimation method, it not only improves accuracy but also reduces computational cost, demonstrating good generalization ability.
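
The problem definition speaks of 3D depth, while the network outputs disparity. The two are linked by standard triangulation for a rectified camera-projector pair: depth equals focal length times baseline divided by disparity. The sketch below shows that conversion. The function name and the calibration values (`focal_px`, `baseline_m`) are assumptions chosen for illustration and do not come from the paper.

```python
import numpy as np


def disparity_to_depth(disp, focal_px, baseline_m, min_disp=1e-6):
    """Convert disparity (pixels) to depth (metres) via Z = f * b / d.

    Pixels whose disparity is not above min_disp are marked as NaN,
    because the triangulation is undefined there.
    """
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.full(disp.shape, np.nan)
    valid = disp > min_disp
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth


# Hypothetical calibration: 1000 px focal length, 0.1 m baseline.
depth = disparity_to_depth([[50.0, 0.0]], focal_px=1000.0, baseline_m=0.1)
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant surfaces than for near ones.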

TIDE: Temporally Incremental Disparity Estimation via Pattern Flow in Structured Light System
