Integration of Geometric and Perceptual Information for Monocular Depth Estimation

Shufeng Zhang, Tianjiao Yang, Chenxiang Zhang
DOI: https://doi.org/10.1109/icipca61593.2024.10709255
2024-01-01
Abstract: In deep learning, spherical distortion poses a formidable obstacle, especially in complex regression tasks such as depth estimation. Applying standard CNN layers directly to distorted images often loses crucial information. To address this challenge, we introduce ContextFusion, a framework for 360-degree monocular depth estimation. The approach projects a 360-degree image into less-distorted perspective patches, known as tangent images, which CNNs process to produce patch-level predictions; these predictions are then merged into a final output. Discrepancies among patch predictions significantly degrade the quality of the merged result, so the framework adds three components to reduce them. First, a geometric perceptual feature fusion mechanism combines 3D geometric features with 2D image features to compensate for variations between patches. Second, a self-attention-based transformer architecture aggregates patch information globally, improving overall consistency. Finally, an iterative depth refinement mechanism uses more precise geometric features to iteratively refine the estimated depth. In comprehensive experiments, our method mitigates the distortion problem and achieves state-of-the-art performance on multiple benchmark datasets for monocular depth estimation.
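The tangent-image step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard gnomonic (tangent-plane) projection and nearest-neighbor sampling from an equirectangular image; the function names, the field of view, and the patch size are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def tangent_patch_coords(lon0, lat0, fov_deg=60.0, size=5):
    """Longitude/latitude (radians) of each pixel of a tangent patch.

    The patch is a square on the plane tangent to the unit sphere at
    (lon0, lat0); pixels are mapped back to the sphere with the inverse
    gnomonic projection. fov_deg and size are illustrative defaults.
    """
    half = np.tan(np.radians(fov_deg) / 2.0)   # half-extent on the tangent plane
    ticks = np.linspace(-half, half, size)
    x, y = np.meshgrid(ticks, ticks[::-1])     # x points east, y points north
    norm = np.sqrt(1.0 + x**2 + y**2)          # length of the ray through (x, y)
    lat = np.arcsin(np.clip((np.sin(lat0) + y * np.cos(lat0)) / norm, -1.0, 1.0))
    lon = lon0 + np.arctan2(x, np.cos(lat0) - y * np.sin(lat0))
    return lon, lat


def sample_tangent_patch(erp, lon0, lat0, fov_deg=60.0, size=5):
    """Nearest-neighbor sample of a tangent patch from an equirectangular image."""
    h, w = erp.shape[:2]
    lon, lat = tangent_patch_coords(lon0, lat0, fov_deg, size)
    # Longitude wraps around the image; latitude is clamped at the poles.
    cols = np.floor((lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    rows = np.clip(np.floor((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return erp[rows, cols]


if __name__ == "__main__":
    # Synthetic equirectangular image whose pixel value is its column index.
    erp = np.tile(np.arange(128), (64, 1))
    patch = sample_tangent_patch(erp, lon0=0.0, lat0=0.0)
    print(patch.shape)
```

In a pipeline of the kind the abstract describes, a set of such patches centered at different points on the sphere would be passed to a CNN, and the per-patch depth predictions would be projected back and merged; how ContextFusion selects patch centers and merges predictions is not specified here.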