Reinforcing Local Structure Perception for Monocular Depth Estimation

Siyuan Zhang, Wenguang Hou, Aliaksandr Chervan, Mingyue Ding, D. Kurlovich, Jingxian Dong
DOI: https://doi.org/10.1109/JSEN.2023.3293156
IF: 4.3
2023-08-15
IEEE Sensors Journal
Abstract: Monocular depth estimation is a basic and critical task in computer vision, with wide applications in domains such as robot navigation and autonomous driving. A prevailing approach is to leverage hybrid depth datasets obtained from various depth sensors to predict affine-invariant depth under supervised learning. However, the varying depth ranges in hybrid datasets can make the network unstable. Although some affine-invariant loss functions have been introduced, existing methods may still produce suboptimal geometric structures, such as blurred boundaries and details. To tackle this issue, our approach centers on reinforcing the local structural perception of images. Specifically, we propose a novel pixel-level supervised loss, called the windowed correlation regression (WCR) loss. It computes the windowed Pearson correlation coefficient (PCC) to constrain the similarity of the data distribution within a local region. In addition, we introduce a new coarse-to-fine multiscale normal (CFMN) loss, used in conjunction with the WCR loss, to further improve geometric accuracy. Our experimental results on six zero-shot datasets demonstrate that our method outperforms state-of-the-art (SOTA) methods. In terms of local geometric structural precision, our method achieves sharper edges and more consistent local grayscale.
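The idea of a windowed Pearson correlation can be illustrated with a minimal NumPy sketch. This is not the paper's WCR loss: the abstract does not give its formulation, so the function name `windowed_pcc_loss`, the window size, the stride-1 sliding windows, the `1 - PCC` form, and the `eps` stabilizer are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_pcc_loss(pred, target, window=7, eps=1e-8):
    """Mean of (1 - PCC) over all sliding windows (hypothetical sketch).

    pred, target: 2-D depth maps of equal shape, at least `window` per side.
    window, eps: illustrative choices, not values taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # All (window x window) patches: shape (H-w+1, W-w+1, w, w).
    p = sliding_window_view(pred, (window, window))
    t = sliding_window_view(target, (window, window))

    # Center each patch on its own mean.
    p = p - p.mean(axis=(-2, -1), keepdims=True)
    t = t - t.mean(axis=(-2, -1), keepdims=True)

    # Pearson correlation coefficient per patch.
    cov = (p * t).sum(axis=(-2, -1))
    norm = np.sqrt((p ** 2).sum(axis=(-2, -1)) * (t ** 2).sum(axis=(-2, -1)))
    pcc = cov / (norm + eps)  # constant patches yield pcc = 0

    return float(np.mean(1.0 - pcc))


rng = np.random.default_rng(0)
gt = rng.random((32, 32))
loss_affine = windowed_pcc_loss(2.0 * gt + 3.0, gt)  # positive scale and shift
loss_flipped = windowed_pcc_loss(-gt, gt)            # inverted depth ordering
```

Because the PCC is unchanged by a positive scale and a shift within each window, `loss_affine` is close to zero, which is consistent with the affine-invariant setting the abstract describes, while `loss_flipped` is close to its maximum of 2.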
Computer Science