Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions
Xiuling Wang, Minglin Yu, Haixia Wang, Xiao Lu, Zhiguo Zhang
DOI: https://doi.org/10.1109/jsen.2023.3347585
IF: 4.3
2024-02-16
IEEE Sensors Journal
Abstract: Depth estimation plays a pivotal role in applications such as autonomous driving and robot navigation. Compared with depth estimation from multiple images, as in stereo depth perception, inferring depth relations from a single monocular camera is considerably more challenging yet highly valuable. Convolutional neural networks (CNNs) with residual structures have traditionally been used extensively for this task, but they inherently constrain the model's feature extraction capability. Inspired by HorNet, in this article we propose a novel self-supervised monocular depth estimation framework based on high-order spatial interactions, referred to as Hor-Depth. In addition, Hor-Depth improves the efficiency of feature fusion in the depth network decoder by incorporating an attentional feature fusion (AFF) module based on first-order spatial interaction, which yields more refined predicted disparity maps. To address loss fluctuations and training instability, we introduce a loss function based on a progressive scale-weight adjustment strategy. This strategy applies different constraints to the model at different training stages, effectively reducing training fluctuations and mitigating outliers that deviate significantly from the predicted values. On the KITTI Eigen split benchmark, the proposed approach achieves strong results in self-supervised monocular depth estimation, surpassing some stereo-supervised and monocular-supervised methods.
Engineering, Electrical & Electronic; Instruments & Instrumentation; Physics, Applied
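The abstract builds on HorNet's high-order spatial interactions. As a minimal sketch, and not the Hor-Depth implementation, the NumPy code below follows the recursive gated convolution (gnConv) described in the HorNet paper: a 1x1 projection splits the input into a base branch p_0 and a gating branch, the gating branch is mixed spatially by a depthwise convolution and split into q_0, ..., q_{n-1}, and each order applies one more element-wise gating, p_{k+1} = g_k(p_k) * q_k / alpha, with channel widths doubling up to C. Assumptions, all hypothetical: the helper names (`init_gnconv`, `gnconv`, `depthwise_conv3x3`, `pointwise`), random weights standing in for learned ones, biases and normalization omitted, and a 3x3 depthwise kernel used for brevity (HorNet uses a larger 7x7 kernel by default).

```python
import numpy as np


def depthwise_conv3x3(x, k):
    """Depthwise 3x3 cross-correlation with zero padding.

    x: (C, H, W) feature map; k: (C, 3, 3), one kernel per channel.
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out


def pointwise(x, m):
    """1x1 convolution (per-pixel linear map); m has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", m, x)


def init_gnconv(channels, order, seed=0):
    """Hypothetical random stand-ins for learned weights (no biases)."""
    assert channels % 2 ** (order - 1) == 0
    rng = np.random.default_rng(seed)
    # Channel widths C_0 < C_1 < ... < C_{n-1} = C, doubling at each order.
    dims = [channels // 2 ** (order - 1 - k) for k in range(order)]
    return {
        "dims": dims,
        # dims[0] + sum(dims) == 2 * channels, so proj_in outputs 2C channels.
        "proj_in": rng.standard_normal((2 * channels, channels)) / np.sqrt(channels),
        "dw": rng.standard_normal((sum(dims), 3, 3)) / 3.0,
        "g": [rng.standard_normal((dims[k], dims[k - 1])) / np.sqrt(dims[k - 1])
              for k in range(1, order)],
        "proj_out": rng.standard_normal((channels, channels)) / np.sqrt(channels),
    }


def gnconv(params, x, alpha=1.0):
    """Recursive gated convolution: one element-wise gating per order."""
    dims = params["dims"]
    fused = pointwise(x, params["proj_in"])             # (2C, H, W)
    p, q = fused[:dims[0]], fused[dims[0]:]            # p_0 and the gating branch
    q = depthwise_conv3x3(q, params["dw"])             # spatial mixing of the gates
    gates = np.split(q, np.cumsum(dims)[:-1], axis=0)  # q_0, ..., q_{n-1}
    p = p * gates[0] / alpha                           # first-order interaction
    for k in range(1, len(dims)):
        # Project p up to C_k channels, then gate it again: one more order.
        p = pointwise(p, params["g"][k - 1]) * gates[k] / alpha
    return pointwise(p, params["proj_out"])


# Usage: a 3rd-order gnConv preserves the (C, H, W) shape of its input.
x = np.random.default_rng(1).standard_normal((8, 5, 6))
params = init_gnconv(8, order=3)
y = gnconv(params, x)
print(y.shape)  # (8, 5, 6)
```

Because every step is either linear or an element-wise product, an order-n gnConv without biases is a homogeneous polynomial of degree n + 1 in its input, so scaling x by 2 scales the output by 2 ** (n + 1); this is one way to see what "high-order" means here. Setting `order=1` leaves a single gating step, the first-order case the abstract mentions for its AFF module; how Hor-Depth actually wires that into its decoder is not specified in this record. Doubling the channel widths across orders keeps the extra cost of the higher-order products small relative to a first-order layer.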