HA-Bins: Hierarchical Adaptive Bins for Robust Monocular Depth Estimation across Multiple Datasets
Ruijie Zhu, Ziyang Song, Li Liu, Jianfeng He, Tianzhu Zhang, Yongdong Zhang
DOI: https://doi.org/10.1109/tcsvt.2023.3335316
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Existing monocular depth estimation methods have achieved satisfactory performance on wild datasets. However, these methods are usually trained and tested on a single dataset, which makes it difficult for them to generalize to other scenarios. To learn diverse scene priors from multiple datasets, we propose a hierarchical framework with adaptive bins for robust monocular depth estimation. It consists of two critical components: a group-wise query generator that assigns hierarchical bins and a correlation-aware transformer decoder that generates adaptive bin features. The proposed HA-Bins has several merits. First, the group-wise query generator progressively increases the number of bin queries for multi-scale image features, resulting in a hierarchical bin distribution that is robust to diverse scenarios. Second, the correlation-aware transformer decoder refines the correlation between bin queries and image features, effectively improving adaptive image feature aggregation. We visualize the query activation maps on the NYUDepthv2 dataset, showing that the proposed network effectively suppresses depth-irrelevant regions. Experiments on the KITTI, Sintel, and RabbitAI benchmarks show that, without any fine-tuning, our model jointly trained on multiple datasets achieves performance competitive with the state of the art and solid robustness across diverse scenarios. In addition, our method wins second place in the Robust Vision Challenge 2022, which covers challenging scenarios with different characteristics.
engineering, electrical & electronic
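The abstract does not spell out how bins are turned into a depth map, so the following is only a minimal sketch of the generic adaptive-bin decoding used by bin-based depth estimators, not the HA-Bins implementation. It assumes that a network predicts one set of bin-width logits per image and one vector of bin logits per pixel; the function names `bin_centers` and `depth_from_bins`, the depth range, and the array shapes are all hypothetical choices made for illustration.

```python
import numpy as np


def bin_centers(width_logits, d_min, d_max):
    """Turn N unnormalized width scores into N bin centers in [d_min, d_max].

    Assumption: widths are softmax-normalized so they are positive and
    partition the depth range exactly.
    """
    w = np.exp(width_logits - np.max(width_logits))
    w = w / w.sum()
    # Bin edges are the cumulative sum of the widths, scaled to the range.
    edges = d_min + (d_max - d_min) * np.concatenate(([0.0], np.cumsum(w)))
    return 0.5 * (edges[:-1] + edges[1:])


def depth_from_bins(pixel_logits, centers):
    """Per-pixel depth as the probability-weighted mean of the bin centers.

    pixel_logits: array of shape (H, W, N); centers: array of shape (N,).
    """
    z = pixel_logits - pixel_logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p = p / p.sum(axis=-1, keepdims=True)
    return p @ centers  # shape (H, W)


# Hypothetical usage: 4 equal-width bins over a 0 to 8 m range.
centers = bin_centers(np.zeros(4), 0.0, 8.0)
depth = depth_from_bins(np.zeros((2, 3, 4)), centers)
```

Because each pixel's depth is a convex combination of the bin centers, the output always stays inside the chosen depth range; a hierarchical scheme like the one described in the abstract would apply this kind of decoding with a growing number of bins across feature scales.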