Lifting the Veil of Frequency in Joint Segmentation and Depth Estimation

Tianhao Fu, Yingying Li, Xiaoqing Ye, Xiao Tan, Hao Sun, Fumin Shen, Errui Ding
DOI: https://doi.org/10.1145/3474085.3475277
2021-01-01
Abstract: Joint learning of scene parsing and depth estimation remains a challenging task due to the rivalry between the two tasks. In this paper, we revisit mutual enhancement in joint semantic segmentation and depth estimation. Inspired by the observation that competition and cooperation between the tasks are reflected in the frequency components of their features, we propose a Frequency Aware Feature Enhancement (FAFE) network that effectively enhances the reciprocal relationship while avoiding the competition. In FAFE, a frequency disentanglement module selects the favorable set of frequency components for each task and resolves the discordance between the two tasks. For task cooperation, we introduce a re-calibration unit that aggregates the features of the two tasks, so that the information from each task complements the other. The learning of each task is thereby boosted by the complementary task. In addition, a novel local-aware consistency loss is imposed on the predicted segmentation and depth to strengthen the cooperation. With the FAFE network and the new local-aware consistency loss integrated into a multi-task learning network, the proposed approach achieves superior performance over previous state-of-the-art methods. Extensive experiments and ablation studies on multi-task datasets demonstrate the effectiveness of the proposed approach.
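The abstract does not say how the frequency disentanglement module is built. To make the idea of "frequency components of a feature" concrete, here is a minimal sketch that is not the authors' FAFE module: it splits a single 2D feature map into a low-frequency and a high-frequency part using a fixed radial mask in the Fourier domain. The function name `split_frequency` and the `cutoff` parameter are hypothetical, chosen only for illustration; a learned module would presumably select components per task rather than use a fixed threshold.

```python
import numpy as np


def split_frequency(feature, cutoff=0.25):
    """Split a 2D feature map into low- and high-frequency parts.

    Illustrative assumption, not the paper's method: `cutoff` is the
    radius of a fixed low-pass mask, in cycles per sample (max 0.5
    along each axis).
    """
    h, w = feature.shape
    # Per-axis frequencies in the same layout as np.fft.fft2 output.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    low_mask = radius <= cutoff           # keep frequencies near DC
    spectrum = np.fft.fft2(feature)

    # The mask is symmetric, so the inverse transforms are real up to
    # floating-point error; .real discards the residual imaginary part.
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 8))
    low, high = split_frequency(feat, cutoff=0.2)
    # The two parts partition the spectrum, so they sum back to the input.
    print(np.allclose(low + high, feat))
```

Because the two masks partition the spectrum, the low and high parts always add back to the original map, which is what lets each component be routed or weighted separately without losing information.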