Trident Cooperation Network for Building Extraction and Height Estimation

Xiaoqiang Lu, Licheng Jiao, Qiong Liu, Lingling Li, Fang Liu, Xu Liu, Yuting Yang
DOI: https://doi.org/10.1109/igarss52108.2023.10281514
2023-01-01
Abstract: Building extraction and height estimation provide a solid foundation for reconstructing city morphologies and investigating urban planning. To this end, the DFC23 establishes a large-scale, multi-modal benchmark for multi-task learning of building reconstruction. However, limited data and foreground-background confusion severely inhibit model performance. In this work, we propose a novel trident cooperation network (TCNet) that performs end-to-end building extraction and height estimation using RGB and SAR data. Specifically, to enrich the feature representation and generalization of the shared backbone, we introduce a vision transformer adapter to inject vision-specific inductive biases and design a cross-modal fusion (CMF) module to effectively aggregate features from multi-modal data. For the downstream visual tasks, we construct trident decoders comprising a detector, a lightweight MLP segmentation head, and a pixel-wise regression head. Moreover, to highlight foreground objects, we use the binary mask predicted by the MLP head to cooperate with the height estimation map predicted by the regression head. The weighted sub-task losses are then combined to optimize TCNet. Experimental results show the effectiveness of our method, which ranked 2nd in the test phase of the contest.
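The abstract does not spell out how the binary mask "cooperates" with the height map or how the sub-task losses are weighted. The sketch below illustrates one plausible reading, not the authors' implementation: it assumes cooperation means zeroing predicted heights wherever the segmentation head predicts background, and that the total loss is a plain weighted sum. The function names `apply_mask` and `weighted_total_loss`, the 0.5 threshold, and all numeric values are hypothetical and chosen only for illustration.

```python
def apply_mask(height_map, mask_probs, threshold=0.5):
    """Keep a predicted height only where the foreground probability
    reaches the threshold; set it to 0.0 elsewhere (assumed behaviour)."""
    return [
        [h if p >= threshold else 0.0 for h, p in zip(h_row, p_row)]
        for h_row, p_row in zip(height_map, mask_probs)
    ]


def weighted_total_loss(losses, weights):
    """Combine per-task losses as a weighted sum (assumed form)."""
    return sum(weights[name] * value for name, value in losses.items())


# Toy 2x2 example: heights in metres and foreground probabilities.
heights = [[12.0, 3.5], [0.8, 20.0]]
mask_probs = [[0.9, 0.2], [0.4, 0.7]]
masked_heights = apply_mask(heights, mask_probs)

# Hypothetical per-task loss values and weights for the three decoders.
losses = {"detection": 1.0, "segmentation": 0.5, "height": 2.0}
weights = {"detection": 1.0, "segmentation": 2.0, "height": 0.5}
total = weighted_total_loss(losses, weights)
```

In this toy case the two pixels with foreground probability below 0.5 have their heights suppressed, so background noise in the regression output does not leak into the final height map.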