See, Perceive, and Answer: A Unified Benchmark for High-Resolution Postdisaster Evaluation in Remote Sensing Images

Danpei Zhao, Jiankai Lu, Bo Yuan
DOI: https://doi.org/10.1109/tgrs.2024.3386934
IF: 8.2
2024-04-27
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Visual-language generation for remote sensing images (RSIs) is an emerging and challenging research area that requires multitask learning to achieve a comprehensive understanding. However, most existing models are limited to single-level tasks and do not leverage the advantages of visual-language pretraining (VLP) models. In this article, we present a unified benchmark that learns multiple tasks, including interpretation, perception, and question answering. Specifically, a model is designed to perform semantic segmentation, image captioning, and visual question answering (VQA) for high-resolution RSIs simultaneously. Our model not only attains pixel-level segmentation accuracy and global semantic comprehension but also responds to user-defined queries of interest. Moreover, to address the challenges of multitask perception, we construct a novel multitask dataset called FloodNet+, which provides a new solution for comprehensive postdisaster assessment. The experimental results demonstrate that our approach surpasses existing methods or baselines in all three tasks. This is the first attempt to simultaneously consider multiple remote sensing perception tasks in an integrated framework, which lays a solid foundation for future research in this area. Our datasets are publicly available at: https://github.com/LDS614705356/FloodNet-plus.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics