A Large Model Assisted Remote Sensing Image Scene Understanding Algorithm Based on Object Detection

Zilong Wang, Zishan Xu, Wei Yang, Wei Chen, Yuyu Yang
DOI: https://doi.org/10.1007/978-981-97-5597-4_5
2024-01-01
Abstract: With the rapid advance of deep learning and artificial intelligence, significant progress has been made in image understanding and natural language processing. However, accurately and deeply understanding images of complex scenes, such as remote sensing imagery, remains a critical open issue. This paper introduces a novel approach that combines targeted object detection results with large language models to understand and describe complex visual scenes. The method incorporates multimodal understanding models (such as CLIP and GPT), prompt engineering, and BPO strategies to produce deep and nuanced scene descriptions. We have developed a user interface and experimentally validated the effectiveness and accuracy of the proposed method in real-world application scenarios, demonstrating the framework's superior performance in understanding complex scenes.