Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery

Wei Zhang, Miaoxin Cai, Tong Zhang, Guoqiang Lei, Yin Zhuang, Xuerui Mao

2024-06-13

Abstract: Ship detection needs to identify ship locations in remote sensing (RS) scenes. Because of different imaging payloads, the varied appearance of ships, and complicated background interference from the bird's-eye view, it is difficult to set up a unified paradigm for multi-source ship detection. To address this challenge, this article leverages the powerful generalization ability of large language models (LLMs) and proposes a unified visual-language model called Popeye for multi-source ship detection from RS imagery. Specifically, to bridge the interpretation gap between multi-source images for ship detection, a novel unified labeling paradigm is designed to integrate different visual modalities and the two ship detection formats, i.e., horizontal bounding box (HBB) and oriented bounding box (OBB). Subsequently, a hybrid experts encoder is designed to refine multi-scale visual features, thereby enhancing visual perception. Then, a visual-language alignment method is developed for Popeye to enhance interactive comprehension between visual and language content. Furthermore, an instruction adaption mechanism is proposed for transferring pre-trained visual-language knowledge from natural scenes into the RS domain for multi-source ship detection. In addition, the segment anything model (SAM) is seamlessly integrated into Popeye to achieve pixel-level ship segmentation without additional training costs. Finally, extensive experiments are conducted on a newly constructed ship instruction dataset named MMShip, and the results indicate that Popeye outperforms current specialist, open-vocabulary, and other visual-language models for zero-shot multi-source ship detection.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses ship detection in multi-source remote sensing images. Different imaging payloads, the diversity of ship appearances, and complex background interference from a bird's-eye view make it hard to establish a single detection paradigm that works across sources. To tackle this, the authors propose a unified visual-language model named Popeye, which leverages the generalization capabilities of large language models (LLMs). Popeye bridges the interpretation gap between images from different sources with a new unified labeling paradigm that integrates different visual modalities and both ship detection formats, horizontal bounding box (HBB) and oriented bounding box (OBB). A hybrid experts encoder refines multi-scale visual features to enhance visual perception, and a visual-language alignment method improves the interactive understanding between visual and language content. An instruction adaption mechanism transfers pre-trained visual-language knowledge from natural scenes to the remote sensing domain. Furthermore, the Segment Anything Model (SAM) is integrated into Popeye to achieve pixel-level ship segmentation without additional training costs. In summary, the goal of the paper is a unified visual-language framework that understands multi-source, multi-modal ship data in the remote sensing domain and, in zero-shot scenarios, outperforms existing specialist, open-vocabulary, and other visual-language models.
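The relationship between the two box formats mentioned above can be illustrated with a minimal sketch. It is not the paper's labeling format: the `(cx, cy, w, h, angle)` parameterization, the function names, and the text serialization in `format_box` are all assumptions chosen for illustration. The sketch turns an OBB into its four corner points, derives the enclosing HBB, and writes either box as a short text string of the kind a language model could consume.

```python
import math


def obb_to_corners(cx, cy, w, h, angle_deg):
    """Corners of a box centered at (cx, cy), rotated by angle_deg.

    Assumed parameterization (hypothetical, not taken from the paper):
    width w along the box's own x-axis, height h along its y-axis.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets in the box's local frame before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


def corners_to_hbb(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) enclosing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def format_box(label, values):
    """Hypothetical text serialization of a box for an instruction-style prompt."""
    return f"{label}: [" + ", ".join(f"{v:.1f}" for v in values) + "]"


if __name__ == "__main__":
    corners = obb_to_corners(50, 50, 40, 10, 30)
    print(format_box("OBB", [c for point in corners for c in point]))
    print(format_box("HBB", corners_to_hbb(corners)))
```

An HBB is always recoverable from an OBB this way, while the reverse is not, which is one reason a labeling scheme that covers both has to store the oriented form when it is available.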

A Ship Detection Model with Progressive Feature Fusion and Cross-Spatial Learning Attention Mechanism for Optical Remote Sensing Images

A Multiscale Ship Detection Algorithm Based on Optical Remote Sensing Image

Maritime Ship Detection Method for Satellite Images Based on Multiscale Feature Fusion

An Improved YOLOv8 OBB Model for Ship Detection through Stable Diffusion Data Augmentation

YOLOSeaShip: a lightweight model for real-time ship detection

Ship Target Detection in Optical Remote Sensing Images Based on E2YOLOX-VFL

Ship detection of optical remote sensing image in multiple scenes

Ship Detection Based on YOLO Algorithm for Visible Images

A Lightweight Algorithm for Ship Object Detection in Complex Marine Environments

SHIP-YOLO: A Lightweight Synthetic Aperture Radar Ship Detection Model Based on YOLOv8n Algorithm

Ship Detection from Optical Remote Sensing Images Using Multi-Scale Analysis and Fourier HOG Descriptor

LMO-YOLO: A Ship Detection Model for Low-Resolution Optical Satellite Imagery

Improved YOLOv3 Based on Attention Mechanism for Fast and Accurate Ship Detection in Optical Remote Sensing Images

Improved YOLOv8n for Lightweight Ship Detection

YOLO-SD: Small Ship Detection in SAR Images by Multi-Scale Convolution and Feature Transformer Module

CSD-YOLO: A Ship Detection Algorithm Based on a Deformable Large Kernel Attention Mechanism

Yolov5s-MSD: a multi-scale ship detector for visible video image

A Decoupled Head and Multiscale Coordinate Convolution Detection Method for Ship Targets in Optical Remote Sensing Images

Optical Remote Sensing Ship Recognition and Classification Based on Improved YOLOv5

High-Efficiency and High-Precision Ship Detection Algorithm Based on Improved YOLOv8n