Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition

Zeyu Li, Suncheng Xiang, Tong Yu, Jingsheng Gao, Jiacheng Ruan, Yanping Hu, Ting Liu, Yuzhuo Fu
2024-06-11
Abstract: The recognition of underwater audio plays a significant role in identifying a vessel while it is in motion. Underwater audio target recognition (UATR) has a wide range of applications in areas such as marine environmental protection, detection of ship-radiated noise, underwater noise control, and coastal vessel dispatch. The traditional UATR task involves training a network to extract features from audio data and predict the vessel type. Existing UATR datasets fall short in both duration and sample quantity. In this paper, we propose Oceanship, a large-scale and diverse underwater audio dataset. The dataset comprises 15 categories, spans a total duration of 121 hours, and includes comprehensive annotation information such as coordinates, velocity, vessel types, and timestamps. We compiled the dataset by crawling and organizing original communication data from the Ocean Communication Network (ONC) database between 2021 and 2022. While audio retrieval tasks are well established in general audio classification, they have not been explored in the context of underwater audio recognition. Leveraging the Oceanship dataset, we introduce a baseline model named Oceannet for underwater audio retrieval. This model achieves a recall at 1 (R@1) of 67.11% and a recall at 5 (R@5) of 99.13% on the Deepship dataset.
Computer Vision and Pattern Recognition, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses underwater audio target recognition (UATR). Existing datasets are limited in scale and diversity, which constrains how well models trained on them generalize. To address this, the paper proposes "Oceanship," a large-scale, multi-label underwater audio dataset with 15 categories and a total duration of 121 hours. Each recording carries detailed annotations such as coordinates, velocity, ship type, and timestamps. The dataset was crawled and organized from original communication data in the Ocean Communication Network (ONC) database covering 2021 to 2022.

The authors also present a baseline model called "Oceannet," which uses a LoRA tuning structure and spectrogram patch features for cross-modal learning and is aimed at underwater audio retrieval. Oceannet achieves an R@1 of 67.11% and an R@5 of 99.13% on the Deepship dataset. In addition, the paper redefines the supervised classification task and introduces zero-shot classification and retrieval tasks, and the authors use the Oceanship dataset to demonstrate the model's superior performance on both.

Compared with the existing Deepship and ShipsEar datasets, Oceanship offers significant improvements in scale and diversity, making it better suited to studying generalization in UATR. Future work will involve collecting larger UATR datasets and exploring few-shot learning methods on the test set.
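The R@1 and R@5 figures quoted above are recall-at-K retrieval scores. As a rough illustration of how such a score is commonly computed, here is a minimal sketch in plain Python. It assumes a query counts as a hit when at least one of its K highest-scoring gallery items shares the query's label; the paper's exact evaluation protocol is not described here and may differ. The function name `recall_at_k`, the similarity scores, and the vessel labels are all hypothetical toy values, not data from Oceanship or Deepship.

```python
from typing import Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    query_labels: Sequence[str],
    gallery_labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of queries with at least one correct match in the top-k results.

    similarity[i][j] is the score between query i and gallery item j
    (higher means more similar). This hit rule is an assumption.
    """
    if not query_labels:
        raise ValueError("at least one query is required")
    hits = 0
    for scores, q_label in zip(similarity, query_labels):
        # Gallery indices sorted from most to least similar, truncated to k.
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        if any(gallery_labels[j] == q_label for j in top_k):
            hits += 1
    return hits / len(query_labels)


# Toy data (hypothetical): 3 audio queries scored against 3 gallery entries.
toy_similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.7],
    [0.6, 0.5, 0.4],
]
toy_query_labels = ["cargo", "tug", "tanker"]
toy_gallery_labels = ["cargo", "tanker", "tug"]

r_at_1 = recall_at_k(toy_similarity, toy_query_labels, toy_gallery_labels, k=1)
r_at_2 = recall_at_k(toy_similarity, toy_query_labels, toy_gallery_labels, k=2)
print(f"R@1 = {r_at_1:.2%}, R@2 = {r_at_2:.2%}")
```

In this toy case only the first query retrieves a correct match at rank 1, while all three have one within the top 2. The same pattern explains why a model can show a large gap between R@1 and R@5: the correct item is often ranked near the top without being ranked first.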