Masking Hierarchical Tokens for Underwater Acoustic Target Recognition With Self-Supervised Learning

Sheng Feng, Xiaoqian Zhu, Shuqing Ma
DOI: https://doi.org/10.1109/taslp.2024.3358719
2024-01-01
Abstract: Deep learning has made data-driven methods effective in underwater acoustic target recognition (UATR) using passive sonar signals. However, a major current challenge is the limited availability of underwater acoustic data, which leads to suboptimal performance. Self-supervised learning (SSL) can help address this problem by learning intrinsic patterns within acoustic data. Nonetheless, applying SSL in UATR systems requires efficient learning of meaningful representations that support fast prediction in real-time recognition systems. To this end, we propose the masking hierarchical tokens (MHT) method to learn meaningful representations via efficient self-supervised learning for our previously proposed UATR-Transformer, giving rise to the MHT-UATR-Transformer. In particular, the MHT-UATR-Transformer first exploits a newly designed token-convolution-based hierarchical tokenization to efficiently obtain rich time-frequency information from the input Mel-spectrogram. Then, most of these tokens are masked with a high masking ratio and subsequently reconstructed by an integrated Encoder-Decoder structure. In this way, the MHT-UATR-Transformer can learn intrinsic representations of underwater acoustic signals to achieve better recognition performance with fewer labeled data, which is expected to alleviate the dependency on expensive acoustic data. Experimental results on two widely studied underwater databases show that our proposed method achieves better performance than supervised learning and state-of-the-art SSL methods in both accuracy and speed, especially in few-shot and noisy scenarios, thus enhancing its practicality in real marine applications.
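The mask-and-reconstruct step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.75 masking ratio, and the mean-squared-error loss restricted to masked tokens are all assumptions borrowed from common masked-autoencoder practice, and the hierarchical tokenization and Encoder-Decoder themselves are not modeled here.

```python
import random


def random_mask(num_tokens, mask_ratio=0.75, seed=None):
    """Split token indices into visible and masked sets.

    mask_ratio=0.75 is an assumed value; the abstract only says
    that a "high masking ratio" is used.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_tokens))
    rng.shuffle(indices)
    num_masked = int(num_tokens * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def reconstruction_loss(target, predicted, masked):
    """Mean squared error over masked tokens only (assumed objective)."""
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(target[i], predicted[i]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0


# Hypothetical usage: 8 tokens, each a 2-dimensional feature vector.
tokens = [[1.0, 2.0] for _ in range(8)]
visible, masked = random_mask(len(tokens), mask_ratio=0.75, seed=0)
# An encoder would see only tokens[i] for i in visible; a decoder would
# predict the tokens at the masked positions. Here a zero prediction
# stands in for the decoder output.
predicted = [[0.0, 0.0] for _ in tokens]
loss = reconstruction_loss(tokens, predicted, masked)
```

In this sketch only the visible tokens would be passed to an encoder, which is one reason a high masking ratio can make pre-training cheaper.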
engineering, electrical & electronic; acoustics