A Novel Cross-Attention Fusion-Based Joint Training Framework for Robust Underwater Acoustic Signal Recognition
Aolong Zhou, Xiaoyong Li, Wen Zhang, Dawei Li, Kefeng Deng, Kaijun Ren, Junqiang Song
DOI: https://doi.org/10.1109/tgrs.2023.3333971
IF: 8.2
2023-11-29
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Underwater acoustic signal recognition (UASR) systems struggle to achieve high accuracy on complex underwater data with low signal-to-noise ratio (SNR), which limits their noise robustness. Conventional approaches typically use pre-trained denoising models to preprocess noisy signals. However, because denoising and recognition models are optimized toward different goals, denoising can introduce signal distortion that prevents it from effectively improving system accuracy. To address this issue, this article proposes a novel joint training framework with cross-attention fusion (CAF) for robust UASR, called CAF-JT. CAF-JT consists of a denoising module, a recognition module, and the CAF module. It addresses the mismatch caused by different optimization directions by jointly training the denoising frontend and the recognition backend. Additionally, inspired by the multicondition training (MCT) method, the CAF module is designed to fuse characteristics from both denoised and noisy audio, thus incorporating noise information. This fusion mechanism enables the model to better adapt to the characteristics of the noisy environment and enhances its noise robustness. Furthermore, to improve UASR performance, time-frequency transformer (TF-transformer) blocks are incorporated into both the denoising module and the recognition module to capture the spatio-temporal distribution of spectral features. The proposed approach is evaluated on two open-source underwater acoustic signal datasets, ShipsEar and DeepShip. Extensive experiments demonstrate the superiority of CAF-JT over conventional joint training approaches and its improved noise robustness. In particular, under low-SNR conditions, CAF-JT achieves the best average recognition rates of 94.84% and 93.61% on the two datasets, respectively.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
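
The abstract says the CAF module fuses characteristics of denoised and noisy audio, but this record does not give the layer details. The following is a minimal, dependency-free sketch of one plausible single-head cross-attention fusion, not the authors' implementation. It rests on several assumptions: queries are taken from the denoised features, keys and values from the noisy features, and the attended noise information is added back through a residual sum. The names `caf_fuse`, `random_matrix`, `w_q`, `w_k`, and `w_v` are hypothetical, and the random matrices stand in for learned weights and extracted features.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Apply a numerically stable softmax to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def caf_fuse(denoised, noisy, w_q, w_k, w_v):
    """Fuse denoised features with noisy features via cross-attention.

    Assumption of this sketch: queries come from the denoised branch,
    keys and values come from the noisy branch, and the result is added
    to the denoised features as a residual.
    """
    q = matmul(denoised, w_q)            # (frames_d, dim)
    k = matmul(noisy, w_k)               # (frames_n, dim)
    v = matmul(noisy, w_v)               # (frames_n, dim)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # (dim, frames_n)
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = softmax_rows(scores)       # each row sums to 1
    attended = matmul(weights, v)        # noise-aware context per frame
    fused = [
        [d + a for d, a in zip(d_row, a_row)]
        for d_row, a_row in zip(denoised, attended)
    ]
    return fused, weights


if __name__ == "__main__":
    rng = random.Random(0)
    frames, dim = 6, 4                   # illustrative sizes only
    denoised = random_matrix(frames, dim, rng)
    noisy = random_matrix(frames, dim, rng)
    w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
    fused, weights = caf_fuse(denoised, noisy, w_q, w_k, w_v)
    print(len(fused), len(fused[0]))     # frames x dim, same as the input
```

In a joint training setup such as the one the abstract describes, the fused features would feed the recognition module, and the denoising and recognition losses would be optimized together. The loss weighting and the TF-transformer blocks are outside the scope of this sketch.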