More Modalities Mean Better: Vessel Target Recognition and Localization Through Symbiotic Transformer and Multiview Regression

Shipei Liu, Xiaoya Fan, Guowei Wu, Lin Yao, Shisong Geng
DOI: https://doi.org/10.1109/tgrs.2024.3365711
IF: 8.2
2024-03-09
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Vessel target recognition and localization are typically modeled using underwater acoustic (UA) signals, which carry a large amount of information about vessel operating characteristics and condition. However, extracting operating characteristics from a single signal type is hampered by heavy noise and non-stationarity. Meanwhile, feature extraction from multimodal data faces two challenges: conflicting gradients between modalities, and ensuring the separability of vessel targets. To tackle these issues, we propose an audio-visual-textual feature fusion method to recognize and localize vessel targets through symbiotic transformer (Symb-Trans) and multiview regression (MVR) models. Specifically, the audio-visual samples are first preprocessed into paired time series and then projected into a unified optimization landscape via a heterogeneous batch normalization (HetBN) layer to avoid gradient conflicts. Second, the Symb-Trans trains parallel encoders with cross-modal attention (CMA) and embeds audio-visual representations for vessel target recognition. Finally, the MVR method learns neighboring target properties of a graph model from different perspectives (audio, visual, and textual representations) to infer the collector-target distance. Since no off-the-shelf multimodal dataset is available for vessel targets, we combine multiple public datasets, consisting of acoustic, visual, and/or textual data, to obtain multimodal materials for model training and validation. Through experimental results and theoretical analysis, we show that the Symb-Trans and MVR models outperform unimodal and generic multimodal state-of-the-art solutions for vessel target recognition and localization.
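The abstract does not spell out how the cross-modal attention (CMA) inside Symb-Trans is computed. As a rough illustration only, the following is a minimal sketch of generic scaled dot-product cross-attention, in which an audio sequence queries a visual sequence. It is not the authors' implementation: the function name `cross_modal_attention`, the projection matrices, the feature sizes, and the random inputs are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Generic cross-attention: one modality (query_seq) attends to another
    (context_seq). Hypothetical helper, not taken from the paper."""
    q = query_seq @ w_q            # (T_query, d_model)
    k = context_seq @ w_k          # (T_context, d_model)
    v = context_seq @ w_v          # (T_context, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each query step sums to 1 over context steps
    return weights @ v, weights

# Illustrative random data: 6 audio time steps and 4 visual time steps,
# each with 16 features (sizes are assumptions, not values from the paper).
rng = np.random.default_rng(0)
audio = rng.normal(size=(6, 16))
visual = rng.normal(size=(4, 16))
d_model = 8
w_q, w_k, w_v = (rng.normal(size=(16, d_model)) for _ in range(3))

audio_attended, attn = cross_modal_attention(audio, visual, w_q, w_k, w_v)
```

Here `audio_attended` has one row per audio time step, each a weighted mix of visual features. Swapping the argument order would let the visual stream attend to audio, which is one plausible reading of "parallel encoders", though the abstract does not confirm that design.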
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
What problem does this paper attempt to address?