Investigation of Time-Frequency Feature Combinations with Histogram Layer Time Delay Neural Networks

Amirmohammad Mohammadi, Irené Masabarakiza, Ethan Barnes, Davelle Carreiro, Alexandra Van Dine, Joshua Peeples
2024-09-21
Abstract: While deep learning has reduced the prevalence of manual feature extraction, transformation of data via feature engineering remains essential for improving model performance, particularly for underwater acoustic signals. The methods by which audio signals are converted into time-frequency representations and the subsequent handling of these spectrograms can significantly impact performance. This work demonstrates the performance impact of using different combinations of time-frequency features in a histogram layer time delay neural network. An optimal set of features is identified with results indicating that specific feature combinations outperform single data features.
Sound, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
This paper asks how to improve underwater acoustic signal classification by combining different time-frequency features. Specifically, the authors explore combinations of multiple time-frequency features in the Histogram Layer Time Delay Neural Network (HLTDNN) to find an optimal feature set and thereby improve classification on the Underwater Acoustic Target Recognition (UATR) task.

### Problem Background

1. **Deep Learning and Feature Engineering**
   - Although deep learning reduces the need for manual feature extraction, feature engineering remains a crucial step in improving model performance in some cases, especially for underwater acoustic signals.
   - Audio signals are usually first converted into time-frequency representations (such as spectrograms) and then processed by artificial neural networks. How these spectrograms are generated and subsequently handled has a significant impact on model performance.
2. **Importance of Underwater Acoustic Classification**
   - Underwater acoustic classification has a wide range of applications in the marine environment, such as biological behavior pattern analysis, search and rescue, seabed mapping, and ship traffic monitoring.
   - Existing research shows that different feature combinations can significantly affect classification performance, but feature combinations had not been studied systematically for the HLTDNN model.

### Research Objectives

1. **Explore the Effects of Different Feature Combinations**
   - Verify experimentally how different time-frequency feature combinations affect the performance of the HLTDNN model.
   - Find the optimal feature combination to improve classification accuracy on the UATR task.
2. **Introduce New Feature Processing Methods**
   - Use an adaptive padding layer so that spectrograms of different sizes can be fed to the model at a uniform size, avoiding information loss.
   - Capture statistical features in spectrograms through the histogram layer to strengthen the model's ability to represent feature distributions.

### Main Contributions

1. **First Study on Feature Combinations on the DeepShip Dataset**
   - This study is the first to use the HLTDNN model for feature combination research on the DeepShip dataset, filling this gap in the field.
2. **Discover the Optimal Feature Combination**
   - The experimental results show that the combination of VQT, MFCC, STFT, and GFCC performs best, improving classification accuracy by approximately 6.83% compared to a single feature (such as MFCC).
3. **Explain the Model Decision-making Process**
   - Use Explainable AI (XAI) methods, such as FullGrad Class Activation Mapping (FullCAM), to show the specific frequency bands that the model focuses on during classification, further verifying the effectiveness of the feature combination.

In summary, this paper aims to improve the performance of the HLTDNN model on underwater acoustic target recognition through a systematic study of time-frequency feature combinations, and it provides a reference for future feature selection and model optimization.
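The two processing ideas named above, padding features of different sizes to a common shape and summarizing values with a histogram, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes centered zero-padding to the largest height and width, and a Gaussian (RBF) soft-binning rule with fixed bin centers and a fixed width, which is one common formulation of a histogram layer. The function names, the feature shapes, and the random data are all hypothetical.

```python
import numpy as np


def pad_to_common_shape(features):
    """Zero-pad 2-D arrays (freq x time) to the largest height and width.

    Assumption: centered zero-padding; the paper's adaptive padding layer
    may use a different rule.
    """
    target_h = max(f.shape[0] for f in features)
    target_w = max(f.shape[1] for f in features)
    padded = []
    for f in features:
        extra_h = target_h - f.shape[0]
        extra_w = target_w - f.shape[1]
        # Split the padding as evenly as possible between both sides.
        pad = ((extra_h // 2, extra_h - extra_h // 2),
               (extra_w // 2, extra_w - extra_w // 2))
        padded.append(np.pad(f, pad, mode="constant", constant_values=0.0))
    # Stack along a new leading axis: (channels, freq, time).
    return np.stack(padded, axis=0)


def soft_histogram(x, centers, width):
    """Soft-bin the values of x with Gaussian (RBF) bins, averaged over x.

    Assumption: fixed centers and width; a trainable histogram layer would
    learn these and pool over local windows instead of the whole input.
    """
    x = np.asarray(x, dtype=float).ravel()
    centers = np.asarray(centers, dtype=float)
    # Membership of every value in every bin: shape (num_bins, num_values).
    membership = np.exp(-((x[None, :] - centers[:, None]) / width) ** 2)
    return membership.mean(axis=1)


# Hypothetical feature shapes; real STFT/MFCC/VQT sizes depend on settings.
rng = np.random.default_rng(0)
stft = rng.random((48, 56))
mfcc = rng.random((16, 48))
vqt = rng.random((64, 40))

stacked = pad_to_common_shape([stft, mfcc, vqt])
print(stacked.shape)  # (3, 64, 56)

hist = soft_histogram(stacked[0], centers=np.linspace(0.0, 1.0, 4), width=0.25)
print(hist.shape)  # (4,)
```

Padding rather than resizing keeps every original value intact, which matches the stated goal of avoiding information loss. The padded zeros do count toward the bin centered at 0 in this sketch, so a practical version might mask them out.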