A Novel Score-CAM based Denoiser for Spectrographic Signature Extraction without Ground Truth

Noel Elias
DOI: https://doi.org/10.1109/IJCNN54540.2023.10191897
2024-10-30
Abstract: Sonar-based audio classification techniques are a growing area of research in the field of underwater acoustics. Usually, underwater noise picked up by passive sonar transducers contains all types of signals that travel through the ocean and is transformed into spectrographic images. As a result, the corresponding spectrograms intended to display the temporal-frequency data of a certain object often include the tonal regions of abundant extraneous noise that can effectively interfere with a 'contact'. So, a majority of spectrographic samples extracted from underwater audio signals are rendered unusable due to their clutter and lack the required distinguishability between different objects. With limited clean true data for supervised training, creating classification models for these audio signals is severely bottlenecked. This paper derives several new techniques to combat this problem by developing a novel Score-CAM based denoiser to extract an object's signature from noisy spectrographic data without being given any ground truth data. In particular, this paper proposes a novel generative adversarial network architecture for learning and producing spectrographic training data in similar distributions to low-feature spectrogram inputs. In addition, this paper also proposes a generalizable class activation mapping based denoiser for different distributions of acoustic data, even real-world data distributions. Utilizing these novel architectures and proposed denoising techniques, these experiments demonstrate state-of-the-art noise reduction accuracy and higher classification accuracy than current audio classification standards. As such, this approach has applications not only to audio data but also to many other data distributions used for machine learning.
Sound, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
The main problem that this paper attempts to solve is to extract the signature of a specific object from a noisy spectrogram without ground-truth data. Specifically, the audio signals received by passive sonar in the field of underwater acoustics usually contain a large amount of interfering noise, making it impossible to effectively distinguish the features of different objects in the generated spectrogram. Due to the lack of clean ground-truth data for supervised training, it becomes very difficult to create an effective audio-signal classification model.

### Problem Background

1. **Noise Problem in Underwater Acoustics**:
   - In underwater acoustics research, the audio signals received by passive sonar contain not only the sounds of target objects but also various noises from other human activities, marine animals, echoes, etc.
   - These noises lead to a large amount of useless information in the generated spectrogram, which seriously affects the identification and classification of the features of target objects.
2. **Lack of Clean Ground-Truth Data**:
   - In order to train machine-learning models such as efficient convolutional neural networks (CNNs), a large amount of clean and correctly labeled data is required.
   - However, in practical applications, it is very difficult to obtain these high-quality ground-truth data, especially for underwater audio signals.
3. **Limitations of Existing Methods**:
   - Most of the current methods rely on supervised learning and require a large number of clean spectrograms as training data.
   - When there is a lack of ground-truth-labeled data, the existing denoising methods are not effective and it is difficult to automatically extract and identify the feature regions of target objects.

### Method Proposed in the Paper

To solve the above problems, the paper proposes a new type of denoiser based on Score-CAM, which can extract the signature of the target object from a noisy spectrogram without ground-truth data.
The specific methods include:

1. **Generating Additional Data by Generative Adversarial Network (GAN)**:
   - Use a GAN to generate new spectrograms with a similar distribution to the original data to increase the diversity of the training data.
   - The GAN generates realistic spectrograms through the adversarial training of the generator and the discriminator, thereby improving the generalization ability of the model.
2. **Denoiser Based on Score-CAM**:
   - Use Score-CAM to generate class activation maps (CAMs) that identify the regions in the spectrogram related to the target class.
   - By superimposing the class activation maps on the input spectrogram, extract the feature regions of the target object and remove noise interference.
3. **Image Clustering and Mask Generation**:
   - Use the K-Means++ algorithm to cluster spectrograms and extract the most representative samples.
   - Generate a general mask according to the clustering results and combine it with the specific mask of each input image to finally extract the signature of the target object.

### Results

The experimental results show that this method is superior to the existing mainstream methods in terms of denoising and classification accuracy, especially in the case of lacking ground-truth data. The specific results include:

- **Data Generation**: The new data generated by WGAN can significantly improve the diversity and quality of the training set.
- **Image Clustering**: The K-Means++ algorithm successfully extracts the most representative spectrogram samples.
- **Mask Generation and Signature Extraction**: The masks generated by Score-CAM can effectively remove noise and extract the feature regions of the target object.

In conclusion, the Score-CAM-based denoiser proposed in this paper provides a new solution for audio-signal processing in the field of underwater acoustics, especially for the case of lacking ground-truth data.
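
To make the "Denoiser Based on Score-CAM" step more concrete, the following is a minimal NumPy sketch of the general Score-CAM idea applied to a spectrogram. It is not the paper's implementation. The names `normalize01`, `score_cam`, `denoise`, and `toy_score` are hypothetical, the activation maps are assumed to be already resized to the spectrogram's shape, the classifier is replaced by a toy scoring function, and combining the per-image mask with a general mask by logical AND is an assumption, since the summary does not state the combination rule.

```python
import numpy as np


def normalize01(a):
    """Min-max scale an array to [0, 1]; a constant array maps to zeros."""
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a)


def score_cam(spectrogram, activation_maps, score_fn):
    """Weight each activation map by the class score of the input masked with it.

    Assumes every activation map already has the spectrogram's shape.
    """
    masks = np.stack([normalize01(m) for m in activation_maps])
    # Score the spectrogram once per mask (hypothetical classifier call).
    scores = np.array([score_fn(spectrogram * m) for m in masks])
    # Softmax over the scores gives one weight per activation map.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of the maps followed by ReLU, rescaled to [0, 1].
    cam = np.maximum(np.tensordot(weights, masks, axes=1), 0.0)
    return normalize01(cam)


def denoise(spectrogram, cam, general_mask=None, threshold=0.5):
    """Keep only the bins where the CAM (and optional general mask) is active."""
    mask = cam >= threshold
    if general_mask is not None:
        # Assumption: the two masks are combined by logical AND.
        mask = mask & np.asarray(general_mask, dtype=bool)
    return np.where(mask, spectrogram, 0.0)


# Toy data: a target tonal line in frequency bin 2, an interferer in bin 6.
spec = np.zeros((8, 8))
spec[2, :] = 1.0
spec[6, :] = 0.8

map_target = np.zeros((8, 8))
map_target[2, :] = 1.0
map_noise = np.zeros((8, 8))
map_noise[6, :] = 1.0


def toy_score(x):
    """Stand-in for a classifier's target-class score (hypothetical)."""
    return x[2, :].sum() - x[6, :].sum()


cam = score_cam(spec, [map_target, map_noise], toy_score)
denoised = denoise(spec, cam)
print(denoised)
```

In this toy setup the map covering the target line receives almost all of the softmax weight, so thresholding the resulting CAM keeps bin 2 and zeroes the interferer in bin 6. With a real network, `score_fn` would wrap a forward pass of the trained classifier, and the threshold would need tuning for the data at hand.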