Abstract: Lung or heart sound classification is challenging because of the complex nature of audio data and its dynamic properties in both the time and frequency domains. It is also very difficult to detect lung or heart conditions from small, unbalanced, or highly noisy datasets. Furthermore, poor data quality is a considerable obstacle to improving the performance of deep learning models. In this paper, we propose a novel feature-based fusion network called FDC-FS for classifying heart and lung sounds. The FDC-FS framework aims to transfer learning effectively from three different deep neural network models built from audio datasets. The novelty of the proposed transfer learning lies in two transformations: from audio data to image vectors, and from three specific models to one fused model that is better suited to deep learning. We used two publicly available datasets for this study: the lung sound data from the ICBHI 2017 challenge and the heart sound challenge data. We applied data augmentation techniques, such as noise distortion, pitch shifting, and time stretching, to address some of the data issues in these datasets. Importantly, we extracted three distinct features from the audio samples, namely the Spectrogram, MFCC, and Chromagram. Finally, we built a fusion of three optimal convolutional neural network models by feeding them the image feature vectors transformed from the audio features. We confirmed the superiority of the proposed fusion model over state-of-the-art works. The highest accuracy achieved with FDC-FS is 99.1% for Spectrogram-based lung sound classification and 97% for Spectrogram- and Chromagram-based heart sound classification.
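As a rough illustration of the augmentation and feature-extraction steps, the following is a minimal NumPy-only sketch, not the FDC-FS authors' implementation. It shows noise-distortion augmentation at a chosen signal-to-noise ratio and computes the three features named above: a magnitude Spectrogram, MFCCs, and a Chromagram. All function names and parameter values (8 kHz sampling, 512-point FFT, hop of 128 samples, 40 mel bands, 13 coefficients, 20 dB SNR) are assumptions for illustration. The chromagram here simply folds each FFT bin into its nearest pitch class, a simplification of library implementations such as librosa's `chroma_stft`. Pitch shifting and time stretching are omitted; librosa provides them as `librosa.effects.pitch_shift` and `librosa.effects.time_stretch`. The `to_image` helper is a hypothetical stand-in for the audio-to-image conversion, and the three-CNN fusion itself is not shown.

```python
import numpy as np

# Minimal sketch (assumption: NOT the FDC-FS authors' code). All names and
# parameter values are hypothetical choices for illustration.


def add_noise(y, snr_db, rng=None):
    """Noise distortion: add white Gaussian noise at a target SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    noise_power = np.mean(y ** 2) / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)


def spectrogram(y, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters with edges evenly spaced on the mel scale."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, len(freqs)))
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc(y, sr, n_fft=512, hop=128, n_mels=40, n_mfcc=13):
    """Log mel energies followed by an orthonormal DCT-II over the mel axis."""
    power = spectrogram(y, n_fft, hop) ** 2
    log_mel = np.log(mel_filterbank(sr, n_fft, n_mels) @ power + 1e-10)
    k = np.arange(n_mfcc)[:, None]
    i = np.arange(n_mels)[None, :]
    basis = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * i + 1) / (2 * n_mels))
    basis[0] /= np.sqrt(2.0)                    # orthonormal scaling of the DC row
    return basis @ log_mel                      # shape (n_mfcc, n_frames)


def chromagram(y, sr, n_fft=512, hop=128, fmin=27.5):
    """Fold spectral power into 12 pitch classes (0 = C, 9 = A, 11 = B)."""
    power = spectrogram(y, n_fft, hop) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    keep = freqs >= fmin                        # drop DC and very low bins
    midi = 69 + 12 * np.log2(freqs[keep] / 440.0)
    pitch_class = np.round(midi).astype(int) % 12
    chroma = np.stack([power[keep][pitch_class == pc].sum(axis=0)
                       for pc in range(12)])
    return chroma / (chroma.max(axis=0, keepdims=True) + 1e-10)


def to_image(feature):
    """Hypothetical audio-to-image step: min-max scale to an 8-bit array."""
    lo, hi = feature.min(), feature.max()
    return np.round(255 * (feature - lo) / (hi - lo + 1e-10)).astype(np.uint8)


# Usage on a synthetic 1 s tone (not real lung or heart sound data).
sr = 8000
tone = np.sin(2 * np.pi * 500.0 * np.arange(sr) / sr)
noisy = add_noise(tone, snr_db=20, rng=np.random.default_rng(0))
images = [to_image(f) for f in (np.log1p(spectrogram(noisy)),
                                 mfcc(noisy, sr), chromagram(noisy, sr))]
print([img.shape for img in images])            # [(257, 59), (13, 59), (12, 59)]
```

Min-max scaling each feature independently gives every CNN branch an input in the same 0-255 range regardless of the feature's native units; whether FDC-FS normalizes its inputs this way is not stated in the abstract.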

Time–Frequency Feature Fusion for Noise Robust Audio Event Classification

Robust sound event classification using deep neural networks

Audio-Visual Speech Enhancement with Deep Multi-modality Fusion

MFCC combined with sparse coding for sound event classification under different noise environments

Adaptive DCTNet for Audio Signal Classification

Robust Sound Event Classification by Using Denoising Autoencoder

Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks

Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks

Robust Audio Sensing with Multi-Sound Classification

Using Deep Belief Network to Capture Temporal Information for Audio Event Classification

Temporal Coding of Local Spectrogram Features for Robust Sound Recognition

Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks

Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines

MTF-CRNN: Multiscale Time-Frequency Convolutional Recurrent Neural Network for Sound Event Detection

Hierarchical-Concatenate Fusion TDNN for sound event classification

A time-frequency fusion model for multi-channel speech enhancement

Infrasound Event Classification Fusion Model Based on Multiscale SE-CNN and BiLSTM

End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input

Feature-Based Fusion Using CNN for Lung and Heart Sound Classification

Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network

Deep Learning Approach to Classification of Acoustic Signals Using Information Features