Baby cry recognition based on SLGAN model data generation and deep feature fusion

Ke Zhang, Hua-Nong Ting, Yao-Mun Choo
DOI: https://doi.org/10.1016/j.eswa.2023.122681
IF: 8.5
2023-12-07
Expert Systems with Applications
Abstract: Deep learning models have been applied to baby cry recognition to improve recognition accuracy. However, current research still suffers from data imbalance, which biases model learning. A Sparse Autoencoder Long Short-Term Memory based Generative Adversarial Network (SLGAN) is proposed to solve this problem. The proposed SLGAN model generates new baby cry data so that every cry class has the same number of samples. Speech features are extracted using Mel-spectrograms and the Short-Time Fourier Transform (STFT). Two deep learning models, VGG16 and VGG19, are used to extract deep features, which are then reduced in dimension using Principal Component Analysis (PCA). A sparse autoencoder fuses the deep features. Finally, the fused features are used to train a Deep Neural Network classifier. The experimental results show that the proposed method outperforms the existing methods.
Categories: computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science
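
The abstract states that SLGAN generates new cry data until every class has the same number of samples. As a rough illustration of that balancing target only (not the SLGAN generator itself), the sketch below computes how many synthetic samples each class would need to match the largest class. The function name, class names, and counts are hypothetical and do not come from the paper.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Return, per class, how many generated samples would bring that
    class up to the size of the largest class (assumed balancing target)."""
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical, made-up label list; not data from the paper.
labels = ["hungry"] * 5 + ["pain"] * 2 + ["tired"] * 3
needed = synthetic_samples_needed(labels)
print(needed)  # {'hungry': 0, 'pain': 3, 'tired': 2}
```

In a pipeline like the one described, a generator would then be asked for that many samples per class before feature extraction; how SLGAN does this is specified in the paper, not here.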