Abstract: Early diagnosis of medical conditions in infants is crucial for ensuring timely and effective treatment. However, infants cannot verbalize their symptoms, which makes it difficult for healthcare professionals to diagnose their conditions accurately. Crying is often the only way for infants to communicate their needs and discomfort. In this paper, we propose a medical diagnostic system for interpreting infants' cry audio signals (CAS) using a combination of features from different audio domains and deep learning (DL) algorithms. The proposed system uses a dataset of labeled audio signals from infants with specific pathologies. The dataset covers two infant pathologies with high mortality rates, neonatal respiratory distress syndrome (RDS) and sepsis, together with cries from healthy infants. The system employs the harmonic ratio (HR) as a prosodic feature, the Gammatone frequency cepstral coefficients (GFCCs) as a cepstral feature, and image-based features from the spectrogram, which are extracted using a pretrained convolutional neural network (CNN) model and fused with the other features so that the model benefits from multiple domains and improves its classification rate and accuracy. Different combinations of the fused features are then fed into multiple machine learning algorithms, including random forest (RF), support vector machine (SVM), and deep neural network (DNN) models. Evaluation of the system using accuracy, precision, recall, F1-score, the confusion matrix, and the receiver operating characteristic (ROC) curve showed promising results for the early diagnosis of medical conditions in infants based on crying signals only: the system achieved its highest accuracy of 97.50% using the combination of the spectrogram, HR, and GFCC through the deep learning process.
The findings demonstrate the importance of fusing different audio features, especially the spectrogram, through the learning process rather than by simple concatenation, and of using deep learning algorithms to extract sparsely represented features for the subsequent classification step, which improves the separation between different infant pathologies. The results outperformed the published benchmark paper by extending the task to multiclass classification (RDS, sepsis, and healthy), investigating a new type of feature (the spectrogram), and using a new feature fusion technique, namely fusion through the learning process of the deep learning model.
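The evaluation measures named in the abstract (accuracy, precision, recall, F1-score, confusion matrix) can be illustrated for the three-class setting with a minimal sketch. The labels and predictions below are made-up toy values, not data or results from the paper, and the helper names (`confusion_matrix`, `per_class_metrics`, `accuracy`) are hypothetical; the authors' own tooling is not described in the abstract.

```python
from typing import Dict, List

# Class names follow the abstract's three-class task; the order is an assumption.
CLASSES = ["RDS", "sepsis", "healthy"]


def confusion_matrix(y_true: List[str], y_pred: List[str],
                     labels: List[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_metrics(matrix: List[List[int]],
                      labels: List[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall, and F1 for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                   # true class i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp    # predicted class i, truly otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy example (hypothetical values, for illustration only).
y_true = ["RDS", "RDS", "sepsis", "sepsis", "healthy", "healthy"]
y_pred = ["RDS", "sepsis", "sepsis", "sepsis", "healthy", "healthy"]

cm = confusion_matrix(y_true, y_pred, CLASSES)
scores = per_class_metrics(cm, CLASSES)
overall_accuracy = accuracy(cm)
```

In this toy case one RDS cry is misclassified as sepsis, which lowers RDS recall and sepsis precision while leaving the healthy class unaffected. Per-class figures of this kind show which pathologies are being confused, something a single accuracy value cannot show.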

Baby cry recognition based on SLGAN model data generation and deep feature fusion

Baby Cry Recognition by BCRNet Using Transfer Learning and Deep Feature Fusion

Inherent Emotional Feature Extraction of Neonatal Cry

Classification of Infant Cry Based on Hybrid Audio Features and ResLSTM

Baby cry recognition based on WOA-VMD and an improved Dempster-Shafer evidence theory

Using Transfer Learning, SVM, and Ensemble Classification to Classify Baby Cries Based on Their Spectrogram Images

InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries

Deep Learning for Asphyxiated Infant Cry Classification Based on Acoustic Features and Weighted Prosodic Features

Infant Crying Detection in Real-World Environments

Infant Cry Signal Diagnostic System Using Deep Learning and Fused Features

Convolutional Neural Networks for Audio-Based Continuous Infant Cry Monitoring at Home

Speech Emotion Recognition by Combining a Unified First-Order Attention Network with Data Balance

Ensemble of multimodal deep learning autoencoder for infant cry and pain detection

Machine learning-based infant crying interpretation

Classification of Infant Crying Sounds Using SE-ResNet-Transformer

A fully automated approach for baby cry signal segmentation and boundary detection of expiratory and inspiratory episodes

Weakly Supervised Detection of Baby Cry

A New Network Structure for Speech Emotion Recognition Research

A Deep Learning Method Using Gender-Specific Features for Emotion Recognition

An optimized automated recognition of infant sign language using enhanced convolution neural network and deep LSTM

InfantNet: A Deep Neural Network for Analyzing Infant Vocalizations