InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries

Mengze Hong, Chen Jason Zhang, Lingxiao Yang, Yuanfeng Song, Di Jiang
2024-09-29
Abstract: Understanding the meaning of infant cries is a significant challenge for young parents in caring for their newborns. The presence of background noise and the lack of labeled data present practical challenges in developing systems that can detect crying and analyze its underlying reasons. In this paper, we present a novel data-driven framework, "InfantCryNet," for accomplishing these tasks. To address the issue of data scarcity, we employ pre-trained audio models to incorporate prior knowledge into our model. We propose the use of statistical pooling and multi-head attention pooling techniques to extract features more effectively. Additionally, knowledge distillation and model quantization are applied to enhance model efficiency and reduce the model size, better supporting industrial deployment in mobile devices. Experiments on real-life datasets demonstrate the superior performance of the proposed framework, outperforming state-of-the-art baselines by 4.4% in classification accuracy. The model compression effectively reduces the model size by 7% without compromising performance and by up to 28% with only an 8% decrease in accuracy, offering practical insights for model selection and system design.
Sound, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
The paper addresses the detection and analysis of newborn cries, with the goal of helping new parents understand their baby's needs and supporting the infant's overall health. It proposes a data-driven framework, "InfantCryNet," for identifying infant cries and analyzing their causes, and it targets four main issues:

1. **Background noise**: Background noise in real environments makes it difficult to detect infant cries reliably.
2. **Data scarcity**: Labeled infant cry data is very scarce, which makes model training difficult.
3. **Diverse cry patterns**: Infant cries vary widely, which makes it harder to determine the reason behind a given cry.
4. **Computational resource constraints**: Deployment on mobile devices (such as smartphones and tablets) requires a model with low computational requirements.

To address these issues, the paper uses pre-trained audio models to bring in prior knowledge and proposes a feature extraction method that combines statistical pooling with multi-head attention pooling. Knowledge distillation and model quantization are then applied to improve efficiency and reduce model size for deployment on mobile devices. Experimental results show that the framework improves classification accuracy by 4.4% over state-of-the-art baselines, and that model compression reduces model size without significantly affecting performance.
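
To make the pooling step more concrete, the following is a minimal NumPy sketch of how frame-level features from a pre-trained audio encoder might be collapsed into a single utterance-level vector. It is an illustration under stated assumptions, not the paper's implementation: the function names, the feature shapes, the per-head query vectors (`head_queries`), the scaled dot-product scoring, and the choice to concatenate the two pooled vectors are all hypothetical, and the random arrays stand in for real encoder outputs and learned parameters.

```python
import numpy as np


def statistics_pooling(frames: np.ndarray) -> np.ndarray:
    """Concatenate the per-dimension mean and standard deviation over time.

    frames: (T, D) frame-level features -> returns a (2 * D,) vector.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def multi_head_attention_pooling(frames: np.ndarray,
                                 head_queries: np.ndarray) -> np.ndarray:
    """Pool frames with one query vector per head (assumed variant).

    frames: (T, D); head_queries: (H, D) -> returns a (H * D,) vector.
    Each head scores every frame, normalizes the scores over time with a
    softmax, and returns the resulting weighted average of the frames.
    """
    scores = frames @ head_queries.T / np.sqrt(frames.shape[1])  # (T, H)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)  # softmax over time
    pooled = weights.T @ frames  # (H, D): one weighted average per head
    return pooled.reshape(-1)


# Hypothetical stand-ins: 50 frames of 8-dimensional encoder output, 4 heads.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))
head_queries = rng.normal(size=(4, 8))  # would be learned in a real model

# Assumption: the two pooled representations are simply concatenated.
utterance = np.concatenate([
    statistics_pooling(frames),
    multi_head_attention_pooling(frames, head_queries),
])
print(utterance.shape)  # (48,)
```

Statistics pooling summarizes the overall distribution of the features across the recording, while attention pooling lets each head emphasize different frames. In a trained model the query vectors would be learned jointly with the classifier rather than sampled at random.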