Machine listening in a neonatal intensive care unit

Modan Tailleur, Vincent Lostanlen, Jean-Philippe Rivière, Pierre Aumond
2024-10-07
Abstract: Oxygenators, alarm devices, and footsteps are some of the most common sound sources in a hospital. Detecting them has scientific value for environmental psychology but comes with challenges of its own: namely, privacy preservation and limited labeled data. In this paper, we address these two challenges via a combination of edge computing and cloud computing. For privacy preservation, we have designed an acoustic sensor which computes third-octave spectrograms on the fly instead of recording audio waveforms. For sample-efficient machine learning, we have repurposed a pretrained audio neural network (PANN) via spectral transcoding and label space adaptation. A small-scale study in a neonatological intensive care unit (NICU) confirms that the time series of detected events align with another modality of measurement: i.e., electronic badges for parents and healthcare professionals. Hence, this paper demonstrates the feasibility of polyphonic machine listening in a hospital ward while guaranteeing privacy by design.
Sound, Artificial Intelligence, Machine Learning, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses two main problems in sound event detection and classification in a neonatal intensive care unit (NICU): **privacy protection** and **limited labeled data**.

### Specific problems:

1. **Privacy protection**:
   - Sound captured in a hospital environment contains sensitive information, such as patients' conversations and footsteps. Directly recording the audio waveform may violate privacy.
   - To preserve privacy, the researchers designed an acoustic sensor that does not store the raw audio waveform but instead computes third-octave spectrograms on the fly. Intelligible speech cannot be recovered from this representation.
2. **Limited labeled data**:
   - Labeled sound event data from the NICU are scarce, which makes it difficult to apply conventional supervised learning methods.
   - The researchers work around the lack of labeled data by reusing a pretrained audio neural network (PANN), combined with a spectral transcoder that converts the third-octave spectrogram into a finer-grained mel-frequency spectrogram.

### Solutions:

- **Combination of edge computing and cloud computing**: To ensure privacy and processing efficiency, the researchers designed a hybrid computing pipeline in which part of the processing runs on local devices (edge computing) and part on cloud-side servers.
- **Spectral transcoder**: Converts the third-octave spectrogram into a mel-frequency spectrogram, so that the pretrained audio neural network can recognize sound events from it.
- **Label space adaptation**: Adjusts the output of the PANN to the types of sound events specific to the NICU, so that predictions match the application scenario.

### Experimental verification:

- The researchers conducted a small-scale study in a NICU.
- The results show that the time series of sound events detected by the machine listening system align with another modality of measurement (the presence of parents and healthcare professionals recorded by electronic badges), demonstrating the feasibility of the method.

In summary, this paper aims to detect and classify multiple types of sound events in the NICU environment while protecting privacy by design.
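To make the privacy-preserving representation more concrete, the following is a minimal sketch of how a third-octave spectrogram can be computed from a waveform with NumPy. It is not the authors' sensor firmware: the function names, the frame length, the hop size, and the frequency range are all assumptions chosen for illustration, and the band edges follow the common base-2 convention with centers at 1000 · 2^(k/3) Hz.

```python
import numpy as np


def third_octave_bands(f_min=20.0, f_max=12500.0):
    """Return (lower, center, upper) edges of base-2 third-octave bands.

    The frequency range is an illustrative assumption, not a value
    taken from the paper.
    """
    k_min = int(np.ceil(3 * np.log2(f_min / 1000.0)))
    k_max = int(np.floor(3 * np.log2(f_max / 1000.0)))
    k = np.arange(k_min, k_max + 1)
    centers = 1000.0 * 2.0 ** (k / 3.0)
    lower = centers * 2.0 ** (-1.0 / 6.0)
    upper = centers * 2.0 ** (1.0 / 6.0)
    return lower, centers, upper


def third_octave_spectrogram(x, sr, frame_len=4096, hop=4096):
    """Hypothetical helper: band energies in dB, shape (frames, bands).

    Only one energy value per band and per frame is kept; phase and
    fine spectral detail are discarded.
    """
    lower, centers, upper = third_octave_bands()
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # Binary matrix assigning each FFT bin to at most one band.
    weights = ((freqs[None, :] >= lower[:, None])
               & (freqs[None, :] < upper[:, None])).astype(float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    energies = np.empty((n_frames, len(centers)))
    for t in range(n_frames):
        frame = x[t * hop:t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        energies[t] = weights @ power
    # Small constant avoids log(0) for bands that contain no FFT bin.
    return centers, 10.0 * np.log10(energies + 1e-12)


if __name__ == "__main__":
    sr = 32000
    time = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000.0 * time)  # 1 kHz test tone
    centers, spec = third_octave_spectrogram(tone, sr)
    print(spec.shape, centers[np.argmax(spec.mean(axis=0))])
```

In a pipeline like the one summarized above, only an array of this kind would leave the sensor; the spectral transcoder and the pretrained network would then operate on it on the server side.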