Estimation of Speech Features Using a Wearable Inertial Sensor

Zuyu Du, Yaodan Xu, Xinsheng Yu, Sen Wang, Lin Xu
DOI: https://doi.org/10.1016/j.jvoice.2024.09.012
IF: 2.3
2024-10-12
Journal of Voice
Abstract: Speech features have been investigated as novel digital biomarkers for many psychiatric and neurocognitive diseases. Microphones are the most widely used devices for speech recording but inevitably suffer from several disadvantages, such as privacy leakage and environmental noise, which limit their clinical applications, particularly for long-term ambulatory monitoring. The aim of the present study is therefore to explore the feasibility of extracting speech features from the acceleration recorded on the sternum. Ten healthy subjects volunteered for our study. Each subject performed two speech tasks, namely repeating one sentence 20 times and reading 20 different sentences, with each task repeated eight times under different speech rates and loudness levels. Voice signals and speech-induced chest vibrations were simultaneously recorded by a microphone and an accelerometer placed on the sternum. Forty-two acoustic features and six time-related prosodic features were extracted from both signals using a standard toolbox and then compared by a linear fit and correlation analysis. Good agreement between the acceleration features and microphone features was observed in all six time-related prosodic features for both tasks, but in only 19 and 17 acoustic features for tasks 1 and 2, respectively, most of them loudness- or pitch-related. Our results suggest that sternum acceleration tracks time-related speech prosody, loudness, and pitch very well, demonstrating the feasibility of deriving digital biomarkers from the acceleration signal for diseases strongly related to time-related prosodic and loudness features.
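The comparison step named in the abstract (a linear fit plus correlation analysis between paired microphone and accelerometer features) could be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the abstract does not name the toolbox or the agreement criteria, so the function `compare_feature`, the variable names, and the synthetic values are all hypothetical.

```python
import numpy as np


def compare_feature(mic_values, acc_values):
    """Fit acc = slope * mic + intercept and compute Pearson's r.

    Hypothetical helper: both inputs hold one value of the same feature
    per recording, in the same order.
    """
    mic = np.asarray(mic_values, dtype=float)
    acc = np.asarray(acc_values, dtype=float)
    if mic.ndim != 1 or mic.shape != acc.shape or mic.size < 3:
        raise ValueError("need two 1-D arrays of equal length (>= 3)")
    # Least-squares straight line through the paired values.
    slope, intercept = np.polyfit(mic, acc, deg=1)
    # Pearson correlation coefficient between the two feature series.
    r = np.corrcoef(mic, acc)[0, 1]
    return {"slope": float(slope), "intercept": float(intercept), "r": float(r)}


# Synthetic placeholder data (NOT from the study): eight recordings of one
# feature, with the accelerometer value roughly linear in the microphone value.
rng = np.random.default_rng(0)
mic_feature = rng.uniform(2.0, 6.0, size=8)
acc_feature = 0.9 * mic_feature + 0.2 + rng.normal(0.0, 0.05, size=8)

result = compare_feature(mic_feature, acc_feature)
print(result)
```

Under this sketch, a feature would count as agreeing well when `r` is close to 1 and the fitted line describes the paired values closely; the actual thresholds used in the study are not given in the abstract.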
otorhinolaryngology, audiology & speech-language pathology