Machine Learning-Assisted Speech Analysis for Early Detection of Parkinson's Disease: A Study on Speaker Diarization and Classification Techniques

Michele Giuseppe Di Cesare, David Perpetuini, Daniela Cardone, Arcangelo Merla
DOI: https://doi.org/10.3390/s24051499
IF: 3.9
2024-02-27
Sensors
Abstract: Parkinson's disease (PD) is a neurodegenerative disorder characterized by a range of motor and non-motor symptoms. One of the notable non-motor symptoms of PD is the presence of vocal disorders, attributed to the underlying pathophysiological changes in the neural control of the laryngeal and vocal tract musculature. From this perspective, the integration of machine learning (ML) techniques in the analysis of speech signals has significantly contributed to the detection and diagnosis of PD. In particular, Mel-Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GTCCs) are feature extraction techniques commonly used in speech and audio signal processing that could exhibit great potential for vocal disorder identification. This study presents a novel approach to the early detection of PD through ML applied to speech analysis, leveraging both MFCCs and GTCCs. The recordings contained in the Mobile Device Voice Recordings at King's College London (MDVR-KCL) dataset were used. These recordings were collected from healthy individuals and PD patients while they read a passage and during a spontaneous conversation on the phone. The speech data from the spontaneous dialogue task were processed through speaker diarization, a technique that partitions an audio stream into homogeneous segments according to speaker identity. ML applied to the MFCCs and GTCCs allowed us to classify PD patients with a test accuracy of 92.3%. This research further demonstrates the potential to employ mobile phones as a non-invasive, cost-effective tool for the early detection of PD, which could significantly improve patient prognosis and quality of life.
engineering, electrical & electronic; chemistry, analytical; instruments & instrumentation
### What problem does this paper attempt to solve?

The main goal of this paper is to propose a speech analysis method based on machine learning (ML) techniques for the early detection of Parkinson's Disease (PD). Specifically, the study uses Mel-Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GTCCs) as feature extraction techniques, combined with a speaker diarization algorithm to process speech signals.

#### Research Background:

- **Parkinson's Disease Symptoms**: Parkinson's Disease is a neurodegenerative disorder characterized by motor symptoms (such as tremors, bradykinesia, rigidity, and postural instability) and non-motor symptoms (such as speech disorders). Speech disorders are particularly prominent in PD patients and affect their ability to communicate.
- **Limitations of Existing Diagnostic Methods**: Currently, the diagnosis of Parkinson's Disease relies primarily on the MDS-Unified Parkinson's Disease Rating Scale (MDS-UPDRS), which involves visual assessment of tremor severity and is therefore subject to a degree of subjectivity and error.

#### Research Methods:

- **Dataset**: The study used the Mobile Device Voice Recordings at King’s College London (MDVR-KCL) dataset, which includes 37 recording samples: 21 from healthy controls (HC) and 16 from Parkinson's Disease patients.
- **Task Setup**: Participants completed two tasks: reading a passage of text and engaging in a spontaneous conversation.
- **Feature Extraction and Classification**: The study used MFCCs and GTCCs as features, combined with a speaker diarization algorithm that segments the audio by speaker. Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and neural network models were then used for classification.

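
To make the feature extraction step above more concrete, the following is a minimal sketch of how MFCCs can be computed with NumPy. It is not the authors' implementation: the frame length, hop size, number of filters, number of coefficients, and the function names (`mel_filterbank`, `mfcc`) are illustrative assumptions. GTCCs follow the same outline but replace the triangular mel filterbank with a gammatone filterbank, which is not shown here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):      # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):     # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def mfcc(signal, sr, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). All defaults are assumptions."""
    # 1. Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3. Log energy in each mel band (small offset avoids log(0)).
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 4. DCT-II across the bands decorrelates them into cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return energies @ basis.T

# Usage on a synthetic 1-second tone (a stand-in for a real voice recording).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)
coeffs = mfcc(tone, sr)
print(coeffs.shape)
```

In a pipeline like the one summarized above, per-frame coefficients of this kind would typically be aggregated per recording (for example by averaging over frames) before being passed to a classifier such as an SVM or KNN; that aggregation choice is likewise an assumption, not something stated in this summary.
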
#### Main Findings:

- **Experimental Results**: In the reading task, the method combining MFCCs and GTCCs achieved an accuracy of 92.3% on the test set.
- **Importance of Speaker Diarization**: In the spontaneous conversation task, the speaker diarization algorithm separated the speech of the different speakers, which improved classification accuracy.

### Conclusion

This study demonstrates the potential of smartphones for non-invasive, cost-effective early detection of Parkinson's Disease, which could improve patient prognosis and quality of life. By combining MFCCs and GTCCs with a speaker diarization algorithm, the study achieved good classification performance on speech signals.
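
As an illustration of the speaker diarization idea mentioned under Main Findings, the sketch below clusters per-frame feature vectors into two groups and merges consecutive frames with the same label into segments. This is a deliberately simplified stand-in, assuming exactly two speakers and well-separated features; it is not the diarization algorithm used in the paper, and the function names (`two_speaker_kmeans`, `labels_to_segments`) and the synthetic data are hypothetical.

```python
import numpy as np

def two_speaker_kmeans(features, n_iter=20):
    """Assign each frame to one of two clusters with a basic k-means loop."""
    # Deterministic initialisation: the first frame and the frame farthest from it.
    c0 = features[0]
    c1 = features[np.argmax(np.linalg.norm(features - c0, axis=1))]
    centroids = np.stack([c0, c1]).astype(float)
    for _ in range(n_iter):
        # Distance of every frame to both centroids, then nearest-centroid labels.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    return labels

def labels_to_segments(labels):
    """Merge runs of identical labels into (start_frame, end_frame, speaker) tuples."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, int(labels[start])))
            start = i
    return segments

# Synthetic example: 50 frames from "speaker A" followed by 50 from "speaker B".
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 0.1, size=(50, 4))
speaker_b = rng.normal(5.0, 0.1, size=(50, 4))
features = np.vstack([speaker_a, speaker_b])

segments = labels_to_segments(two_speaker_kmeans(features))
print(segments)  # [(0, 50, 0), (50, 100, 1)]
```

Once the frames are grouped by speaker, only the segments attributed to the participant (rather than the interviewer) would be kept for feature extraction and classification. Real diarization systems usually add voice activity detection and speaker embeddings, which this sketch omits.
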