Deep CNN for Parkinson's Disease Classification Using Line Spectral Frequency Images of Sustained Speech Phonation
Rani Kumari, Prakash Ramachandran
School of Electronics Engineering, Vellore Institute of Technology, Vellore, TN, 632014, India
Rani Kumari completed her BTech degree in electronics and communication engineering from AEC, W.B., India, in 2013, and her MTech degree in signal processing from BITM, W.B., India, in 2017. She is currently a research scholar in the School of Electronics Engineering, VIT Vellore, Tamil Nadu, India. Her research area includes signal processing. Email:
Prakash Ramachandran completed his BE degree in electronics and communication engineering from GCT, Coimbatore, India, in 1996, an ME degree in embedded systems from CEG, Anna University, India, in 2006, and his PhD in compressive sensing from VIT, Chennai, in 2019. He is currently working as an associate professor with the School of Electronics Engineering, VIT, Vellore, Tamil Nadu, India. He is the author of many research articles, book chapters, and conference papers. His research interests include machine learning, speech signal processing, artificial intelligence, and Internet of Things (IoT). Corresponding author. Email: prakash.r@vit.ac.in
DOI: https://doi.org/10.1080/03772063.2024.2409677
IF: 1.8768
2024-10-10
IETE Journal of Research
Abstract: This work proposes using a Deep Convolutional Neural Network (DCNN), which is a good classifier of natural images, to learn speech spectrum images of sustained phonation and detect Parkinson's Disease (PD), as an alternative to the existing feature-based machine learning methods. The proposed method is shown to yield very high accuracy without a separate feature computation stage. Two speech spectrum representations are proposed: the Short Time Fourier Transform (STFT) spectrum of size Nx256 and the Line Spectral Frequency (LSF) spectrum of size Nx16. LSF reflects the speech production mechanism, and using the LSF spectrum in a DCNN to detect PD speech is a novel idea. The spectrum images look like random patterns, and performance improves when an additional deeper hidden layer of tampering pattern is used in the last stage of the fully connected layer. On a standard PD sustained-phonation dataset, the training accuracies achieved are 98.50% for the STFT method and 92.50% for the LSF method. The validation accuracies achieved are 84.38% for STFT and 100% for LSF. The STFT method results in a sensitivity of 97.05%, a specificity of 88.63%, a precision of 86.84%, an F1-score of 91.66, a false positive rate (FPR) of 11.36%, and a false alarm rate of 12.82%. The LSF method results in a sensitivity of 97.05%, a specificity of 95.45%, a precision of 94.28%, an F1-score of 95.65, an FPR of 4.50%, and a false alarm rate of 5.71%. The LSF-based method performs better, and a performance comparison with state-of-the-art methods brings out the merits of LSF spectrum image-based DCNN learning for PD detection using sustained phonation.
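The evaluation metrics named in the abstract all follow from the four entries of a binary confusion matrix. The sketch below shows the standard definitions of sensitivity, specificity, precision, F1-score, and FPR. The `ConfusionCounts` type, the `binary_metrics` function, and the example counts are hypothetical: the source does not give a confusion matrix, and the counts were chosen only because their ratios land close to the LSF figures quoted above. The "false alarm rate" is left out because the abstract does not define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix entries, with PD as the positive class."""
    tp: int  # PD recordings classified as PD
    fn: int  # PD recordings classified as healthy
    tn: int  # healthy recordings classified as healthy
    fp: int  # healthy recordings classified as PD


def binary_metrics(c: ConfusionCounts) -> dict:
    """Return standard binary classification metrics as fractions in [0, 1].

    Assumes each class has at least one recording and at least one
    positive prediction was made, so no denominator is zero.
    """
    sensitivity = c.tp / (c.tp + c.fn)   # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)   # true negative rate
    precision = c.tp / (c.tp + c.fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    fpr = c.fp / (c.fp + c.tn)           # equals 1 - specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "fpr": fpr,
    }


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the paper.
    counts = ConfusionCounts(tp=33, fn=1, tn=42, fp=2)
    for name, value in binary_metrics(counts).items():
        print(f"{name}: {100 * value:.2f}%")
```

Because FPR is defined as FP / (FP + TN), it always equals one minus the specificity, which gives a quick consistency check on any reported pair of values.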
telecommunications, engineering, electrical & electronic