Pathological Voice Classification Using Multiresolution Time Series Classification Network

Denghuang Zhao, Xincheng Zhu, Jinyang Qian, Xiaojun Zhang, Yishen Xu, Zhi Tao
DOI: https://doi.org/10.1109/icsmd57530.2022.10058311
2022-01-01
Abstract: The detection of pathological voices has achieved good results in recent years. However, because pathological voices are complex, traditional feature-based methods are not effective at further classifying different types of voice disease. Deep learning methods, by contrast, have shown excellent performance in deep feature extraction and classification of time series. In this paper, we propose a multiresolution time series classification network based on 1-D and 2-D dilated convolutional neural networks to perform multi-class classification of pathological voices. Our method uses the combination of the raw voice signal, the glottal wave signal, and the first-order difference of the glottal wave as the multivariate input to the network. Dilated convolutional layers with different dilation rates are designed to capture features at different scales of the voice signal. We trained the network on the MEEI, SVD, and HUPA databases and tested it on voices collected with a voice recorder. An improvement of 17% was obtained in distinguishing healthy voices, neuromuscular disorders, and structural disorders. The experimental results show that the proposed structure can significantly improve performance on the multi-class classification of voices.
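The two ideas the abstract describes, a three-channel input and convolutions at several dilation rates, can be illustrated with a minimal sketch. This is not the authors' network: the function names, the kernel, and the dilation rates are hypothetical, the glottal wave is assumed to have been estimated already (for example by inverse filtering, which is not shown), and the convolution is a plain unpadded 1-D operation without learned weights.

```python
def first_difference(x):
    """First-order difference, with a leading 0 so the length is preserved."""
    return [0] + [x[i] - x[i - 1] for i in range(1, len(x))]


def stack_channels(raw, glottal):
    """Build the 3-channel input: raw voice, glottal wave, glottal difference.

    Assumption: `glottal` was estimated elsewhere and is aligned with `raw`.
    """
    if len(raw) != len(glottal):
        raise ValueError("raw and glottal signals must have the same length")
    return [list(raw), list(glottal), first_difference(glottal)]


def dilated_conv1d(x, kernel, dilation):
    """Unpadded 1-D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so a larger dilation
    covers a longer stretch of the signal with the same number of weights.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * x[t + k * dilation] for k in range(len(kernel)))
        for t in range(len(x) - span)
    ]


def receptive_field(kernel_size, dilation):
    """Number of input samples covered by one output sample."""
    return (kernel_size - 1) * dilation + 1


def multiresolution_features(channels, kernel, dilations):
    """Apply the same kernel at several dilation rates to every channel."""
    return {
        d: [dilated_conv1d(ch, kernel, d) for ch in channels]
        for d in dilations
    }


if __name__ == "__main__":
    raw = [0, 1, 4, 9, 16, 25, 36, 49]          # toy "voice" samples
    glottal = [0, 1, 3, 6, 10, 15, 21, 28]      # toy "glottal wave" samples
    channels = stack_channels(raw, glottal)
    feats = multiresolution_features(channels, kernel=[1, -1], dilations=[1, 2, 4])
    for d, per_channel in feats.items():
        print(d, receptive_field(2, d), [len(f) for f in per_channel])
```

Each dilation rate yields one feature sequence per channel. In a trained network the kernel weights would be learned and the outputs of the different branches combined before classification; here the fixed kernel only shows how the receptive field grows with the dilation rate.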