DNN-based Approach to Detect and Classify Pathological Voice.

Zong-Ying Chuang, Xiao-Tong Yu, Ji-Ying Chen, Yi-Te Hsu, Zhe-Zhuang Xu, Chi-Te Wang, Feng-Chuan Lin, Shih-Hau Fang
DOI: https://doi.org/10.1109/bigdata.2018.8622317
2018-01-01
Abstract: We participated in the FEMH 2018 Challenge, a big data subproject of the IEEE. The goal of the challenge is to detect pathological voice and to classify the different disorders, including phonotrauma, neoplasm, and vocal paralysis. Results are evaluated using sensitivity, specificity, and unweighted average recall (UAR). The database consists of 50 normal voice samples and 150 samples of common voice disorders, recorded in a tertiary teaching hospital (Far Eastern Memorial Hospital, FEMH). This paper proposes a deep neural network (DNN) based approach for the challenge. For data preprocessing we use Mel-frequency cepstral coefficients (MFCCs), which also carry emotion-specific information. Gradual spectral variations are captured using 13 MFCCs extracted from the speech signal. For disease detection, we examine the performance of different DNN structures (i.e., the number of hidden layers and the number of neurons). For disease classification, we examine the performance with different batch sizes and with and without normalization. Among the tested DNN structures, the best results are obtained with 5 hidden layers and 200 neurons.
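The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), UAR = mean of per-class recalls) and is not the challenge's official scoring code. The function names, the label encoding, and the toy labels below are assumptions made for illustration only, not data from the FEMH database.

```python
def _safe_div(num, den):
    # Return 0.0 when a class has no samples, to avoid division by zero.
    return num / den if den else 0.0


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Detection metrics, treating `positive` as the pathological class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return _safe_div(tp, tp + fn), _safe_div(tn, tn + fp)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally regardless of size."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(_safe_div(hits, total))
    return _safe_div(sum(recalls), len(recalls))


# Hypothetical toy labels (1 = pathological, 0 = normal).
det_true = [1, 1, 1, 0]
det_pred = [1, 1, 0, 0]
sens, spec = sensitivity_specificity(det_true, det_pred)

# Hypothetical toy labels for the three disorder classes.
cls_true = ["phonotrauma", "phonotrauma", "neoplasm", "paralysis"]
cls_pred = ["phonotrauma", "neoplasm", "neoplasm", "phonotrauma"]
uar = unweighted_average_recall(cls_true, cls_pred)
```

Because UAR averages recall per class rather than per sample, a classifier cannot score well simply by favoring the largest class, which matters for a data set with 50 normal and 150 disordered samples.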