Pathological Voice Feature Generation Using Generative Adversarial Network

Qian Jinyang, Zhao Denghuang, Fan Ziqi, Wu Di, Xu Yishen, Tao Zhi
DOI: https://doi.org/10.1109/icsmd53520.2021.9670757
2021-01-01
Abstract: Because pathological voice databases are difficult to build, the available samples are often too few and imbalanced across classes, which limits how well many deep learning methods perform on such databases. In this paper, a generative adversarial network (GAN) is used to generate voice features and thereby reduce the class imbalance. A GAN trains a generator and a discriminator adversarially so that the generated data follow a distribution similar to that of the original data. A back propagation generative adversarial network (BPGAN) and a deep convolutional generative adversarial network (DCGAN) are used to generate vector features and matrix features, respectively. Recall for the minority class is significantly higher with a balanced training set that includes GAN-generated data than with the imbalanced training set. Scatter plots of the generated feature vectors and grayscale images of the generated feature matrices are drawn. Adding the features generated by BPGAN and DCGAN increases the test-set recall for the minority class by 4.75% and 21%, respectively. The results show that GAN can effectively generate different feature sets of pathological voice.
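The evaluation metric named in the abstract, recall on the minority class, and the number of generated samples needed to balance a training set can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `minority_recall` and `synthetic_samples_needed` are hypothetical, and the toy labels are invented for the example rather than taken from the paper.

```python
from collections import Counter


def minority_recall(y_true, y_pred, minority_label):
    """Recall for one class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred)
             if t == minority_label and p == minority_label)
    fn = sum(1 for t, p in zip(y_true, y_pred)
             if t == minority_label and p != minority_label)
    if tp + fn == 0:
        raise ValueError("y_true contains no samples of the minority class")
    return tp / (tp + fn)


def synthetic_samples_needed(labels):
    """Generated samples each class would need to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


# Toy labels invented for illustration (1 = minority class).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 1]

print(minority_recall(y_true, y_pred, 1))   # 2 of 4 minority samples found -> 0.5
print(synthetic_samples_needed(y_true))     # {0: 0, 1: 2}
```

In a setting like the one described, the second function would indicate how many minority-class feature samples a generator has to supply before the training set is balanced, and the first would be evaluated on a held-out test set before and after that augmentation.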