Gene-CWGAN: a data enhancement method for gene expression profile based on improved CWGAN-GP
Fei Han, Shaojun Zhu, Qinghua Ling, Henry Han, Hailong Li, Xinli Guo, Jiechuan Cao
DOI: https://doi.org/10.1007/s00521-022-07417-9
2022-06-08
Neural Computing and Applications
Abstract: Traditional machine learning methods struggle to achieve good performance when classifying gene expression data, because such data are high-dimensional and have small sample sizes. As a data enhancement technique, the conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) is broadly applicable and can generate high-quality samples with specified labels, which has been shown to improve the performance of classification models. However, on gene expression data the samples generated by CWGAN-GP have low diversity and an uncertain distribution, which may reduce the accuracy of classifiers. Therefore, this study proposes a data enhancement method for gene expression data based on CWGAN-GP (Gene-CWGAN). First, to stabilize the distribution of generated samples, Gene-CWGAN adopts a dataset partition method based on sample dispersion, so that the distribution of the training samples is as close as possible to the real sample distribution. Second, the space of the generated samples is redefined, and a constraint penalty term is adopted to remove the restriction imposed by the original generated space. Finally, to reduce the effect of network volatility on the quality of generated samples, a Gene-CWGAN based on a proxy model (Gene-CWGAN-PS) is proposed to ensure sample quality. Experimental results on five public gene expression datasets show that Gene-CWGAN outperforms the other methods compared in terms of the diversity, distribution stability, and quality of generated samples.
computer science, artificial intelligence
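
For readers unfamiliar with the CWGAN-GP baseline named in the abstract, the following is a minimal sketch of the standard conditional critic loss with gradient penalty, not the Gene-CWGAN method itself. Everything in it is an illustrative assumption: the one-hidden-layer tanh critic, the parameter names W, b, a, and the penalty weight lam=10 (a commonly used value) are hypothetical and not taken from the paper. A toy critic is used so that the gradient with respect to the input can be written analytically in NumPy; a real implementation would rely on automatic differentiation.

```python
import numpy as np

def critic(x, y, W, b, a):
    """Toy critic D(x | y). x: (n, d) samples, y: (n, c) one-hot labels."""
    h = np.tanh(np.concatenate([x, y], axis=1) @ W + b)  # hidden layer, (n, k)
    return h @ a                                          # one score per sample, (n,)

def critic_grad_x(x, y, W, b, a):
    """Analytic gradient of the toy critic with respect to x only, shape (n, d)."""
    h = np.tanh(np.concatenate([x, y], axis=1) @ W + b)
    d = x.shape[1]
    # d/dx of sum_j a_j * tanh(z_j) is sum_j a_j * (1 - h_j^2) * W[i, j]
    return ((1.0 - h ** 2) * a) @ W[:d].T

def gradient_penalty(x_real, x_fake, y, W, b, a, lam=10.0, rng=None):
    """Penalize critic gradient norms that deviate from 1 at interpolated points."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(size=(x_real.shape[0], 1))   # one mixing weight per sample
    x_hat = eps * x_real + (1.0 - eps) * x_fake    # points between real and generated
    norms = np.linalg.norm(critic_grad_x(x_hat, y, W, b, a), axis=1)
    return lam * np.mean((norms - 1.0) ** 2)

def critic_loss(x_real, x_fake, y, W, b, a, lam=10.0, rng=None):
    """Critic objective: E[D(fake | y)] - E[D(real | y)] + gradient penalty."""
    gap = critic(x_fake, y, W, b, a).mean() - critic(x_real, y, W, b, a).mean()
    return gap + gradient_penalty(x_real, x_fake, y, W, b, a, lam, rng)
```

In this sketch the label y enters only as an extra input to the critic, which is what makes the model conditional; the penalty is evaluated on random interpolations between real and generated samples that share the same label.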