ProGAN: Protein Solubility Generative Adversarial Nets for Data Augmentation in DNN Framework

Xi Han, Liheng Zhang, Kang Zhou, Xiaonan Wang
DOI: https://doi.org/10.1016/j.compchemeng.2019.106533
IF: 4.13
2019-01-01
Computers & Chemical Engineering
Abstract: Protein solubility plays a critical role in improving the production yield of recombinant proteins in biocatalysis applications. To some extent, protein solubility can represent the function and activity of biocatalysts, which are mainly composed of recombinant proteins. In the literature, many machine learning models have been investigated for predicting protein solubility from protein sequence, but the parameters of those models were underdetermined because protein solubility data were insufficient. Here we propose a deep neural network (DNN) as a more accurate regression model. Moreover, to tackle the insufficient-data problem, we propose a novel data augmentation algorithm, Protein Solubility Generative Adversarial Nets (ProGAN), to improve the prediction of protein solubility. After adding mimic data produced by ProGAN, the prediction performance measured by R² improved compared with training without data augmentation. An R² value of 0.4504 was achieved, an improvement of about 10% over the previous study using the same dataset. © 2019 Elsevier Ltd. All rights reserved.
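The abstract reports performance as R² but does not define it. Assuming it refers to the standard coefficient of determination, R² = 1 - SS_res / SS_tot, a minimal sketch of the calculation might look like the following. The function name `r_squared` and the sample values are hypothetical and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical solubility values, for illustration only (not data from the paper).
measured = [0.10, 0.40, 0.60, 0.90]
predicted = [0.20, 0.35, 0.55, 0.80]
score = r_squared(measured, predicted)
```

Under this definition, a model that always predicts the mean of the measured values scores 0 and a perfect model scores 1, so a value such as 0.4504 indicates that a little under half of the variance in solubility is explained by the predictions.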