DeEPsnap: human essential gene prediction by integrating multi-omics data

Xue Zhang, Weijia Xiao, Brent H Cochran, Wangxin Xiao
DOI: https://doi.org/10.1101/2024.06.20.599958
2024-06-22
Abstract: Essential genes are necessary for the survival or reproduction of a living organism. Predicting and analyzing gene essentiality can advance our understanding of basic life processes and human diseases, and can further the development of new drugs. Wet-lab methods for identifying essential genes are often costly, time-consuming, and laborious. As a complement, computational methods have been proposed that predict essential genes by integrating multiple biological data sources. Most of these methods have been evaluated on model organisms; prediction methods for human essential genes remain limited, and the relationship between human gene essentiality and different types of biological information still needs to be explored. It is also important to explore deep learning techniques that can overcome the limitations of traditional machine learning methods and improve prediction accuracy. We propose DeEPsnap, a snapshot ensemble deep neural network method, to predict human essential genes. DeEPsnap integrates sequence features derived from DNA and protein sequence data with features extracted or learned from multiple types of functional data, such as gene ontology, protein complexes, protein domains, and the protein-protein interaction network. More than 200 features are extracted or learned from these data and integrated to train a series of cost-sensitive deep neural networks using multiple deep learning techniques. The snapshot mechanism allows multiple models to be trained without extra training effort or cost. In 10-fold cross-validation, DeEPsnap accurately predicts human gene essentiality, with an average AUROC (area under the receiver operating characteristic curve) of 96.1%, an average AUPRC (area under the precision-recall curve) of 93.82%, an average accuracy of 92.21%, and an average F1 measure of about 80.62%.
In addition, comparison of the experimental results shows that DeEPsnap outperforms several popular traditional machine learning and deep learning models. These results demonstrate that DeEPsnap is effective for predicting human essential genes.
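To make the snapshot idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a cosine-annealed cyclic learning rate with restarts (a common way to build snapshot ensembles), substitutes a tiny logistic model for the deep neural network, and uses a hypothetical `pos_weight` to stand in for cost-sensitive training on the minority (essential) class. All function names, hyperparameters, and the toy data are illustrative assumptions.

```python
import math
import random

def cosine_cyclic_lr(step, steps_per_cycle, lr_max):
    """Cosine-annealed learning rate that restarts at the start of each cycle."""
    pos = step % steps_per_cycle
    return 0.5 * lr_max * (1.0 + math.cos(math.pi * pos / steps_per_cycle))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, x):
    # weights[0] is the bias; the remaining entries pair with the features.
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def train_snapshots(X, y, n_cycles=3, steps_per_cycle=200,
                    lr_max=0.5, pos_weight=3.0, seed=0):
    """Run ONE training pass and save a copy of the weights at the end of each cycle."""
    rng = random.Random(seed)
    weights = [0.0] * (len(X[0]) + 1)
    snapshots = []
    for step in range(n_cycles * steps_per_cycle):
        lr = cosine_cyclic_lr(step, steps_per_cycle, lr_max)
        i = rng.randrange(len(X))
        # Cost-sensitive step: errors on the positive class are weighted more
        # heavily (pos_weight is an assumed, illustrative value).
        cost = pos_weight if y[i] == 1 else 1.0
        grad = cost * (predict(weights, X[i]) - y[i])
        weights[0] -= lr * grad
        for j, xij in enumerate(X[i]):
            weights[j + 1] -= lr * grad * xij
        # The learning rate is near zero here, so the model has settled;
        # keep this state as one ensemble member.
        if (step + 1) % steps_per_cycle == 0:
            snapshots.append(list(weights))
    return snapshots

def ensemble_predict(snapshots, x):
    """Average the predicted probabilities of all saved snapshots."""
    return sum(predict(w, x) for w in snapshots) / len(snapshots)

# Toy, imbalanced one-feature data set (made up for illustration only).
X = [[-2.0], [-1.5], [-1.0], [-0.5], [-0.2], [0.1], [1.0], [1.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
snapshots = train_snapshots(X, y)
```

Each snapshot comes from the same training run, which is why the ensemble adds no training cost beyond that single run; only inference requires evaluating every saved model.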
Bioinformatics