A logistic regression based algorithm for identifying human disease genes

Bolin Chen, Min Li, Jianxin Wang, Fang-Xiang Wu
DOI: https://doi.org/10.1109/BIBM.2014.6999153
2014-01-01
Abstract: The identification of disease genes is the first step towards understanding the mechanisms of genetic diseases. Although many computational algorithms have been proposed to identify disease genes, they either perform poorly in terms of AUC scores or are very time consuming. To overcome these two problems, a logistic regression based algorithm is proposed in this study for identifying disease genes. Disease gene identification is formulated as a two-class classification problem, where one class represents disease genes and the other represents non-disease genes. A binary logistic regression is employed to predict the posterior probability that a gene is associated with a disease, taking prior labels as the categorical dependent variable and label-related feature vectors as the predictor variables. Numerical experiments show that the proposed logistic regression based algorithm not only performs very well, but also significantly reduces the computing time. The AUC score is 0.737 when no prior information is used, and it increases to 0.766 when protein complex data are integrated. On average, when generating one prediction in leave-one-out cross validation, the proposed algorithm takes only 1.31% of the running time of the existing MRF method and 37.35% of the running time of the RWR algorithm.
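The two-class formulation and the leave-one-out evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the feature vectors are built, so the two features used here (the fractions of a gene's network neighbours labelled as disease and as non-disease genes) are a hypothetical stand-in, the toy data are invented for illustration, and plain batch gradient descent is assumed as the fitting procedure.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    # Binary logistic regression fitted by batch gradient descent
    # (an assumed optimiser; the abstract does not name one).
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    # Posterior probability that a gene belongs to the disease class.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def leave_one_out_scores(X, y):
    # Hold out each gene in turn, refit on the rest, score the held-out gene.
    scores = []
    for i in range(len(X)):
        w, b = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(predict_proba(w, b, X[i]))
    return scores


def auc(scores, labels):
    # AUC as the fraction of (disease, non-disease) pairs ranked correctly,
    # counting ties as half.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: [fraction of disease neighbours,
#                         fraction of non-disease neighbours] per gene.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4],
     [0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = disease gene, 0 = non-disease gene

scores = leave_one_out_scores(X, y)
print("leave-one-out AUC:", auc(scores, y))
```

Each leave-one-out prediction here requires only one small model fit, which is consistent with the kind of cost profile the abstract reports, although the timings quoted there refer to the authors' own experiments and not to this sketch.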