Integrating Multiple Gene Semantic Similarity Profiles to Infer Disease Genes

Huan Peng, Rui Jiang
2012-01-01
Abstract: Inferring genes that are associated with human inherited diseases (disease genes) has been a highly challenging task in biological and medical studies. Many computational methods have been proposed to prioritize candidate genes using a variety of genomic information. In this work, we propose a novel perspective that treats the inference of disease genes as a binary classification problem. We integrate three semantic similarity profiles of human genes, a phenotype similarity profile of human diseases, and known associations between diseases and genes to obtain three numerical features that indicate the relevance of a gene to a disease in a given disease-gene pair. With these features, we use three classification methods (logistic regression, random forest, and support vector machine) to predict whether a gene is truly associated with a disease. We use 10-fold cross-validation experiments to assess the performance of the proposed method and show the effectiveness of this approach. We further show that this binary classification formulation can also be used to address the problem of prioritizing candidate genes.
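The binary classification formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each disease-gene pair is already summarized by three numerical features in [0, 1], and it replaces the real similarity profiles with synthetic data. Only logistic regression is shown, fitted by plain gradient descent; the random forest and the support vector machine are omitted. All function names (`train_logistic`, `cross_validate`, `make_synthetic_pairs`) and all parameter values are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    # Label 1 means "gene is associated with the disease".
    score = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if sigmoid(score) >= 0.5 else 0


def cross_validate(X, y, k=10, seed=0):
    """Return the mean accuracy over k folds of a shuffled index split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_logistic([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


def make_synthetic_pairs(n=200, seed=1):
    """Synthetic stand-in for disease-gene pairs (an assumption, not real data).

    Positive pairs get feature values centred near 0.7, negative pairs near 0.3.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 0.7 if label else 0.3
        X.append([min(1.0, max(0.0, rng.gauss(center, 0.15))) for _ in range(3)])
        y.append(label)
    return X, y


X, y = make_synthetic_pairs()
mean_accuracy = cross_validate(X, y, k=10)
print(f"mean 10-fold accuracy: {mean_accuracy:.3f}")
```

In this sketch, the same fitted model could also rank candidate genes for a disease by sorting them on the predicted probability rather than thresholding it, which is one plausible reading of how a classifier can serve the prioritization task mentioned above.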