A Probabilistic Model for Mining Implicit 'chemical Compound-Gene' Relations from Literature

SF Zhu,Y Okuno,G Tsujimoto,H Mamitsuka
DOI: https://doi.org/10.1093/bioinformatics/bti1141
IF: 5.8
2005-01-01
Bioinformatics
Abstract:MOTIVATIONThe importance of chemical compounds has been emphasized more in molecular biology, and 'chemical genomics' has attracted a great deal of attention in recent years. Thus an important issue in current molecular biology is to identify biological-related chemical compounds (more specifically, drugs) and genes. Co-occurrence of biological entities in the literature is a simple, comprehensive and popular technique to find the association of these entities. Our focus is to mine implicit 'chemical compound and gene' relations from the co-occurrence in the literature.RESULTSWe propose a probabilistic model, called the mixture aspect model (MAM), and an algorithm for estimating its parameters to efficiently handle different types of co-occurrence datasets at once. We examined the performance of our approach not only by a cross-validation using the data generated from the MEDLINE records but also by a test using an independent human-curated dataset of the relationships between chemical compounds and genes in the ChEBI database. We performed experimentation on three different types of co-occurrence datasets (i.e. compound-gene, gene-gene and compound-compound co-occurrences) in both cases. Experimental results have shown that MAM trained by all datasets outperformed any simple model trained by other combinations of datasets with the difference being statistically significant in all cases. In particular, we found that incorporating compound-compound co-occurrences is the most effective in improving the predictive performance. We finally computed the likelihoods of all unknown compound-gene (more specifically, drug-gene) pairs using our approach and selected the top 20 pairs according to the likelihoods. We validated them from biological, medical and pharmaceutical viewpoints.
What problem does this paper attempt to address?