Identification of compound–protein interactions through the analysis of gene ontology, KEGG enrichment for proteins and molecular fragments of compounds

Lei Chen, Yu-Hang Zhang, Mingyue Zheng, Tao Huang, Yu-Dong Cai
DOI: https://doi.org/10.1007/s00438-016-1240-x
IF: 2.98
2016-08-16
Molecular Genetics and Genomics
Abstract: Compound–protein interactions play important roles in every cell via the recognition and regulation of specific functional proteins. The correct identification of compound–protein interactions can lead to a good comprehension of this complicated system and provide useful input for the investigation of various attributes of compounds and proteins. In this study, we attempted to understand this system by extracting properties from both proteins and compounds: proteins were represented by gene ontology and KEGG pathway enrichment scores, and compounds were represented by molecular fragments. Two feature selection methods, minimum redundancy maximum relevance and incremental feature selection, were combined with the random forest machine learning algorithm to analyze these properties and extract core factors for the determination of actual compound–protein interactions. Compound–protein interactions reported in The Binding Databases were used as positive samples. To improve the reliability of the results, the analytic procedure was executed five times using different negative samples. This yielded five optimal prediction methods based on a random forest, with maximum Matthews correlation coefficients (MCCs) of approximately 77.55%, which may be useful tools for the prediction of compound–protein interactions. This work provides new clues to understanding the system of compound–protein interactions by analyzing extracted core features. Our results indicate that compound–protein interactions are related to biological processes involving immune, developmental and hormone-associated pathways.
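The abstract names incremental feature selection and the MCC as the selection procedure and the evaluation measure, without giving implementation details. The following is a minimal Python sketch of both ideas, not the authors' code. The function names `mcc` and `incremental_feature_selection`, the label convention (1 = interaction, 0 = non-interaction), and the `evaluate` callback are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels.

    Assumed convention: 1 = interaction, 0 = non-interaction.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-k prefix of a ranked feature list for k = 1..N
    and return the best-scoring prefix with its score.

    `ranked_features` is assumed to be ordered already (for example by
    an mRMR ranking); `evaluate` is a hypothetical callback that trains
    a classifier on the given features and returns a score such as MCC.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        if score > best_score:  # keep the smallest k on ties
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score
```

In a setting like the one the abstract describes, `evaluate` would presumably train a random forest on the chosen features and return a cross-validated MCC, and the whole loop would be repeated once per negative sample set. Those specifics are assumptions here, since the abstract does not state them.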
genetics & heredity, biochemistry & molecular biology