Mining Disease-Specific Molecular Association Profiles from Biomedical Literature: A Case Study

Jiao Li, Xiaoyan Zhu, Jake Yue Chen
DOI: https://doi.org/10.1145/1363686.1363984
2008-01-01
Abstract: We developed a new literature mining paradigm with the ultimate goal of enabling knowledge discovery in molecular association profiles generated from literature and prior knowledge. We show how to implement the paradigm by building a prototype literature mining framework and performing molecule-bioGist association mining. The framework consists of two modules. The first module, Textual Data Mining, takes synonym-expanded disease-related molecule names as input and outputs a list of bioGists. The second module, Structured Data Mining, takes two inputs, the initial disease-related molecular query terms and the bioGist list extracted by the first module, and outputs a molecule-bioGist association matrix. Our approach is novel because biomedical literature mining is used here not only as an "information retrieval" tool, but also as a "hypothesis generation and validation" platform. We applied the framework to a molecular pharmacology study of breast cancer. Based on 214 breast cancer-related proteins, 429,067 MEDLINE abstracts were retrieved, and 4,491 drug compounds were identified as bioGists. We evaluated 172 hydrocarbons in this bioGist list and found that more than 82.5% of them were verified to be related to breast cancer. BRCA1 and BRCA2 were found to have similar profiles in drug compound studies, whereas "doxorubicin", "etoposide", and "paclitaxel" were identified as having similar pharmacological profiles in the treatment of breast cancer.
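The molecule-bioGist association matrix and the profile comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how associations are scored or how profiles are compared, so this sketch assumes simple co-occurrence counts over abstracts and cosine similarity between matrix rows. The function names, the substring matching, and the toy texts are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def build_association_matrix(abstracts, molecules, biogists):
    """Count, for each (molecule, bioGist) pair, the abstracts mentioning both.

    Assumption: an "association" is plain case-insensitive substring
    co-occurrence; the paper's actual scoring is not given in the abstract.
    """
    matrix = {m: {b: 0 for b in biogists} for m in molecules}
    for text in abstracts:
        lowered = text.lower()
        found_molecules = [m for m in molecules if m.lower() in lowered]
        found_biogists = [b for b in biogists if b.lower() in lowered]
        for m in found_molecules:
            for b in found_biogists:
                matrix[m][b] += 1
    return matrix


def cosine_similarity(row_a, row_b):
    """Cosine similarity between two matrix rows (dicts with the same keys)."""
    dot = sum(row_a[k] * row_b[k] for k in row_a)
    norm_a = sqrt(sum(v * v for v in row_a.values()))
    norm_b = sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: placeholder sentences, not real MEDLINE abstracts.
toy_abstracts = [
    "Toy text mentioning BRCA1 and doxorubicin.",
    "Toy text mentioning BRCA2 and doxorubicin.",
    "Toy text mentioning BRCA1, BRCA2 and paclitaxel.",
    "Toy text mentioning TP53 and etoposide.",
]
molecules = ["BRCA1", "BRCA2", "TP53"]
biogists = ["doxorubicin", "etoposide", "paclitaxel"]

matrix = build_association_matrix(toy_abstracts, molecules, biogists)
sim_brca = cosine_similarity(matrix["BRCA1"], matrix["BRCA2"])
sim_other = cosine_similarity(matrix["BRCA1"], matrix["TP53"])

for molecule in molecules:
    print(molecule, matrix[molecule])
print("BRCA1 vs BRCA2:", round(sim_brca, 3))
print("BRCA1 vs TP53:", round(sim_other, 3))
```

In this toy corpus, BRCA1 and BRCA2 co-occur with the same compounds and therefore have near-identical rows, while TP53 shares none of them. Comparing columns instead of rows would give the analogous comparison between compounds. A real pipeline would need synonym expansion and proper entity recognition rather than substring matching.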