Document Triage and Relation Extraction for Protein-Protein Interactions affected by Mutations

Qingyu Chen, Nagesh C. Panyam, Aparna Elangovan, Melissa J. Davis, Karin M. Verspoor
Abstract: We describe the University of Melbourne READBiomed team’s participation in the Document Triage and Relation Extraction tasks of the Precision Medicine track of BioCreative VI. For the Document Triage task, we create term lists covering terms used to describe interactions, mutations, and the effects that mutations are expected to have on interactions. We combine these lists with a range of standard bag-of-words features, giving close to 30 features for building classification models. The best model yields a roughly 10% (absolute) increase in F1-score over the baseline, based on 10-fold cross-validation on the training data. For the Relation Extraction task, we use GNormPlus to recognize and normalize gene names. We then apply two relation extraction methods, a co-occurrence-based method and a support vector machine (SVM), which achieve F1-scores of 27.4% and 27.2%, respectively, based on 5-fold cross-validation on the training data.

Availability: The code is available from https://biodbqual@bitbucket.org/readbiomed/biocreative-vi.git

Keywords: Document triage; Relation extraction
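The two simplest ingredients named in the abstract, term-list features for triage and a co-occurrence baseline for relation extraction, can be illustrated with a minimal sketch. It is not the authors' implementation: the term lists, the function names `term_list_features` and `cooccurrence_pairs`, and the choices to count token matches and to exclude self-pairs are all assumptions made for illustration. The gene identifiers in the usage note are likewise only examples of what a normalizer such as GNormPlus might output.

```python
import re
from itertools import combinations

# Hypothetical term lists; the actual lists used in the paper are not
# given in the abstract.
TERM_LISTS = {
    "interaction": {"interacts", "binds", "binding", "complex"},
    "mutation": {"mutation", "mutant", "substitution", "deletion"},
    "effect": {"abolishes", "disrupts", "enhances", "impairs"},
}


def term_list_features(text):
    """Count, for each term list, how many tokens of `text` appear in it."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {
        name: sum(token in terms for token in tokens)
        for name, terms in TERM_LISTS.items()
    }


def cooccurrence_pairs(gene_ids):
    """Return every unordered pair of distinct gene IDs mentioned together.

    Assumption: any two different normalized genes found in the same text
    unit are proposed as an interacting pair; self-pairs are not emitted.
    """
    unique_ids = sorted(set(gene_ids))
    return list(combinations(unique_ids, 2))
```

For instance, `term_list_features("The R273H mutation abolishes binding of p53 to MDM2.")` gives one match per list, and `cooccurrence_pairs(["7157", "4193", "7157"])` proposes the single pair `("4193", "7157")`. In a triage classifier, counts like these would sit alongside the bag-of-words features; whether co-occurrence is applied per sentence or per document is not stated in the abstract.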