Protein-Protein Interaction Extraction: A Supervised Learning Approach

Juan Xiao, Jian Su, Guodong Zhou, Chew-Lim Tan
2005-01-01
Abstract: In this paper, we propose using Maximum Entropy to extract protein-protein interaction (PPI) information from the literature, which overcomes the limitations of the state-of-the-art co-occurrence-based and rule-based approaches. The approach incorporates corpus statistics of various lexical, syntactic and semantic features. We find that shallow lexical features contribute a large portion of the performance improvement, in contrast to parsing or partial parsing information, yet such lexical features have never been used before in other PPI extraction systems. As a result, the new approach achieves a very encouraging 93.9% recall and 88.0% precision on the IEPA corpus. To the best of our knowledge, this is not only the first systematic study of supervised learning and the first attempt at feature-based supervised learning for PPI extraction, but it also provides useful features, such as surrounding words, key words and abbreviations, that extend supervised relation extraction to other domains such as news.
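To make the idea of shallow lexical features concrete, the following is a minimal sketch of how surrounding-word and key-word features might be collected for one candidate protein pair before being passed to a Maximum Entropy classifier. It is an illustration only, not the authors' implementation: the function name `extract_features`, the window size, the feature naming scheme, the `KEYWORDS` list and the example sentence are all assumptions made here.

```python
from typing import Dict, List

# Hypothetical interaction key words; the paper's actual key word list is not given here.
KEYWORDS = {"interacts", "binds", "activates", "inhibits", "phosphorylates"}


def extract_features(tokens: List[str], i: int, j: int, window: int = 2) -> Dict[str, int]:
    """Return binary lexical features for the protein pair at token indices i < j.

    The window size and the feature names are illustrative assumptions.
    """
    if not 0 <= i < j < len(tokens):
        raise ValueError("expected 0 <= i < j < len(tokens)")
    feats: Dict[str, int] = {}
    # Surrounding words to the left of the first protein mention.
    for w in tokens[max(0, i - window):i]:
        feats[f"left={w.lower()}"] = 1
    # Words between the two protein mentions, flagging any key words.
    for w in tokens[i + 1:j]:
        feats[f"between={w.lower()}"] = 1
        if w.lower() in KEYWORDS:
            feats[f"keyword={w.lower()}"] = 1
    # Surrounding words to the right of the second protein mention.
    for w in tokens[j + 1:j + 1 + window]:
        feats[f"right={w.lower()}"] = 1
    return feats


# Made-up example sentence; the protein mentions sit at token indices 3 and 5.
tokens = "We show that RAD51 binds BRCA2 in vivo".split()
features = extract_features(tokens, 3, 5)
```

Each resulting feature dictionary would correspond to one training or test instance; a Maximum Entropy model would then learn a weight per feature from the annotated pairs. Abbreviation features and the syntactic and semantic features mentioned in the abstract are not covered by this sketch.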