Protein-protein Interaction Extraction Based on Self-Training

Xiaoyan Zhu
2012-01-01
Abstract: Protein-protein interaction (PPI) extraction based on supervised machine learning still faces the problem that a limited labeled dataset does not provide enough data to saturate the learning method. This study first presents a rich feature vector that includes syntactic features, lexical features, and part-of-speech (POS) tag features. It then analyzes the inconsistencies in data distribution between different datasets and proposes a data enlargement algorithm based on self-training. The algorithm selects the most confident instances from the unlabeled dataset and adds them to the target dataset to improve the learning method. Tests on five public PPI datasets show that the approach considerably improves extraction efficiency.
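The self-training loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier, the confidence measure, or the selection parameters, so everything here is an assumption made for illustration: a toy nearest-centroid classifier stands in for the paper's learning method, confidence is a softmax over negative squared distances, and the names `train`, `predict_with_confidence`, `self_train`, `per_round`, `threshold`, and `max_rounds` are hypothetical.

```python
from math import exp


def train(X, y):
    """Fit a toy nearest-centroid model: one mean vector per class label.

    Assumption: this stands in for the paper's (unspecified) supervised learner.
    """
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict_with_confidence(model, x):
    """Return (label, confidence) for one feature vector.

    Assumption: confidence is a softmax over negative squared distances
    to the class centroids; the paper may use a different measure.
    """
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(x, c))
        for label, c in model.items()
    }
    top = max(scores.values())
    weights = {label: exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


def self_train(X_lab, y_lab, X_unlab, per_round=2, threshold=0.9, max_rounds=10):
    """Enlarge the labeled set with the most confident unlabeled instances."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break
        model = train(X, y)
        # Score every unlabeled instance with the current model.
        scored = []
        for i, x in enumerate(pool):
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:
                scored.append((conf, label, i))
        if not scored:
            break  # nothing confident enough: stop enlarging
        # Move only the top `per_round` most confident instances.
        scored.sort(key=lambda item: item[0], reverse=True)
        chosen = scored[:per_round]
        for _, label, i in chosen:
            X.append(pool[i])
            y.append(label)
        picked = {i for _, _, i in chosen}
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return train(X, y), X, y, pool
```

Adding only a few high-confidence instances per round and retraining in between limits how quickly mislabeled instances can accumulate in the target dataset. An instance the model remains unsure about simply stays in the unlabeled pool.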