Protein-protein interaction extraction based on combining TSVM and active learning

Jianmiao Liu, Haochang Wang, Tiejun Zhao
DOI: https://doi.org/10.3772/j.issn.1002-0470.2009.05.008
2009-01-01
Abstract: This paper presents an algorithm for protein-protein interaction (PPI) extraction that combines the transductive support vector machine (TSVM) with active learning. It addresses the scarcity of labeled corpora while exploiting the vast amount of readily available unlabeled biomedical free text. The algorithm increases the performance of the TSVM as much as possible by actively selecting useful unlabeled samples and adding them to the TSVM training set. Experimental results show that combining TSVM with active learning is very promising on a mixed training set containing a small number of labeled samples and a large number of unlabeled samples. Compared with the traditional support vector machine (SVM) and TSVM algorithms, the proposed algorithm greatly reduces the amount of training data required and effectively improves classifier performance for PPI extraction. It achieves an encouraging F-score of 64.12% on the AImed corpus.
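The abstract does not state the selection criterion, so the following is only a minimal Python sketch of one common way to realize the loop it describes, under loudly stated assumptions. Samples closest to the current decision boundary (smallest |f(x)|) are treated as the most useful. A hypothetical `oracle` function stands in for the human annotator. For brevity, a plain linear SVM trained with Pegasos-style subgradient descent replaces the TSVM; unlike a TSVM, it ignores the remaining unlabeled pool during training. All function names are illustrative and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def augment(x: Sequence[float]) -> Vector:
    """Append a constant 1.0 so the last weight acts as a bias term."""
    return tuple(x) + (1.0,)


def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed score f(x) = w . [x, 1]; its sign is the predicted class (+1 / -1)."""
    return sum(wi * xi for wi, xi in zip(w, augment(x)))


def train_linear_svm(xs: List[Vector], ys: List[int], lam: float = 0.01,
                     epochs: int = 100, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Stand-in for the TSVM trainer (assumption): it uses labeled samples only.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(xs[0]) + 1)
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x = augment(xs[i])
            violated = ys[i] * sum(wj * xj for wj, xj in zip(w, x)) < 1.0
            # Shrink for the L2 regularizer, then take a hinge-loss step if needed.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, x)]
    return w


def select_most_uncertain(w: Sequence[float], pool: List[Vector], k: int) -> List[int]:
    """Indices of the k pool samples closest to the hyperplane (smallest |f(x)|)."""
    return sorted(range(len(pool)), key=lambda i: abs(decision(w, pool[i])))[:k]


def active_learning(xs: List[Vector], ys: List[int], pool: List[Vector],
                    oracle: Callable[[Vector], int], rounds: int, k: int):
    """Train, query labels for the k most uncertain samples, add them, repeat."""
    xs, ys, pool = list(xs), list(ys), list(pool)
    for _ in range(rounds):
        w = train_linear_svm(xs, ys)
        # Pop from the highest index down so earlier indices stay valid.
        for i in sorted(select_most_uncertain(w, pool, k), reverse=True):
            x = pool.pop(i)
            xs.append(x)
            ys.append(oracle(x))  # hypothetical annotator supplies the true label
    return train_linear_svm(xs, ys), xs, ys


if __name__ == "__main__":
    pos = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0), (1.5, 1.5)]
    neg = [(-a, -b) for a, b in pos]
    truth = {**{p: 1 for p in pos}, **{n: -1 for n in neg}}
    w, xs, ys = active_learning([pos[0], neg[0]], [1, -1], pos[1:] + neg[1:],
                                lambda x: truth[x], rounds=2, k=2)
    print(len(xs), all((decision(w, x) > 0) == (y > 0) for x, y in truth.items()))
    # prints: 6 True
```

Margin-based selection is a natural fit for SVM-style learners, because samples near the hyperplane are the ones whose labels are most likely to move it. Swapping in a real TSVM trainer for `train_linear_svm` would leave the selection loop itself unchanged.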