Biomedical Event Extraction Using a New Error Detection Learning Approach Based on Neural Network

Xiaolei Ma, Yang Lu, Yinan Lu, Zhili Pei, Jichao Liu
DOI: https://doi.org/10.32604/cmc.2020.07711
2020-01-01
Abstract: Supervised machine learning approaches are effective in text mining, but their success relies heavily on manually annotated corpora. However, annotated biomedical event corpora are scarce, and the available datasets contain too few examples to train classifiers adequately. The common remedy is to draw large numbers of training samples from unlabeled data, but such datasets often contain many mislabeled samples, which degrade classifier performance. This study therefore proposes a novel error data detection approach for reducing noise in unlabeled biomedical event data. First, we construct a mislabeled dataset through error analysis on the development dataset. Vector representations of sample pairs are then obtained by means of sequence patterns and a joint model combining a convolutional neural network and a long short-term memory recurrent neural network. Next, we propose a sample identification strategy that applies error detection based on these pair representations to unlabeled data. The selected samples are added to enrich the training dataset and improve classification performance. Experiments on the BioNLP Shared Task GENIA data indicate that the proposed approach is competent at extracting biomedical events from biomedical literature. Our approach can effectively filter out some noisy examples and build a satisfactory prediction model.
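The abstract only outlines the sample-selection step, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each sample is already encoded as a fixed-length vector. Each auto-labeled candidate is paired with a prototype (centroid) of its predicted label's training vectors, an error detector scores the pair, and only low-error candidates are added to the training set. The pair encoding (element-wise |a - b| followed by a * b) and the logistic scorer are swapped-in stand-ins for the paper's sequence patterns and CNN-LSTM model. The function names, weights, prototype choice, and 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, str]  # (feature vector, event label)


def pair_representation(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Hypothetical pair encoding: element-wise |a - b| followed by a * b."""
    return [abs(x - y) for x, y in zip(a, b)] + [x * y for x, y in zip(a, b)]


def error_probability(pair: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic score: estimated probability that the pair signals a labeling error."""
    z = bias + sum(w * p for w, p in zip(weights, pair))
    return 1.0 / (1.0 + math.exp(-z))


def centroid(vectors: List[Vector]) -> Vector:
    """Mean vector, used here as a hypothetical prototype for one label."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_samples(
    train: List[Sample],
    candidates: List[Sample],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> Tuple[List[Sample], List[Sample]]:
    """Keep auto-labeled candidates whose pairing with their label prototype looks error-free."""
    by_label: Dict[str, List[Vector]] = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    prototypes = {label: centroid(vecs) for label, vecs in by_label.items()}

    selected: List[Sample] = []
    for vec, predicted in candidates:
        if predicted not in prototypes:
            continue  # no labeled reference for this event type, so skip it
        pair = pair_representation(vec, prototypes[predicted])
        if error_probability(pair, weights, bias) < threshold:
            selected.append((vec, predicted))
    # Return the enlarged training set and the newly accepted samples.
    return train + selected, selected
```

As a usage example with toy two-dimensional vectors, the weights `[4.0, 4.0, 0.0, 0.0]` and bias `-2.0` penalize only the absolute-difference half of the pair. A candidate far from its predicted label's prototype, such as `([0.1, 0.9], "Binding")` when the Binding training vectors cluster near `[1.0, 0.0]`, receives a high error probability and is rejected. In a real system the detector's parameters would be learned from a mislabeled dataset like the one the abstract describes, rather than set by hand.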