Named Entity Recognition from Chinese Adverse Drug Event Reports with Lexical Feature Based BiLSTM-CRF and Tri-Training

Yao Chen,Changjiang Zhou,Tianxin Li,Hong Wu,Xia Zhao,Kai Ye,Jun Liao
DOI: https://doi.org/10.1016/j.jbi.2019.103252
IF: 8
2019-01-01
Journal of Biomedical Informatics
Abstract:BACKGROUND:The Adverse Drug Event Reports (ADERs) from the spontaneous reporting system are important data sources for studying Adverse Drug Reactions (ADRs) as well as post-marketing pharmacovigilance. Apart from the conventional ADR information contained in the structured section of ADERs, more detailed information such as pre- and post- ADR symptoms, multi-drug usages and ADR-relief treatments are described in the free-text section, which can be mined through Natural Language Processing (NLP) tools.OBJECTIVE:The goal of this study was to extract ADR-related entities from free-text section of Chinese ADERs, which can act as supplements for the information contained in structured section, so as to further assist in ADR evaluation.METHODS:Three models of Conditional Random Field (CRF), Bidirectional Long Short-Term Memory-CRF (BiLSTM-CRF) and Lexical Feature based BiLSTM-CRF (LF-BiLSTM-CRF) were constructed to conduct Named Entity Recognition (NER) tasks in free-text section of Chinese ADERs. A semi-supervised learning method of tri-training was applied on the basis of the three established models to give un-annotated raw data with reliable tags.RESULTS:Among the three basic models, the LF-BiLSTM-CRF achieved the highest average F1 score of 94.35%. After the process of tri-training, almost half of the un-annotated cases were tagged with labels, and the performances of all the three models improved after iterative training.CONCLUSIONS:The LF-BiLSTM-CRF model that we constructed could achieve a comparatively high F1 score, and the fusion of CRF, while BiLSTM-CRF and LF-BiLSTM-CRF in tri-training might further strengthen the reliability of predicted tags. The results suggested the usefulness of our methods in developing the specialized NER tools for identifying ADR-related information from Chinese ADERs.
What problem does this paper attempt to address?