A Knowledge-Based Data Augmentation Framework for Few-Shot Biomedical Information Extraction

Xin Su, Chuang Cheng, Kuo Yang, Xuezhong Zhou
DOI: https://doi.org/10.1007/978-981-99-4826-0_3
2023-01-01
Abstract: A large amount of biomedical knowledge is hidden in the massive body of scientific and clinical literature. This knowledge exists in unstructured form and is difficult to extract automatically. Natural language processing makes it possible to mine this knowledge automatically. At present, most information extraction models need sufficient training data to achieve good performance. Because high-quality labeled biomedical data are scarce, accurately extracting information from biomedical literature with only a few samples remains difficult. This paper describes our participation in Task 1 of the China Health Information Processing Conference (CHIP 2022). We propose a knowledge-based data augmentation framework that expands the training data to overcome its scarcity. The experimental results show that after data augmentation, the F1 score of named entity recognition using BioBERT-BiLSTM-CRF reaches 0.58 and the F1 score of relation extraction using TDEER reaches 0.6. Our approach won second place in the task, which validates its performance.
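The abstract does not say how the framework uses knowledge to expand the data. As an illustration only, one common form of knowledge-based augmentation for named entity recognition is mention replacement: each annotated entity is swapped for another entity of the same type drawn from a dictionary or knowledge base, and the BIO tags are regenerated to match. The sketch below assumes BIO-tagged token lists and a hypothetical `kb` mapping from entity type to surface forms; `extract_spans` and `augment` are illustrative names, and this is not claimed to be the authors' method.

```python
import random
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, BIO tag), e.g. ("Aspirin", "B-Drug")


def extract_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, entity_type) for each BIO entity span."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # An I- tag of the same type continues the currently open span.
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue
        # Anything else closes the open span (if any).
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        # A B- tag (or a stray I- tag) opens a new span.
        if tag != "O":
            start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


def augment(sentence: List[Token], kb: Dict[str, List[str]],
            rng: random.Random) -> List[Token]:
    """Replace every entity mention with a same-type entity from a hypothetical KB."""
    words = [w for w, _ in sentence]
    tags = [t for _, t in sentence]
    out: List[Token] = []
    prev = 0
    for start, end, etype in extract_spans(tags):
        out.extend(sentence[prev:start])  # copy the "O" tokens between entities
        candidates = kb.get(etype)
        if candidates:
            new_tokens = rng.choice(candidates).split()
        else:
            new_tokens = words[start:end]  # no KB entry for this type: keep mention
        # Re-tag the (possibly multi-token) replacement in BIO format.
        out.append((new_tokens[0], f"B-{etype}"))
        out.extend((w, f"I-{etype}") for w in new_tokens[1:])
        prev = end
    out.extend(sentence[prev:])
    return out
```

Calling `augment` several times with different random states on each labeled sentence yields additional training sentences whose labels stay consistent by construction. Relation extraction data would need extra care, since replacing an entity can invalidate the annotated relation.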