Extraction of temporal information from social media messages using the BERT model
Kai Ma, Yongjian Tan, Miao Tian, Xuejing Xie, Qinjun Qiu, Sanfeng Li, Xin Wang
DOI: https://doi.org/10.1007/s12145-021-00756-6
2022-01-10
Earth Science Informatics
Abstract: Temporal information extraction from social media messages is critically important for several geographical applications. Based on the characteristics of temporal descriptions in Chinese text, the time expression patterns formed by different combinations of time units are summarized. A deep learning-based information extraction algorithm, named BERT-BiLSTM-CRF, is proposed for automatically extracting temporal information from social media messages. Building on the bidirectional long short-term memory-conditional random field (BiLSTM-CRF) model, the BERT (bidirectional encoder representations from transformers) pretrained language model was used to improve the generalization ability of the word vector model and to capture long-range contextual information; the resulting word vectors were then fed into the BiLSTM-CRF model for further training. The proposed model was evaluated on a constructed corpus of manually annotated Chinese social media messages. Among the models compared, BERT-BiLSTM-CRF achieved the highest average F1-score, 85%. The experimental results show that the proposed method outperforms current state-of-the-art models.
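The abstract does not describe how the CRF layer turns BiLSTM outputs into tags. As a minimal sketch, and not the authors' code, the example below shows standard Viterbi decoding for a linear-chain CRF. It assumes a hypothetical BIO tag set (`O`, `B-TIME`, `I-TIME`), character-level tokens (as is common for Chinese BERT models), and hand-picked toy scores: `EMISSIONS` stands in for the per-character BiLSTM outputs and `TRANSITIONS` for learned tag-to-tag scores. The function names `viterbi_decode` and `bio_spans` are illustrative only, and BERT and the BiLSTM themselves are omitted.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical BIO tag set for time expressions (an assumption, not the paper's scheme).
TAGS = ["O", "B-TIME", "I-TIME"]

# Toy transition scores: TRANSITIONS[i][j] = score of tag i followed by tag j.
# O -> I-TIME is heavily penalized because an I tag must continue an entity.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, -2.0, 1.0],   # from B-TIME
    [0.0, -2.0, 1.0],   # from I-TIME
]

# Toy per-character emission scores [O, B-TIME, I-TIME], standing in for
# BiLSTM outputs over BERT vectors for "明天下午三点开会" ("meeting at 3 pm tomorrow").
EMISSIONS = [
    [0.1, 2.0, 0.5],  # 明
    [0.2, 0.3, 1.5],  # 天
    [0.9, 1.0, 0.8],  # 下 (emissions alone would pick B-TIME here)
    [0.1, 0.2, 1.8],  # 午
    [0.3, 0.4, 1.2],  # 三
    [0.2, 0.1, 1.6],  # 点
    [2.0, 0.2, 0.3],  # 开
    [2.2, 0.1, 0.4],  # 会
]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index sequence under a linear-chain CRF."""
    n_tags = len(transitions)
    # best[j]: best total score of any tag path ending in tag j at the current token.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Choose the previous tag i that maximizes best[i] + transitions[i][j].
            scores = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_max = max(range(n_tags), key=lambda i: scores[i])
            new_best.append(scores[i_max] + emissions[t][j])
            pointers.append(i_max)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def bio_spans(tags: Sequence[str]) -> List[Tuple[int, int]]:
    """Collect [start, end) spans of a B-TIME tag followed by any I-TIME tags."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for k, tag in enumerate(tags):
        if tag == "I-TIME" and start is not None:
            continue  # still inside the current time expression
        if start is not None:
            spans.append((start, k))  # close the open span
        start = k if tag == "B-TIME" else None  # an orphan I-TIME is ignored
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    text = "明天下午三点开会"
    path = [TAGS[j] for j in viterbi_decode(EMISSIONS, TRANSITIONS)]
    for start, end in bio_spans(path):
        print(text[start:end])  # prints 明天下午三点
```

Because the whole sequence is decoded jointly, the transition scores keep 下 inside the time expression even though its emission scores alone favour `B-TIME`. A trainable model would also need the CRF forward algorithm to compute the sequence log-likelihood, which this sketch leaves out.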
Geosciences, Multidisciplinary; Computer Science, Interdisciplinary Applications