A Dataset for Entity Recognition of COVID-19 Public Opinion in Social Media
Linlin Hou, Lingling Li, Dandan Ren, Xin Wang, Ting Yu, Ji Zhang
DOI: https://doi.org/10.1109/BESC59560.2023.10386310
2023-01-01
Abstract: The outbreak of the COVID-19 epidemic has had a major impact on the economy, society, and people's lives. Mining entities from online public opinion is therefore important. It supports theme mining, downstream emotion analysis, knowledge graph construction, entity relation extraction, and other prediction tasks, and it helps surface useful knowledge and key information. However, existing named entity recognition (NER) datasets, whether publicly available or used in prior work, mainly cover simple entity types such as places, organizations, and people, and pay little attention to the COVID-19-related medical entities that appear in the media. Moreover, very few Chinese datasets address COVID-19-related NER. In this paper, we therefore create a Chinese dataset called CoV-Ch, derived from Weibo news posts and comments about COVID-19. We define 10 entity types: 4 general types (Person, Organization, Location, Time) and 6 medical and COVID-19-related types (Disease, Symptom, Medicine, Treatment, Tool, Policy). CoV-Ch contains 8000 sentences and 10735 entities. These ten entity types capture key information in COVID-19 public opinion and can help monitor the development of the pandemic. Because these entity types appear relatively frequently in epidemic-related web posts, we conclude that they are both useful and well represented in the text. We benchmark classical deep learning models for NER on the dataset with extensive experiments. The results show that BERT-based methods outperform the other benchmarked models, but performance on this task still leaves considerable room for improvement.
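To make the task concrete, the sketch below shows one common way the ten CoV-Ch entity types could be framed as character-level sequence labeling and scored. It is an illustration under stated assumptions, not the authors' pipeline. The abstract does not specify the tagging scheme, tokenization, or metric, so the BIO scheme, the short tag names (`PER`, `DIS`, `TOOL`, and so on), character-level tags, and micro-averaged exact-match entity F1 are all hypothetical choices made here.

```python
from typing import List, Tuple

# The 10 entity types named in the abstract. The short tag names are
# hypothetical abbreviations chosen for this sketch, not the dataset's labels.
ENTITY_TYPES = [
    "PER", "ORG", "LOC", "TIME",                 # general types
    "DIS", "SYM", "MED", "TRE", "TOOL", "POL",   # medical / COVID-19 types
]


def build_bio_labels(types: List[str]) -> List[str]:
    """Return the label set for an assumed BIO scheme: 'O' plus B-/I- per type."""
    labels = ["O"]
    for t in types:
        labels += [f"B-{t}", f"I-{t}"]
    return labels


def extract_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode per-character BIO tags into (type, start, end) spans, end exclusive.

    An I- tag that does not continue a span of the same type starts a new
    span (a lenient convention; strict decoders would discard it instead).
    """
    spans = []
    cur_type, start = None, 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            if cur_type is not None:
                spans.append((cur_type, start, i))
            cur_type, start = tag[2:], i
        elif tag == "O":
            if cur_type is not None:
                spans.append((cur_type, start, i))
            cur_type = None
        # Otherwise: an I- tag continuing the current span; nothing to do.
    if cur_type is not None:
        spans.append((cur_type, start, len(tags)))
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match entity F1 over a list of tagged sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = set(extract_spans(g)), set(extract_spans(p))
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up example sentence, not taken from CoV-Ch: "武汉患者出现发热"
    # ("a patient in Wuhan developed a fever"), tagged character by character.
    chars = list("武汉患者出现发热")
    gold = ["B-LOC", "I-LOC", "O", "O", "O", "O", "B-SYM", "I-SYM"]
    pred = ["O", "O", "O", "O", "O", "O", "B-SYM", "I-SYM"]  # misses the location
    print(len(build_bio_labels(ENTITY_TYPES)))  # 21 labels
    print(extract_spans(gold))
    print(entity_f1([gold], [pred]))
```

With ten types, the assumed BIO scheme yields 21 labels. In the toy example, the prediction finds the symptom but misses the location, giving precision 1.0, recall 0.5, and an F1 of about 0.667. Exact-match scoring is a deliberately strict choice: a partially correct span counts as both a false positive and a false negative.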