CHTopoNER model-based method for recognizing Chinese place names from social media information
Mengwei Zhang,Xingui Liu,Zheng Zhang,Yue Qiu,Zhipeng Jiang,Pengyu Zhang,Zhang, Mengwei,Liu, Xingui,Zhang, Zheng,Jiang, Zhipeng
DOI: https://doi.org/10.1007/s10109-023-00433-w
2024-01-12
Journal of Geographical Systems
Abstract:Chinese toponym recognition is crucial in named entity recognition and has significant implications for improving geographic information systems. Based on the real-time nature of social media and rich geographical data contained in social media, it is important to identify Chinese toponyms, including compound toponyms, informal toponyms, and other forms of social media content, for automatic geospatial information extraction. However, the strong word-building ability, diverse features, and ambiguity of Chinese toponyms combined with the linguistic irregularities of social media pose significant challenges for accurately locating toponym boundaries and resolving ambiguities. Furthermore, existing Chinese toponym recognition methods often ignore the fusion of local and global features during feature extraction, resulting in semantic information loss. Therefore, we used the Chinese-roberta-wwm-ext pre-trained language model to encode input text and obtain character-level information. An improved SoftLexicon-based statistical method was employed to acquire word-level semantic information, which was then integrated with character-level semantic information. A two-channel neural network layer comprising a bi-directional long short-term memory and an inception-dilated convolutional neural network was utilized to extract global and local features from text. Additionally, a conditional random field was applied to establish label constraints. The proposed deep neural network model, called CHTopoNER, is designed to identify various forms of Chinese toponyms in irregular Chinese social media content. Its effectiveness was validated on four publicly available annotated toponym datasets and a custom social media dataset. CHTopoNER surpasses state-of-the-art Chinese toponym recognition models and achieves promising results for extracting various types of toponyms and spatial location terms.
geography