Few-shot learning for name entity recognition in geological text based on GeoBERT
Hao Liu, Qinjun Qiu, Liang Wu, Wenjia Li, Bin Wang, Yuan Zhou
DOI: https://doi.org/10.1007/s12145-022-00775-x
2022-03-11
Earth Science Informatics
Abstract: Geological reports record the geological elements and survey contents found during geological exploration, but it is difficult to extract useful concepts from such reports. In information extraction, accurately identifying entities in unstructured geotext is a foundational task known as geological named entity recognition (Geo-NER). However, existing methods generally require a large annotated corpus and have difficulty recognizing long entities. Therefore, this paper proposes a two-stage fine-tuning method. In the first stage, we build GeoBERT, a bidirectional encoder representations from transformers (BERT) language model that incorporates geological domain knowledge, on top of a pretrained BERT model. In the second stage, we use a small number of samples to complete the NER task on geological reports based on GeoBERT. On the constructed dataset, the proposed model achieves a high F1-score relative to the baseline models.
geosciences, multidisciplinary, computer science, interdisciplinary applications
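
The abstract reports results as an F1-score and notes that long entities are hard to recognize. The following is a minimal sketch of how an entity-level F1-score is commonly computed for NER, assuming BIO tagging and exact-match scoring; the abstract does not state which tagging scheme or matching rule the authors used. The function names `bio_to_spans` and `entity_f1` and the labels `ROCK` and `STRAT` are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((label, start, i))
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            else:
                # "O" or a stray "I-" tag: no span is open.
                start, label = None, None
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1: a span counts only if type and both boundaries agree."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the three-token entity is truncated to two tokens.
gold = ["B-ROCK", "I-ROCK", "I-ROCK", "O", "B-STRAT"]
pred = ["B-ROCK", "I-ROCK", "O", "O", "B-STRAT"]
print(entity_f1(gold, pred))
```

Under exact matching, the truncated three-token entity counts as both a missed gold entity and a spurious prediction, which illustrates why long entities can weigh heavily on the score.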