A Knowledge Extraction Framework for Domain-Specific Application with Simplified Pre-Trained Language Model and Attention-Based Feature Extractor

Zhang Jian, Qin Bo, Zhang Yufei, Zhou Junhua, Wang Hongwei
DOI: https://doi.org/10.1007/s11761-022-00337-5
2022-01-01
Abstract: With the advancement of industrial informatics, intelligent algorithms are increasingly applied in industrial products and applications. In this paper, we propose a knowledge extraction framework for domain-specific text. The framework extracts entities from text for subsequent tasks such as knowledge graph construction. It contains three modules: a domain feature pre-trained model, LSTM-based flat named entity recognition, and attention-based nested named entity recognition. The domain feature pre-trained model effectively learns features of the domain corpus, such as professional terms that are not included in general-domain corpora. Flat named entity recognition uses the vectors from the pre-trained model to extract entities from domain-specific text. Nested named entity recognition, based on the attention mechanism and a weight sliding balance strategy, effectively identifies entity types with higher nesting rates. The framework achieves good results on nuclear power plant maintenance reports, and the methods for the domain pre-trained model and LSTM-based flat named entity recognition have been successfully applied to practical tasks.
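The distinction between flat and nested entities, and the notion of a per-type nesting rate, can be illustrated with a small sketch. This is not the paper's method: the abstract does not define "nesting rate", so the code assumes one plausible reading (the fraction of entities of a given type that contain, or are contained in, another entity). The `Entity` type, the helper names, the example sentence, and the labels `ACTION`, `EQUIPMENT`, and `COMPONENT` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Entity(NamedTuple):
    """A labeled token span (hypothetical representation, not from the paper)."""
    start: int  # inclusive token index
    end: int    # exclusive token index
    label: str


def is_nested(a: Entity, b: Entity) -> bool:
    """Return True if a and b are different entities and one span contains the other."""
    if a == b:
        return False
    a_contains_b = a.start <= b.start and b.end <= a.end
    b_contains_a = b.start <= a.start and a.end <= b.end
    return a_contains_b or b_contains_a


def nesting_rate_by_type(entities: List[Entity]) -> Dict[str, float]:
    """Assumed definition: share of entities per label involved in any nesting."""
    total: Dict[str, int] = defaultdict(int)
    nested: Dict[str, int] = defaultdict(int)
    for entity in entities:
        total[entity.label] += 1
        if any(is_nested(entity, other) for other in entities):
            nested[entity.label] += 1
    return {label: nested[label] / total[label] for label in total}


# Hypothetical maintenance-report sentence, tokenized on whitespace.
tokens = ["replace", "the", "main", "feedwater", "pump", "bearing"]
entities = [
    Entity(0, 1, "ACTION"),      # "replace"
    Entity(2, 5, "EQUIPMENT"),   # "main feedwater pump"
    Entity(2, 6, "COMPONENT"),   # "main feedwater pump bearing"
]

rates = nesting_rate_by_type(entities)
for label, rate in sorted(rates.items()):
    print(f"{label}: {rate:.2f}")
```

In this toy example, `EQUIPMENT` sits inside `COMPONENT`, so a flat tagger that assigns one label per token cannot emit both spans. Under the assumed definition, types with a high rate are the ones that would be routed to a nested recognizer.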