Novel semi-supervised text entity information extraction method

Shou Lidan, Wang Jue, Chen Ke, Chen Gang, Wu Sai, Luo Xinyuan
2020-01-01
Abstract: The invention discloses a novel semi-supervised method for extracting entity information from text. The method includes: segmenting the documents into phrases to obtain a candidate entity set; establishing a supervised learning part and an unsupervised learning part, in which labeled documents undergo supervised learning and unlabeled documents undergo unsupervised learning; in the supervised part, inputting the document and an entity type into an entity extraction module, which outputs entity information, inputting the document and a phrase into a type selection module, which outputs a probability distribution for the phrase, and adding the loss values of the two modules to obtain the supervised loss; in the unsupervised part, inputting the document into the entity extraction module to obtain the loss for each entity type, inputting the document and a phrase into the type selection module, which outputs a probability distribution for the phrase, and multiplying the loss values of the two modules and summing the products to form the unsupervised loss; computing a weighted combination of the losses of the two parts to obtain the total loss, and performing optimization training to obtain the model parameters; and sequentially inputting the test text into the entity extraction module and the type selection module to obtain entity information. The method can exploit massive amounts of unlabeled data to significantly enhance model performance, brings a significant improvement when only a small amount of labeled data is available, and is also suitable for semi-supervised text data processing in zero-shot learning.
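The way the losses are combined can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the pairing of each per-type extraction loss with the type selection probability of the same entity type, and the form of the weighting (supervised loss plus a coefficient `weight` times the unsupervised loss) are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Sequence


def supervised_loss(extraction_loss: float, selection_loss: float) -> float:
    """Labeled documents: add the loss values of the two modules."""
    return extraction_loss + selection_loss


def unsupervised_loss(per_type_losses: Sequence[float],
                      type_probs: Sequence[float]) -> float:
    """Unlabeled documents: multiply the values of the two modules and sum.

    Assumption: per_type_losses[i] is the entity extraction loss for
    entity type i, and type_probs[i] is the type selection module's
    probability for the same type.
    """
    if len(per_type_losses) != len(type_probs):
        raise ValueError("one loss and one probability are needed per entity type")
    return sum(loss * prob for loss, prob in zip(per_type_losses, type_probs))


def total_loss(sup: float, unsup: float, weight: float = 0.5) -> float:
    """Weighted combination of the two parts.

    Assumption: `weight` is a hypothetical hyperparameter; the abstract
    only states that the two parts are weighted.
    """
    return sup + weight * unsup


# Example with two entity types (made-up numbers).
sup = supervised_loss(0.5, 0.5)
unsup = unsupervised_loss([2.0, 4.0], [0.25, 0.75])
print(total_loss(sup, unsup, weight=0.5))
```

In an actual training setup these scalars would be differentiable tensors produced by the two modules, and the total loss would be minimized to obtain the model parameters.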