Few-shot named entity recognition framework for forestry science metadata extraction
Yuquan Fan, Hong Xiao, Min Wang, Junchi Wang, Wenchao Jiang, Chang Zhu
DOI: https://doi.org/10.1007/s12652-023-04740-4
IF: 3.662
2024-02-03
Journal of Ambient Intelligence and Humanized Computing
Abstract: The effective use of accumulated forestry science papers is important for understanding the current state of forests and for formulating strategies for forest environmental preservation. However, the metadata associated with these documents is sparse, which makes them difficult to exploit comprehensively. Metadata from forestry science papers is the foundation for the efficient management and use of these scholarly documents and plays an indispensable role in advancing forestry science research. Because constructing a training corpus and extracting distant semantic relationships are inherently challenging, the use of named entity recognition (NER) technology for metadata entity identification in forestry science papers remains unexplored. To overcome these limitations, this paper creates a specialized training corpus and introduces a novel few-shot NER framework tailored to metadata extraction from forestry science papers. Within this framework, a data augmentation layer employing word replacement (WR) and enhanced mixup (EM) addresses the suboptimal performance caused by a scarcity of training data. The semantic comprehension layer incorporates a multi-granularity dilated convolution neural network (MGDCNN) to capture and extract distant semantic associations. Moreover, a meta-learning-based reweighting layer is introduced to mitigate the adverse effects of low-quality augmented examples on the model. Experimental results demonstrate the efficacy of the proposed framework, which achieves precision, recall, and F1 of 91.08%, 88.96%, and 90.00%, respectively. Compared to traditional models, precision, recall, and F1 improve by up to 10.69%, 7.48%, and 9.07%, respectively.
computer science, information systems, telecommunications, artificial intelligence
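
The abstract states that the semantic comprehension layer uses dilated convolutions at several granularities to reach distant context. The following is a minimal sketch of that general idea only, not the paper's MGDCNN: it assumes a one-dimensional scalar signal in place of token embeddings, a fixed kernel in place of learned weights, and dilation rates of 1, 2, and 4 chosen purely for illustration. The function names `dilated_conv1d` and `multi_granularity_features` are hypothetical.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """1-D dilated convolution (cross-correlation) with zero padding, same output length."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Each kernel tap looks `dilation` positions further from the center.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def multi_granularity_features(
    x: Sequence[float],
    kernel: Sequence[float],
    dilations: Sequence[int] = (1, 2, 4),  # illustrative rates, not taken from the paper
) -> List[List[float]]:
    """Apply the same kernel at several dilation rates and group the outputs per position."""
    branches = [dilated_conv1d(x, kernel, d) for d in dilations]
    return [list(per_position) for per_position in zip(*branches)]


# An impulse at position 4 shows how far each branch can "see".
signal = [0.0] * 9
signal[4] = 1.0
features = multi_granularity_features(signal, kernel=[1.0, 1.0, 1.0])

for position, feats in enumerate(features):
    print(position, feats)
```

In this sketch, position 0 receives a nonzero response only from the branch with dilation 4, which is the sense in which larger dilation rates connect tokens that are far apart without increasing the kernel size.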