Abstract: The effective use of accumulated forestry science papers is important for understanding the current state of forests and for formulating strategies for forest environmental preservation. However, the metadata associated with these documents is sparse, which makes them difficult to exploit fully. Metadata from forestry science papers underpins the efficient management and use of these documents and plays an essential role in advancing forestry science research. Because constructing a training corpus and extracting distant semantic relationships are inherently challenging, the use of named entity recognition (NER) for metadata entity identification in forestry science papers remains unexplored. To overcome these limitations, this paper creates a specialized training corpus and introduces a novel few-shot NER framework tailored to metadata extraction from forestry science papers. Within this framework, a data augmentation layer, employing word replacement (WR) and enhanced mixup (EM), addresses the poor performance caused by scarce training data. The semantic comprehension layer incorporates a multi-granularity dilated convolution neural network (MGDCNN) to capture distant semantic associations. Moreover, a meta-learning-based reweighting layer is introduced to mitigate the adverse effects of low-quality augmented examples on the model. Experimental results demonstrate the efficacy of the proposed framework, which achieves precision, recall, and F1 of 91.08%, 88.96%, and 90.00%, respectively. Compared with traditional models, precision, recall, and F1 improve by up to 10.69%, 7.48%, and 9.07%, respectively.
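
As a rough illustration of the two augmentation ideas named in the abstract, the following minimal Python sketch shows token-level word replacement and standard mixup. It is not the paper's implementation: the function names `word_replace` and `mixup`, the `lexicon` structure, the `SPECIES` label, and the replacement probability `p` are all assumptions made for this example, and the "enhanced" mixup variant used in the framework is not specified in the abstract, so plain mixup is shown instead.

```python
import random


def word_replace(tokens, labels, lexicon, rng, p=0.5):
    """Swap a token for another lexicon entry with the same label.

    Hypothetical helper: `lexicon` maps a label to a list of surface forms.
    Each token is treated independently, which ignores multi-token entities.
    """
    out = []
    for tok, lab in zip(tokens, labels):
        # Candidate replacements share the token's label but differ in form.
        candidates = [c for c in lexicon.get(lab, []) if c != tok]
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out  # labels are unchanged, so the original label list is reused


def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard mixup: convex combination of two feature vectors and
    their (one-hot or soft) label vectors with mixing weight `lam`."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    lexicon = {"SPECIES": ["Pinus", "Quercus"]}  # illustrative entries only
    print(word_replace(["Pinus", "grows"], ["SPECIES", "O"], lexicon, rng, p=1.0))
    print(mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25))
```

Because both routines can produce implausible examples (a replaced word that no longer fits its context, or an interpolated label that matches no real entity), a downstream step that down-weights poor augmented examples, such as the reweighting layer the abstract describes, is a natural complement.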

Metadata Extraction for Scientific Papers

Automatic Document Metadata Extraction Based on Deep Networks

Citation Metadata Extraction Via Deep Neural Network-based Segment Sequence Labeling

Amplifying Scientific Paper's Abstract by Leveraging Data-Weighted Reconstruction

New Methods for Metadata Extraction from Scientific Literature

Extracting method knowledge elements from scientific literature: A rule‐based approach

Unsupervised Extraction of Representative Concepts from Scientific Literature

Multimodal Approach for Metadata Extraction from German Scientific Publications

Metadata Extraction System for Chinese Books

BibRank: Automatic Keyphrase Extraction Platform Using Metadata

Rule Based Metadata Extraction Framework from Academic Articles

Automated Annotation of Scientific Texts for ML-based Keyphrase Extraction and Validation

Keyphrases automatic extraction from the abstracts of English scientific papers based on Scopus retrieval

Learning to Annotate Scientific Publications

TDMSci: A Specialized Corpus for Scientific Literature Entity Tagging of Tasks Datasets and Metrics

LAME: Layout Aware Metadata Extraction Approach for Research Articles

A Metadata Extraction Approach for Clinical Case Reports to Enable Advanced Understanding of Biomedical Concepts

Recommending MeSH terms for annotating biomedical articles

Keyword Extraction in Scientific Documents

Method and Dataset Entity Mining in Scientific Literature: A CNN + Bi-LSTM Model with Self-attention

Few-shot named entity recognition framework for forestry science metadata extraction