On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation

Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang
2024-02-22
Abstract: This study addresses the application of encoder-only Pre-trained Language Models (PLMs) in keyphrase generation (KPG) amidst the broader availability of domain-tailored encoder-only models compared to encoder-decoder models. We investigate three core inquiries: (1) the efficacy of encoder-only PLMs in KPG, (2) optimal architectural decisions for employing encoder-only PLMs in KPG, and (3) a performance comparison between in-domain encoder-only and encoder-decoder PLMs across varied resource settings. Our findings, derived from extensive experimentation in two domains, reveal that with encoder-only PLMs, although keyphrase extraction (KPE) with Conditional Random Fields slightly excels in identifying present keyphrases, the KPG formulation renders a broader spectrum of keyphrase predictions. Additionally, prefix-LM fine-tuning of encoder-only PLMs emerges as a strong and data-efficient strategy for KPG, outperforming general-domain seq2seq PLMs. We also identify a favorable parameter allocation towards model depth rather than width when employing encoder-decoder architectures initialized with encoder-only PLMs. The study sheds light on the potential of utilizing encoder-only PLMs for advancing KPG systems and provides a groundwork for future KPG methods. Our code and pre-trained checkpoints are released at https://github.com/uclanlp/DeepKPG.
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores how to effectively generate keyphrases (Keyphrase Generation, KPG) with encoder-only Pre-trained Language Models (PLMs). Specifically, it focuses on three core questions:

1. **Effectiveness of Encoder-only PLMs in KPG**:
   - Investigate whether encoder-only PLMs can be used for KPG and whether they perform as well there as they do in Keyphrase Extraction (KPE).
2. **Optimal Architecture Choice for KPG using Encoder-only PLMs**:
   - Explore different architectural decisions, such as sequence labeling and prefix-LM, to determine which is best suited for applying encoder-only PLMs to KPG.
3. **Performance Comparison of Domain-specific Encoder-only PLMs and Encoder-Decoder PLMs under Different Resource Settings**:
   - Compare domain-specific encoder-only PLMs with encoder-decoder PLMs on KPG in both rich-resource and low-resource settings.

### Background and Motivation

- **Importance of Keyphrases**: Keyphrases condense the important information in a document and are widely used in document indexing, information linking, recommendation systems, and similar applications.
- **Present and Absent Keyphrases**: Keyphrases are conventionally divided into present keyphrases, which appear in the source document, and absent keyphrases, which do not. The KPE task requires the model to identify present keyphrases, while the KPG task requires predicting both present and absent keyphrases.
- **Development of Pre-trained Language Models**: In recent years, PLMs have greatly advanced KPE and KPG, especially in zero-shot, multilingual, low-resource, and cross-domain scenarios.
- **Challenges in Practical Applications**: Although KPG models have the advantage of being able to generate absent keyphrases, practitioners in specialized domains often turn to domain-specific encoder-only PLMs (BERT-style models such as SciBERT), because such models are easier to obtain and to fine-tune on domain-specific data.
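To make the present/absent distinction concrete, here is a minimal Python sketch that splits a list of gold keyphrases by checking whether each one occurs as a contiguous token span in the document. It is an illustration only, not the paper's evaluation code: the helpers `tokenize`, `contains_subsequence`, and `split_present_absent` are hypothetical, and the tokenization (lowercasing plus alphanumeric runs, no stemming) is a simplifying assumption; benchmark scripts commonly also stem tokens before matching.

```python
import re


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase and keep alphanumeric runs only.
    # Benchmark evaluation scripts commonly also apply Porter stemming; omitted here.
    return re.findall(r"\w+", text.lower())


def contains_subsequence(doc_tokens: list[str], phrase_tokens: list[str]) -> bool:
    # True if phrase_tokens occurs as a contiguous run inside doc_tokens.
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i : i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def split_present_absent(
    document: str, keyphrases: list[str]
) -> tuple[list[str], list[str]]:
    # Present keyphrases match a span of the document; all others are absent.
    doc_tokens = tokenize(document)
    present: list[str] = []
    absent: list[str] = []
    for kp in keyphrases:
        if contains_subsequence(doc_tokens, tokenize(kp)):
            present.append(kp)
        else:
            absent.append(kp)
    return present, absent


if __name__ == "__main__":
    doc = "We study keyphrase generation with encoder-only language models."
    gold = ["keyphrase generation", "language models", "information retrieval"]
    present, absent = split_present_absent(doc, gold)
    print("present:", present)  # ['keyphrase generation', 'language models']
    print("absent:", absent)    # ['information retrieval']
```

Under this split, a KPE system (e.g. sequence labeling) can at best recover the `present` list, whereas a KPG system is also expected to produce items like those in `absent`.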
### Research Methods

- **Experimental Design**: The paper studies four ways of using encoder-only PLMs for KPE and KPG: sequence labeling without Conditional Random Fields (CRF), sequence labeling with CRF, prefix-LM, and initializing encoder-decoder architectures with encoder-only PLMs.
- **Datasets**: The experiments use two widely used KPG benchmarks, KP20k from the scientific domain and KPTimes from the news domain.
- **Evaluation Metrics**: The main evaluation metrics are macro-averaged F1@5 and F1@M (F1@5 scores the top five predictions, F1@M scores all predictions the model outputs), targeting the performance on present and absent keyphrases, respectively.

### Main Findings

1. **Performance on Present Keyphrases**:
   - KPE with CRF slightly outperforms KPG in identifying present keyphrases, but the KPG formulation produces a broader range of keyphrases, including absent ones.
2. **Effectiveness of Prefix Language Models**:
   - Prefix-LM fine-tuning of encoder-only PLMs is a strong and data-efficient KPG approach, even outperforming general-domain sequence-to-sequence PLMs of the same scale.
3. **Parameter Allocation Strategy**:
   - When initializing encoder-decoder architectures with encoder-only PLMs, allocating parameters to model depth (number of layers) rather than width (hidden size of each layer), and pairing a deep encoder with a shallow decoder, yields better keyphrase quality and lower inference latency.
4. **Transferability of Domain-specific Models**:
   - SciBERT (scientific domain) performs well in the news domain, while NewsBERT (news domain) performs poorly in the scientific domain.

### Conclusion

This paper empirically demonstrates the potential of building KPG systems on encoder-only PLMs, especially in highly specialized domains, and its findings offer useful guidance for the future development of KPG methods.
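The prefix-LM formulation listed under Research Methods lets a single encoder-only model act as a generator by changing only its self-attention mask. The sketch below builds the UniLM-style sequence-to-sequence mask in plain Python: positions in the source prefix (the document) attend bidirectionally to the whole prefix, while target positions (the keyphrase sequence) attend to the prefix plus earlier target positions only. We assume, without confirmation from the source, that the paper's prefix-LM setup follows this pattern; the helper `prefix_lm_mask` is hypothetical, and separator tokens and how multiple keyphrases are concatenated are left out.

```python
def prefix_lm_mask(prefix_len: int, total_len: int) -> list[list[int]]:
    """Return a total_len x total_len mask where mask[i][j] == 1 means
    position i may attend to position j (hypothetical helper)."""
    if not 0 <= prefix_len <= total_len:
        raise ValueError("need 0 <= prefix_len <= total_len")
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            if j < prefix_len:
                # Every position sees the whole source prefix (bidirectional inside it).
                allowed = True
            else:
                # Target columns: only target rows may look, and only leftward (causal).
                allowed = i >= prefix_len and j <= i
            row.append(int(allowed))
        mask.append(row)
    return mask


if __name__ == "__main__":
    # 2 source (document) tokens followed by 2 target (keyphrase) tokens.
    for row in prefix_lm_mask(prefix_len=2, total_len=4):
        print(row)
    # [1, 1, 0, 0]
    # [1, 1, 0, 0]
    # [1, 1, 1, 0]
    # [1, 1, 1, 1]
```

With `prefix_len=0` the mask reduces to a plain causal (lower-triangular) mask, and with `prefix_len == total_len` it is the all-ones mask of ordinary bidirectional encoding. In a Transformer, the 0 entries are typically turned into a large negative additive bias before the softmax.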