HiCoVA: Hierarchical Conditional Variational Autoencoder for Keyphrase Generation

P. Das, Nikhil Reddy Varimalla, Debarshi Kumar Sanyal, Anoop Vallabhajosyula, Santosh T.Y.S.S
DOI: https://doi.org/10.1145/3459637.3482119
2021-10-26
Abstract: The task of keyphrase generation, unlike extraction, aims to generate phrases that succinctly capture the key information of the source text, including phrases that are absent from the document (i.e., that do not match any contiguous subsequence of the source text). Despite the significant progress achieved by sequence-to-sequence (seq2seq) models on this high-entropy task, their deterministic modelling limits the generation of a diverse set of keyphrases. To address this limitation, we propose incorporating a Conditional Variational Autoencoder (CoVA) into seq2seq models; it can represent a set of keyphrases as a probability distribution, which improves the diversity of the generated keyphrases. We model this distribution using a hierarchical latent structure in which a global latent variable models the diversity among the keyphrases and local latent variables control the generation of each keyphrase to make it coherent. Experimental results on four benchmark datasets of research papers demonstrate that our approach achieves a large improvement in diversity along with modest gains in quality over previous models.
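The hierarchical latent structure described in the abstract can be illustrated with a toy sampling sketch. This is not the paper's implementation: the function names, the choice of Gaussian latents, the fixed variances, and the use of the source vector as the prior mean are all assumptions made for illustration. A real model would predict these parameters with learned networks and feed each local latent to a seq2seq decoder.

```python
import math
import random


def sample_gaussian(mean, log_var, rng):
    """Reparameterized draw: mean + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def sample_hierarchical_latents(source_vec, num_keyphrases, seed=0):
    """Draw one global latent, then one local latent per keyphrase.

    Hypothetical helper: source_vec stands in for an encoded document.
    """
    rng = random.Random(seed)
    dim = len(source_vec)

    # Global latent, conditioned on the source (assumed: mean = source_vec,
    # unit variance). It is shared by the whole set of keyphrases.
    z_global = sample_gaussian(source_vec, [0.0] * dim, rng)

    # Local latents, each conditioned on the global latent (assumed: centred
    # on z_global with a smaller variance). One per keyphrase.
    z_locals = [sample_gaussian(z_global, [-2.0] * dim, rng)
                for _ in range(num_keyphrases)]
    return z_global, z_locals


if __name__ == "__main__":
    z_global, z_locals = sample_hierarchical_latents([0.1, -0.3, 0.5], 4)
    print("global:", z_global)
    for i, z in enumerate(z_locals):
        print(f"local {i}:", z)
```

Each local latent differs from the others, which is the source of variation across keyphrases in this sketch, while all of them stay close to the shared global latent.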
Computer Science