A Generalized Decoding Method for Neural Text Generation
Ning Gong, Nianmin Yao
DOI: https://doi.org/10.1016/j.csl.2023.101503
IF: 3.252
2023-01-01
Computer Speech & Language
Abstract: In natural language generation, most decoding methods are not intrinsic, because their performance depends on extrinsically configured hyperparameters. This has two consequences: first, the generation system is dynamic under different conditions, whereas the decoding system remains static once its hyperparameters are fixed; second, it is hard to select a single constant hyperparameter value that works well under all conditions. Although hyperparameter-free decoding methods exist, such as greedy decoding and plain sampling, it is well established that they generally perform worse than methods with hyperparameters, such as beam search, top-k and top-p. Decoding with hyperparameters yields infinitely many strategies from different fixed configurations, while a hyperparameter-free method has only one strategy. The comparison between them is therefore unfair: it is a one-vs-infinite battle. So how should decoding hyperparameters be handled properly and intrinsically? Is it true that hyperparameter-free methods are always inferior to methods with inexhaustible hyperparameter configurations? Is it possible to design a generalized framework through which these decoding methods can be naturally connected, uniformly described, and mutually inspired? In this paper, we try to answer these questions. To this end, we first propose a generalized decoding framework, GSD, that can be used to uniformly describe and connect existing popular decoding methods. As far as we know, this is the first work that tries to build a theoretical framework relating these decoding methods through formal mathematical theorems. Based on the framework, we then propose Intrinsic Decoding, a novel implementation of GSD whose design is distinct from existing decoding algorithms: it is intrinsic and dynamic. Intrinsic Decoding changes the aforementioned comparison from one-vs-infinite to dynamic-vs-infinite.
Like greedy decoding and plain sampling, Intrinsic Decoding has no hyperparameters, yet it performs better than both and even achieves performance comparable to methods equipped with inexhaustible hyperparameter configurations, such as beam search, top-k and top-p.
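
For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of greedy decoding, top-k, and top-p applied to a single next-token distribution. It is not the paper's GSD framework or Intrinsic Decoding, whose definitions the abstract does not give. The function names and the toy distribution `probs` are illustrative assumptions. The sketch only shows why a fixed `k` or `p` is a static, externally chosen setting, and how top-k with `k = 1` coincides with greedy decoding while `k` equal to the vocabulary size coincides with plain sampling.

```python
import random

def greedy(probs):
    """Hyperparameter-free: return the index of the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def truncate_top_k(probs, k):
    """Keep the k most probable tokens and renormalize (k is fixed externally)."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def truncate_top_p(probs, p):
    """Keep the shortest prefix of tokens, sorted by descending probability,
    whose cumulative mass reaches p, then renormalize (p is fixed externally)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}

def sample(dist, rng):
    """Draw one token index from a {token: probability} mapping."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Toy next-token distribution over a 4-token vocabulary (illustrative only).
probs = [0.5, 0.3, 0.15, 0.05]
rng = random.Random(0)

greedy_token = greedy(probs)                      # one fixed strategy
top_k_dist = truncate_top_k(probs, 2)             # strategy depends on k
top_p_dist = truncate_top_p(probs, 0.9)           # strategy depends on p
plain_dist = truncate_top_k(probs, len(probs))    # k = |V|: plain sampling
token = sample(top_p_dist, rng)                   # a draw from the truncated set
```

In this sketch, `k` and `p` stay the same at every step regardless of how peaked or flat `probs` is, which is the static behaviour the abstract contrasts with a dynamic, intrinsic decoder.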