Abstract: Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation. Models are trained using teacher forcing to optimise only the one-step-ahead prediction. However, at test time, the model is asked to generate a whole sequence, causing errors to propagate through the generation process (exposure bias). A number of authors have proposed countering this bias by optimising for a reward that is less tightly coupled to the training data, using reinforcement learning. We optimise directly for quality metrics, including a novel approach using a discriminator learned directly from the training data. We confirm that policy gradient methods can be used to decouple training from the ground truth, leading to increases in the metrics used as rewards. We perform a human evaluation, and show that although these metrics have previously been assumed to be good proxies for question quality, they are poorly aligned with human judgement and the model simply learns to exploit the weaknesses of the reward source.

What problem does this paper attempt to address?

This paper asks how to optimize a natural-language question generation model so that the questions it generates are of higher quality. The authors focus on the fact that existing models are trained with teacher forcing, which optimizes only the one-step-ahead (word-by-word) prediction, but must generate an entire sequence at test time. This mismatch causes exposure bias: errors accumulate as generation proceeds. In addition, because training is tied closely to reproducing the ground truth data, the model has little opportunity to explore a broader space of possible outputs. To address these problems, the authors use reinforcement learning (policy gradient methods) to optimize directly for several reward objectives, including an adversarial discriminator that rewards questions that are indistinguishable from real examples. The aim is for the model to recover better from suboptimal predictions and to generate higher-quality questions. However, the study finds that although these strategies improve the automatic evaluation metrics used as rewards (such as BLEU scores and language model scores), human evaluation rates the questions from the optimized models as worse than those from the model without this additional optimization. This indicates that the automatic metrics currently in use are poor proxies for question quality: the model learns to exploit the weaknesses of the metrics to obtain high scores, while the questions it actually generates may be of low quality to human readers.
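The reward-exploitation effect described above can be illustrated with a toy policy gradient (REINFORCE) loop. This is only a minimal sketch under stated assumptions, not the paper's model or rewards: the vocabulary, the per-position softmax policy, and the deliberately flawed `proxy_reward` (which simply counts occurrences of "what") are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
import random

# Hypothetical toy vocabulary and sequence length (not from the paper).
VOCAB = ["what", "is", "the", "capital", "of", "france", "?"]
SEQ_LEN = 5


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_reward(tokens):
    """Hypothetical flawed metric: fraction of tokens equal to 'what'."""
    return sum(1 for t in tokens if t == "what") / len(tokens)


def sample_sequence(logits, rng):
    """Sample one token index per position from independent softmax policies."""
    indices = []
    for pos_logits in logits:
        probs = softmax(pos_logits)
        indices.append(rng.choices(range(len(VOCAB)), weights=probs)[0])
    return indices


def expected_reward(logits):
    """Exact expected proxy reward: mean probability of 'what' per position."""
    what = VOCAB.index("what")
    return sum(softmax(pos)[what] for pos in logits) / len(logits)


def reinforce_step(logits, rng, lr=1.0, batch=32):
    """One REINFORCE update using the batch-mean reward as a baseline."""
    samples = [sample_sequence(logits, rng) for _ in range(batch)]
    rewards = [proxy_reward([VOCAB[i] for i in s]) for s in samples]
    baseline = sum(rewards) / batch
    probs = [softmax(pos) for pos in logits]
    grads = [[0.0] * len(VOCAB) for _ in logits]
    for seq, reward in zip(samples, rewards):
        advantage = reward - baseline
        for pos, idx in enumerate(seq):
            for k in range(len(VOCAB)):
                # Gradient of log softmax: one-hot(sampled token) - probs.
                indicator = 1.0 if k == idx else 0.0
                grads[pos][k] += advantage * (indicator - probs[pos][k])
    for pos in range(len(logits)):
        for k in range(len(VOCAB)):
            logits[pos][k] += lr * grads[pos][k] / batch


def train(steps=300, seed=0):
    """Train from a uniform policy; return expected reward before and after."""
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in range(SEQ_LEN)]
    before = expected_reward(logits)
    for _ in range(steps):
        reinforce_step(logits, rng)
    return before, expected_reward(logits), logits


def greedy_decode(logits):
    """Pick the most likely token at each position."""
    return [VOCAB[max(range(len(VOCAB)), key=lambda k: pos[k])] for pos in logits]


if __name__ == "__main__":
    before, after, logits = train()
    print(f"expected proxy reward: {before:.3f} -> {after:.3f}")
    print("greedy output:", " ".join(greedy_decode(logits)))
```

In this sketch the reward rises because the policy shifts probability mass toward the rewarded token, not because its output becomes a better question. That is the same qualitative failure the human evaluation in the paper is reported to reveal for its own metric-based rewards.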

Evaluating Rewards for Question Generation Models

Exploring Question-Specific Rewards for Generating Deep Questions

Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering

Question Generation via Generative Adversarial Networks

Spontaneous Reward Hacking in Iterative Self-Refinement

Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation

Generating Self-Contained and Summary-Centric Question Answer Pairs via Differentiable Reward Imitation Learning

Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model

A Critical Look At Tokenwise Reward-Guided Text Generation

Rad: Reinforced Attention Decoder Model On Question Generation

Fine-Tuning Language Models from Human Preferences

Retrieve, Generate and Rerank: Simple and Effective Framework for Guided Human-Like Questions Generation

Using Implicit Feedback to Improve Question Generation

QAScore -- An Unsupervised Unreferenced Metric for the Question Generation Evaluation

Improving Reward Models with Synthetic Critiques

Reinforced Multi-task Approach for Multi-hop Question Generation

A Reinforcement Learning Framework for Natural Question Generation Using Bi-discriminators

Self-Generated Critiques Boost Reward Modeling for Language Models

The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate

Semantics-Reinforced Networks for Question Generation

How Do Seq2Seq Models Perform on End-to-End Data-to-Text Generation?