Unsupervised Distractor Generation via Large Language Model Distilling and Counterfactual Contrastive Decoding

Fanyi Qu, Hao Sun, Yunfang Wu

2024-06-03

Abstract: Within the context of reading comprehension, the task of Distractor Generation (DG) aims to generate several incorrect options to confuse readers. Traditional supervised methods for DG rely heavily on expensive human-annotated distractor labels. In this paper, we propose an unsupervised DG framework, leveraging Large Language Models (LLMs) as cost-effective annotators to enhance the DG capability of smaller student models. Specifically, to perform knowledge distillation, we propose a dual-task training strategy that integrates pseudo distractors from LLMs and the original answer information as the objective targets with a two-stage training process. Moreover, we devise a counterfactual contrastive decoding mechanism for increasing the distracting capability of the DG model. Experiments show that our unsupervised generation method with Bart-base greatly surpasses GPT-3.5-turbo performance with 200 times fewer model parameters. Our proposed unsupervised DG method offers a cost-effective framework for practical reading comprehension applications, without the need for laborious distractor annotation and costly large-size models.

Computation and Language

What problem does this paper attempt to address?

This paper addresses Distractor Generation (DG) in reading comprehension tasks: producing incorrect answer options that are plausible enough to confuse readers. Traditional supervised methods rely on human-annotated distractor labels, which are expensive and difficult to obtain at scale. The paper therefore proposes an unsupervised DG framework that uses large language models (LLMs) as low-cost annotators to strengthen the DG ability of small student models. Specifically, the framework combines the pseudo-distractors generated by LLMs with the original answer information through a dual-task, two-stage training strategy, and adds a counterfactual contrastive decoding mechanism to improve the distracting capability of the DG model. Experimental results show that, with Bart-base as the student model, this unsupervised method significantly outperforms GPT-3.5-turbo while using only 1/200 of its parameters. The method offers an efficient, low-cost framework for practical reading comprehension applications, with no need for labor-intensive distractor annotation or resource-intensive large models.
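The summary above names a contrastive decoding step but does not give its scoring rule. As a rough illustration only, the following sketch shows a generic contrastive decoding step over toy logits. It assumes a common form of the technique (main log-probability minus a weighted log-probability from a second, contrasting input, with a plausibility cutoff) rather than the paper's exact formulation. The function names, `alpha`, `plausibility`, and the toy logits are all hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_step(main_logits, counterfactual_logits, alpha=0.5, plausibility=0.1):
    """Choose the next token id by contrasting two distributions.

    main_logits: next-token logits from the model given the real input.
    counterfactual_logits: logits from the same step given an altered
        (counterfactual) input; how that input is built is an assumption
        left open here.
    alpha: weight of the penalty on tokens the counterfactual run favors.
    plausibility: tokens whose main probability is below this fraction of
        the best token's probability are excluded.
    """
    main_lp = log_softmax(main_logits)
    cf_lp = log_softmax(counterfactual_logits)
    cutoff = max(main_lp) + math.log(plausibility)
    scores = [
        lp - alpha * cf if lp >= cutoff else float("-inf")
        for lp, cf in zip(main_lp, cf_lp)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy vocabulary of three tokens. The main run slightly prefers token 0,
# but the counterfactual run prefers it strongly, so it is penalized.
main = [2.0, 1.9, -3.0]
counterfactual = [3.0, 0.0, 0.0]
token, scores = contrastive_step(main, counterfactual)
print(token)
```

With `alpha=0` the step reduces to ordinary greedy decoding on the main distribution. The plausibility cutoff keeps the subtraction from promoting tokens that the main run considers very unlikely.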

DGRC: An Effective Fine-tuning Framework for Distractor Generation in Chinese Multi-choice Reading Comprehension

Generating Distractors for Reading Comprehension Questions from Real Examinations

Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models

Co-Attention Hierarchical Network: Generating Coherent Long Distractors for Reading Comprehension

Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning

Qadg: Generating question–answer-distractors pairs for real examination

Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering

Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding

Accurate, Diverse and Multiple Distractor Generation with Mixture of Experts

DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions

Mitigating the Influence of Distractor Tasks in LMs with Prior-Aware Decoding

Difficulty-aware Distractor Generation for Gap-Fill Items

Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation

CDGP: Automatic Cloze Distractor Generation based on Pre-trained Language Model

Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation

DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking

Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher

Unlocking Anticipatory Text Generation: A Constrained Approach for Large Language Models Decoding

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding