Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models

Wanyong Feng, Jaewook Lee, Hunter McNichols, Alexander Scarlatos, Digory Smith, Simon Woodhead, Nancy Otero Ornelas, Andrew Lan

2024-04-19

Abstract: Multiple-choice questions (MCQs) are ubiquitous in almost all levels of education since they are easy to administer, grade, and are a reliable format in assessments and practices. One of the most important aspects of MCQs is the distractors, i.e., incorrect options that are designed to target common errors or misconceptions among real students. To date, the task of crafting high-quality distractors largely remains a labor and time-intensive process for teachers and learning content designers, which has limited scalability. In this work, we study the task of automated distractor generation in the domain of math MCQs and explore a wide variety of large language model (LLM)-based approaches, from in-context learning to fine-tuning. We conduct extensive experiments using a real-world math MCQ dataset and find that although LLMs can generate some mathematically valid distractors, they are less adept at anticipating common errors or misconceptions among real students.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the automatic generation of high-quality distractors for math multiple-choice questions (MCQs). Specifically, it explores how to use large language models (LLMs) to generate distractors that reflect common student errors or misconceptions. Crafting such distractors by hand remains time-consuming and labor-intensive, which limits scalability, so the study compares several approaches to automating the task. The main contributions of the paper are:

1. Exploring various methods for generating distractors, including in-context learning, fine-tuning, chain-of-thought prompting, and rule-based and sampling-based baselines.
2. Conducting extensive quantitative and qualitative experiments on a real-world math MCQ dataset, finding that the most effective method is in-context learning with a small number of selected example questions supplied to the LLM.
3. Performing a human evaluation, which finds that LLM-generated distractors are close to human-written distractors in mathematical validity but do not necessarily reflect common student errors or misconceptions.
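To make the in-context learning idea concrete, here is a minimal sketch of selecting a few example questions and assembling them into a few-shot prompt. It is an illustration under stated assumptions, not the authors' pipeline: the token-overlap (Jaccard) similarity is a stand-in for whatever selection criterion the paper uses, the function names `select_examples` and `build_prompt` are hypothetical, the prompt wording is invented, and the three pool items are made-up toy questions rather than data from the paper. No LLM is called; the sketch stops at the prompt string.

```python
import re

# Toy example pool. These items are invented for illustration only;
# they are NOT taken from the dataset used in the paper.
pool = [
    {"question": "What is 1/2 + 1/3?", "answer": "5/6",
     "distractors": ["2/5", "2/6", "1/6"]},
    {"question": "Solve for x: 2x + 3 = 11", "answer": "4",
     "distractors": ["7", "16", "5.5"]},
    {"question": "What is 3/4 + 1/8?", "answer": "7/8",
     "distractors": ["4/12", "4/8", "3/32"]},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1] (assumed stand-in for a text encoder)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_examples(target_question, examples, k=2):
    """Return the k examples whose questions are most similar to the target."""
    ranked = sorted(
        examples,
        key=lambda ex: jaccard(target_question, ex["question"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(target, examples, k=2):
    """Assemble a few-shot prompt ending where the model should continue."""
    parts = ["Write three incorrect options that reflect common student errors."]
    for ex in select_examples(target["question"], examples, k):
        parts.append(
            f"Question: {ex['question']}\n"
            f"Answer: {ex['answer']}\n"
            f"Distractors: {', '.join(ex['distractors'])}"
        )
    # The target question is left open so the LLM fills in the distractors.
    parts.append(
        f"Question: {target['question']}\n"
        f"Answer: {target['answer']}\n"
        f"Distractors:"
    )
    return "\n\n".join(parts)


target = {"question": "What is 2/3 + 1/6?", "answer": "5/6"}
prompt = build_prompt(target, pool, k=2)
print(prompt)
```

With this toy pool, the two fraction-addition questions are ranked above the linear-equation question, so only they appear as in-context examples. In practice the similarity function would likely be replaced by a learned text embedding, and the resulting prompt would be sent to an LLM of the reader's choice.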
