Automatic Cloze Generation for English Proficiency Testing

Simon Smith, A. Kilgarriff, G. Wen-liang, S. Sommers, W. Guang-zhong
2009-01-01
Abstract: Cloze exercises are widely used in language teaching, both as a learning resource and an assessment tool. Cloze has a particular role to play in proficiency testing, where students are expected to demonstrate wide vocabulary knowledge. Cloze allows students to show that they understand the vocabulary in context, discouraging the memorization of synonyms or translations. However, it is time-consuming and difficult for item writers to make up large numbers of cloze exercises. We present a system which automatically generates cloze exercises from a corpus. It takes the word which will form the correct answer to the exercise (the key) as input. It extracts distractors with similarities to the key from a distributional thesaurus. It then identifies a collocate of the key that does not co-occur with the distractors. Next it finds a short, simple sentence in the corpus which contains the key and the collocate. It then presents the whole item (sentence with blanked-out key, key, three distractors) to a human item-writer for approval, modification or rejection. The system has been implemented as an application using the web API to the Sketch Engine, a leading corpus query system. We use a very large corpus (UKWaC, with 1.5 billion words) as this gives a fair-sized set of sentences to choose from for most key+collocate combinations, and allows us to infer with some confidence that, where a distractor has zero occurrences with a collocate, the combination is infelicitous. We present an initial evaluation.
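The pipeline described in the abstract (key, distractors, discriminating collocate, carrier sentence, blanked item) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny in-memory `CORPUS` and `THESAURUS` stand in for UKWaC and the distributional thesaurus, raw co-occurrence counts within a sentence stand in for whatever collocation measure the system actually uses, and the function names are hypothetical. It does not call the Sketch Engine API.

```python
import re
from collections import Counter

# Hypothetical toy data; the real system queries a 1.5-billion-word corpus.
CORPUS = [
    "She made a strong argument for the plan.",
    "That is a strong argument.",
    "A powerful argument won the debate.",
    "He drinks strong tea every morning.",
    "I like strong tea.",
    "The tough steak was hard to chew.",
    "A sturdy table stood in the hall.",
]
# Hypothetical thesaurus entry: words distributionally similar to the key.
THESAURUS = {"strong": ["powerful", "tough", "sturdy", "weak"]}
STOPWORDS = {"a", "the", "he", "she", "i", "for", "is", "that", "to", "in", "was"}


def tokens(sentence):
    """Lower-cased alphabetic tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def cooccurrence_count(word, collocate, corpus):
    """Number of sentences containing both words (a crude co-occurrence proxy)."""
    return sum(1 for s in corpus if word in tokens(s) and collocate in tokens(s))


def generate_cloze(key_word, corpus, thesaurus, n_distractors=3, max_len=10):
    """Build one cloze item for key_word, or return None if no item is found."""
    # Step 1: distractors are the words most similar to the key.
    distractors = thesaurus[key_word][:n_distractors]

    # Step 2: rank candidate collocates by how often they appear with the key.
    counts = Counter()
    for s in corpus:
        toks = tokens(s)
        if key_word in toks:
            # dict.fromkeys de-duplicates while keeping a deterministic order.
            counts.update(t for t in dict.fromkeys(toks)
                          if t != key_word and t not in STOPWORDS)

    for collocate, _ in counts.most_common():
        # Step 3: keep only a collocate that never occurs with any distractor.
        if any(cooccurrence_count(d, collocate, corpus) for d in distractors):
            continue
        # Step 4: pick the shortest sentence containing both key and collocate.
        candidates = [s for s in corpus
                      if key_word in tokens(s) and collocate in tokens(s)
                      and len(tokens(s)) <= max_len]
        if candidates:
            sentence = min(candidates, key=len)
            # Step 5: blank out the key to form the item stem.
            stem = re.sub(rf"\b{re.escape(key_word)}\b", "_____", sentence,
                          count=1, flags=re.IGNORECASE)
            return {"stem": stem, "key": key_word,
                    "collocate": collocate, "distractors": distractors}
    return None


item = generate_cloze("strong", CORPUS, THESAURUS)
print(item)
```

On this toy data the collocate "argument" is rejected because it also occurs with the distractor "powerful", so the sketch falls back to "tea" and produces the stem "I like _____ tea." In the workflow the abstract describes, an item like this would then go to a human item-writer for approval, modification or rejection. With a corpus this small, a zero count says nothing about whether a combination is infelicitous, which is why the abstract stresses the size of the corpus.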