Validation of Automated Scoring of Oral Reading

Jennifer Balogh, Jared Bernstein, Jian Cheng, Alistair Van Moere, Brent Townshend, Masanori Suzuki
DOI: https://doi.org/10.1177/0013164411412590
2011-11-28
Educational and Psychological Measurement
Abstract: A two-part experiment is presented that validates a new measurement tool for scoring oral reading ability. Data collected by the U.S. government in a large-scale literacy assessment of adults were analyzed by a system called VersaReader that uses automatic speech recognition and speech processing technologies to score oral reading fluency. In the first part of the experiment, human raters rated oral reading performances to establish a criterion measure for comparisons with the machine scores. The goal was to measure the reliability of ratings from human raters and to determine whether or not the human raters biased their ratings in favor of or against three groups of readers: Spanish speakers, African Americans, and all other native English speakers. The result of the experiment showed that ratings from skilled human raters were extremely reliable. In addition, there was no observed scoring bias for human raters. The second part of the experiment was designed to compare the criterion human ratings with scores generated by VersaReader. Correlations between VersaReader scores and human ratings approached unity. Using G-Theory, the results showed that machine scores were almost identical to scores from human raters. Finally, the results revealed no bias in the machine scores. Implications for large-scale assessments are discussed.
psychology, educational, mathematical, mathematics, interdisciplinary applications