How operationalizations of word types affect measures of lexical diversity

Scott Jarvis, Brett James Hashimoto
DOI: https://doi.org/10.1075/ijlcr.20004.jar
2021-03-01
Abstract: This study tests three measures of lexical diversity (LD), each using five operationalizations of word types. The measures are MTLD (measure of textual lexical diversity), MTLD-W (moving average MTLD with wrap-around measurement), and MATTR (moving average type-token ratio). Each measure is tested with types operationalized as orthographic forms, lemmas using automated POS tags, lemmas using manually corrected POS tags, flemmas (list-based lemmas that do not distinguish between parts of speech), and word families. The measures are applied to 60 narrative texts written in English by adolescent native speakers of English (n = 13), Finnish (n = 31), and Swedish (n = 16). Each individual LD measure is evaluated by how well it correlates with the mean LD ratings of 55 human raters, whose inter-rater reliability was exceedingly high (Cronbach’s alpha = .980). The overall results show that the three measures are comparable, but two of the operationalizations of types produce mixed results across measures.
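To make concrete how the operationalization of types can change an LD score, here is a minimal sketch of MATTR in Python. It is not the authors' implementation. The window length, the `normalize` parameter, and the tiny `TOY_FLEMMAS` lookup are assumptions introduced for illustration only; the abstract does not state the window size used, and a real flemma list would be far larger.

```python
def mattr(tokens, window=50, normalize=str.lower):
    """Moving-average type-token ratio over all windows of a fixed length.

    `normalize` maps each token to the unit counted as a "type"
    (orthographic form by default; a lemma or flemma lookup otherwise).
    The default window of 50 is an assumption, not a value from the paper.
    """
    types = [normalize(t) for t in tokens]
    if not types:
        raise ValueError("cannot compute MATTR on an empty text")
    # Texts no longer than one window fall back to a plain type-token ratio.
    if len(types) <= window:
        return len(set(types)) / len(types)
    # TTR of every overlapping window, then the mean of those TTRs.
    ttrs = [
        len(set(types[i:i + window])) / window
        for i in range(len(types) - window + 1)
    ]
    return sum(ttrs) / len(ttrs)


# Hypothetical toy lookup standing in for a list-based flemma resource.
TOY_FLEMMAS = {"runs": "run", "ran": "run", "running": "run"}


def toy_flemma(token):
    """Map a token to its toy flemma, defaulting to the lowercased form."""
    lowered = token.lower()
    return TOY_FLEMMAS.get(lowered, lowered)


tokens = "the dog runs and the dog ran and the cat runs".split()
orthographic_score = mattr(tokens, window=6)
flemma_score = mattr(tokens, window=6, normalize=toy_flemma)
```

In this toy text, merging "runs" and "ran" into one type lowers the score, because fewer distinct types appear in each window. The same token sequence can therefore yield different LD values depending on how types are defined, which is the comparison the study makes across its five operationalizations.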