LexFindR: A fast, simple, and extensible R package for finding similar words in a lexicon

ZhaoBin Li, Anne Marie Crinnion, James S Magnuson
DOI: https://doi.org/10.3758/s13428-021-01667-6
Abstract: Language scientists often need to generate lists of related words, such as potential competitors. They may do this for purposes of experimental control (e.g., selecting items matched on lexical neighborhood but varying in word frequency), or to test theoretical predictions (e.g., hypothesizing that a novel type of competitor may impact word recognition). Several online tools are available, but most are constrained to a fixed lexicon and fixed sets of competitor definitions, and may not give the user full access to or control of source data. We present LexFindR, an open-source R package that can be easily modified to include additional, novel competitor types. LexFindR is easy to use. Because it can leverage multiple CPU cores and uses vectorized code when possible, it is also extremely fast. In this article, we present an overview of LexFindR usage, illustrated with examples. We also explain the details of how we implemented several standard lexical competitor types used in spoken word recognition research (e.g., cohorts, neighbors, embeddings, rhymes), and show how "lexical dimensions" (e.g., word frequency, word length, uniqueness point) can be integrated into LexFindR workflows (for example, to calculate "frequency-weighted competitor probabilities"), for both spoken and visual word recognition research.
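To make the competitor types named in the abstract concrete, here is a minimal Python sketch. It is not LexFindR's R code or API. The function names (`is_cohort`, `is_neighbor`, `is_rhyme`, `embeds_in`, `find`, `fwcp`) are hypothetical, and the definitions are assumptions that approximate the conventional ones: words are space-separated phoneme strings, cohorts share their first two phonemes, neighbors differ by exactly one deletion, addition, or substitution, rhymes mismatch only at the first phoneme, and an embedding is a contiguous phoneme run inside a longer word. The frequency-weighted competitor probability (FWCP) is taken to be the target's weighted frequency divided by the summed weighted frequencies of the target and its competitors. The weighting used here, log(1 + f), is also an assumption.

```python
"""Toy sketch of common lexical competitor definitions and an FWCP measure.

Assumptions (illustrative only, not LexFindR's implementation):
- words are space-separated phoneme strings, e.g. "k ae t"
- cohort: shares the first `overlap` (default 2) phonemes with the target
- neighbor: exactly one deletion, addition, or substitution apart
- rhyme: same length, mismatches only at the first phoneme
- frequencies are weighted as log(1 + f) before computing FWCP
"""
from math import log1p


def phonemes(word):
    """Split a space-separated transcription into a tuple of phonemes."""
    return tuple(word.split())


def is_cohort(target, word, overlap=2):
    # Both words are long enough, and their first `overlap` phonemes match.
    return (len(target) >= overlap and len(word) >= overlap
            and word[:overlap] == target[:overlap])


def is_neighbor(target, word):
    # One substitution (same length) or one deletion/addition (length differs by 1).
    if target == word:
        return False
    if len(target) == len(word):
        return sum(a != b for a, b in zip(target, word)) == 1
    if abs(len(target) - len(word)) != 1:
        return False
    shorter, longer = sorted((target, word), key=len)
    # Deleting some single phoneme from the longer word must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def is_rhyme(target, word):
    # Same non-zero length, different first phoneme, identical remainder.
    return (len(target) == len(word) > 0
            and target[0] != word[0] and target[1:] == word[1:])


def embeds_in(short, long):
    # `short` appears as a contiguous run of phonemes inside the longer `long`.
    n = len(short)
    return 0 < n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1))


def find(target, lexicon, relation):
    """Return lexicon entries, other than the target, that satisfy relation(target, word)."""
    t = phonemes(target)
    return [w for w in lexicon if w != target and relation(t, phonemes(w))]


def fwcp(target, competitors, freqs):
    """Target's weighted frequency over the summed weighted frequencies
    of the target plus its competitors (weights are log(1 + f))."""
    def weight(w):
        return log1p(freqs[w])
    return weight(target) / (weight(target) + sum(weight(w) for w in competitors))


if __name__ == "__main__":
    # Hypothetical toy lexicon with made-up frequency counts.
    lexicon = ["k ae t", "k ae p", "b ae t", "ae t", "k ae t s", "k ae n d i", "d o g"]
    freqs = {"k ae t": 50, "k ae p": 10, "b ae t": 20, "ae t": 200,
             "k ae t s": 15, "k ae n d i": 8, "d o g": 60}
    target = "k ae t"
    print(find(target, lexicon, is_cohort))    # ['k ae p', 'k ae t s', 'k ae n d i']
    print(find(target, lexicon, is_neighbor))  # ['k ae p', 'b ae t', 'ae t', 'k ae t s']
    print(find(target, lexicon, is_rhyme))     # ['b ae t']
    print(find(target, lexicon, lambda t, w: embeds_in(w, t)))  # ['ae t']  (embedded in target)
    print(find(target, lexicon, embeds_in))    # ['k ae t s']  (target embedded in word)
    print(fwcp(target, find(target, lexicon, is_neighbor), freqs))
```

Because each competitor type is simply a predicate passed to `find`, a novel competitor definition can be added by writing one more function. This mirrors, in spirit only, the extensibility the abstract describes. A linear scan over the lexicon is used for clarity. The speed claims in the abstract come from LexFindR's vectorized R code and multi-core support, which this sketch does not attempt to reproduce.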