Just Google It: An Approach on Word Frequencies Based on Online Search Result

Carmen Moret-Tatay, Daniel Gamermann, Michael Murphy, Anezka Kuzmičová
DOI: https://doi.org/10.1080/00221309.2018.1459451
Abstract: Word frequency is one of the most robust factors in the literature on word processing, and it is typically derived from a lexical corpus of the language. However, different sources can be used to determine the frequency of each word. Recent research has derived frequencies from movie subtitles, Twitter, blog posts, or newspapers. In this paper, we examine frequencies derived from the World Wide Web. For this purpose, a Python script was developed to obtain the frequency of a word from online search results. These frequencies were used to estimate lexical decision times, in comparison with traditional frequencies, in a lexical decision task. The Google frequencies predicted reaction times comparably to the traditional frequencies, although the explained variance was higher for the traditional database.
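The comparison described in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' script: no search engine is queried, the function names are hypothetical, and all numbers are made up for illustration. It assumes frequencies are log-transformed before being related to reaction times, and it uses the fact that, for a simple linear regression with one predictor, the explained variance (R²) equals the squared Pearson correlation.

```python
import math


def log_frequency(counts):
    """Log10-transform raw counts; the +1 avoids log(0) for unseen words."""
    return [math.log10(c + 1) for c in counts]


def r_squared(x, y):
    """Share of variance in y explained by a simple linear fit on x.

    For one predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Made-up illustrative values, NOT data from the paper.
hit_counts = [2_500_000_000, 310_000_000, 42_000_000, 5_600_000, 730_000]
corpus_counts = [52_000, 6_100, 900, 140, 12]
reaction_ms = [520.0, 560.0, 610.0, 640.0, 700.0]

r2_web = r_squared(log_frequency(hit_counts), reaction_ms)
r2_corpus = r_squared(log_frequency(corpus_counts), reaction_ms)
print(f"R^2 web: {r2_web:.3f}, R^2 corpus: {r2_corpus:.3f}")
```

With real data, the two R² values would be computed over the same set of words so that the web-based and corpus-based measures are compared on equal footing.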