Methods for estimating the size of Google Scholar

Enrique Orduna-Malea, Juan M. Ayllon, Alberto Martin-Martin, Emilio Delgado Lopez-Cozar
DOI: https://doi.org/10.1007/s11192-015-1614-6
2015-06-10
Abstract: The emergence of academic search engines (mainly Google Scholar and Microsoft Academic Search) that aspire to index the entirety of current academic knowledge has revived and increased interest in the size of the academic web. The main objective of this paper is to propose various methods to estimate the current size (number of indexed documents) of Google Scholar (May 2014) and to determine its validity, precision and reliability. To do this, we present, apply and discuss three empirical methods: an external estimate based on empirical studies of Google Scholar coverage, and two internal estimate methods based on direct, empty and absurd queries, respectively. The results, despite providing disparate values, place the estimated size of Google Scholar at around 160 to 165 million documents. However, all the methods show considerable limitations and uncertainties due to inconsistencies in the Google Scholar search functionalities.
Digital Libraries
What problem does this paper attempt to address?
The main problem this paper addresses is estimating the current size of Google Scholar (i.e., the number of indexed documents) and evaluating the validity, precision and reliability of the proposed estimation methods. Specifically, the authors propose three methods to estimate the size of Google Scholar as of May 2014:

1. **External Estimation Method**: The estimate is based on empirical studies of Google Scholar's coverage. This method obtains a scale factor by comparing the number of documents in Google Scholar with the number in other databases (such as Web of Science), and then uses that factor to calculate the total size of Google Scholar.
2. **Internal Estimation Methods**:
   - **Direct Empty-Query Method**: An empty query is run in Google Scholar (i.e., the search box is left blank), restricted year by year or by time period, and the hit counts are summed to estimate the total number of documents.
   - **Absurd-Query Method**: A Boolean query is constructed that should, in theory, return all records, for example with the syntax `<common word -site:non-existent website>`. Excluding a site that does not exist filters out nothing, so the reported hit count approximates the total number of records.

These methods aim to provide a reasonably accurate estimate of the size of Google Scholar and to examine how effective and how limited each approach is. According to the paper's results, the methods yield disparate values, but overall they place the size of Google Scholar at approximately 160 to 165 million documents. However, all of the methods show considerable limitations and uncertainties, mainly due to inconsistencies in Google Scholar's search functionality.
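The arithmetic behind the external and empty-query estimates can be sketched in a few lines of Python. This is only an illustration of the calculations as described above, not the authors' implementation: the function names (`external_estimate`, `accumulate_yearly_hits`, `relative_spread`) are hypothetical, and the numbers in the usage section are toy values chosen for readability, not figures from the paper.

```python
from typing import Mapping


def external_estimate(reference_size: int, scale_factor: float) -> float:
    """Scale the known size of a reference database by a coverage ratio.

    `scale_factor` is assumed to be (documents in Google Scholar) /
    (documents in the reference database), taken from a coverage study.
    """
    if reference_size < 0 or scale_factor <= 0:
        raise ValueError("reference_size must be >= 0 and scale_factor > 0")
    return reference_size * scale_factor


def accumulate_yearly_hits(hits_by_year: Mapping[int, int]) -> int:
    """Sum the hit counts reported for an empty query restricted to each year."""
    return sum(hits_by_year.values())


def relative_spread(estimates: Mapping[str, float]) -> float:
    """Return (max - min) / mean as a crude measure of disagreement."""
    values = list(estimates.values())
    if not values:
        raise ValueError("at least one estimate is required")
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


if __name__ == "__main__":
    # Toy inputs only; none of these values come from the paper.
    external = external_estimate(reference_size=1000, scale_factor=1.5)
    internal = accumulate_yearly_hits({2000: 10, 2001: 20, 2002: 30})
    print("external estimate:", external)
    print("empty-query estimate:", internal)
    print("spread:", relative_spread({"a": 90.0, "b": 110.0}))
```

Comparing the spread between methods is one simple way to express the paper's caveat that the estimates are disparate. The hit counts would have to be collected by hand or by some other means, since the sketch deliberately makes no assumption about how Google Scholar is queried.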