Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar

Celina Kacperski, Mona Bielig, Mykola Makhortykh, Maryna Sydorova, Roberto Ulloa
2023-11-21
Abstract: Researchers rely on academic web search engines to find scientific sources, but search engine mechanisms may selectively present content that aligns with biases embedded in the queries. This study examines whether confirmation-biased queries prompted into Google Scholar and Semantic Scholar will yield skewed results. Six queries (topics across health and technology domains such as "vaccines" or "internet use") were analyzed for disparities in search results. We confirm that biased queries (targeting "benefits" or "risks") affect search results in line with the bias, with technology-related queries displaying more significant disparities. Overall, Semantic Scholar exhibited fewer disparities than Google Scholar. Topics rated as more polarizing did not consistently show more skewed results. Academic search results that perpetuate confirmation bias have strong implications for both researchers and citizens searching for evidence. More research is needed to explore how scientific inquiry and academic search engines interact.
Computers and Society
What problem does this paper attempt to address?
The paper investigates whether academic search engines, specifically Google Scholar and Semantic Scholar, return skewed results when given confirmation-biased queries. It analyzes queries that target the "benefits" or "risks" of topics in the health and technology domains, such as vaccines and internet use. The results show that biased queries do shift search results in the direction of the query, particularly for technology-related topics. Semantic Scholar presents more balanced results than Google Scholar. However, topics rated as more polarizing do not consistently show more skewed results.
The study tests four hypotheses: first, biased queries produce disparities in search results; second, these disparities differ across domains (health and technology) and topics; third, the two search engines (Google Scholar and Semantic Scholar) differ in the size of their disparities; fourth, disparities are larger for topics rated as more polarizing. The data support the first three hypotheses, but the fourth does not receive consistent support.
The study collects the responses of both search engines to specific queries across different regions and browser environments. The abstracts of the top 10 articles are evaluated manually to determine whether each article confirms the risks or the benefits named in the query. The analysis confirms that benefit-biased queries return more articles reporting benefits, while risk-biased queries return more articles reporting risks. In addition, disparities are larger for technology queries than for health queries, and smaller overall on Semantic Scholar than on Google Scholar.
The study also examines public perception of polarization and media salience for each topic to validate the topic selection. Vaccines, social media, and cryptocurrency are generally perceived as more polarizing than coffee and the internet. However, the relationship between the degree of polarization and the size of the disparities is not clear.
The discussion section emphasizes how biases in academic search engines may affect researchers, educators, and students when they retrieve information. It calls for greater awareness of such biases and suggests strategies researchers can use to limit confirmation bias, such as varying how they phrase queries and critically evaluating search results. It also argues that search engines should develop mechanisms that help users identify and correct potential biases, for example by using natural language processing (NLP). Overall, the study highlights the role academic search engines play in shaping scientific research, as well as their potential negative effects, and encourages further work on biases in academic information retrieval and on ways to mitigate them.
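To make the idea of a "disparity" between benefit-biased and risk-biased queries concrete, the following minimal Python sketch compares the share of benefit-confirming results under each framing. It is an illustration only, not the authors' analysis: the function names, the label vocabulary ("benefit", "risk", "neutral"), the choice to ignore neutral results, and the toy codings are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical manual codings for one topic on one search engine.
# Each pair is (query framing, label assigned to a top-10 abstract).
# These values are invented for illustration; they are not data from the paper.
toy_codings = (
    [("benefits", "benefit")] * 8 + [("benefits", "risk")] * 2
    + [("risks", "benefit")] * 3 + [("risks", "risk")] * 7
)


def benefit_share(labels):
    """Share of benefit-confirming results among results coded benefit or risk.

    Results with any other label (e.g. "neutral") are ignored, which is an
    assumption of this sketch. Returns None if nothing is left to count.
    """
    relevant = [label for label in labels if label in ("benefit", "risk")]
    if not relevant:
        return None
    return sum(label == "benefit" for label in relevant) / len(relevant)


def framing_disparity(codings):
    """Benefit share under the 'benefits' framing minus that under 'risks'.

    A value near 0 means both framings return a similar mix of results;
    a larger positive value means results follow the framing of the query.
    """
    by_framing = defaultdict(list)
    for framing, label in codings:
        by_framing[framing].append(label)
    share_benefits = benefit_share(by_framing["benefits"])
    share_risks = benefit_share(by_framing["risks"])
    if share_benefits is None or share_risks is None:
        return None
    return share_benefits - share_risks


if __name__ == "__main__":
    print(framing_disparity(toy_codings))
```

In this toy data, 8 of 10 results confirm benefits under the "benefits" framing and 3 of 10 under the "risks" framing, so the measure is positive. Comparing such a value across topics, domains, or search engines mirrors the kind of comparison the summary describes, although the paper's actual statistical analysis may differ.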