Abstract: This paper reports on an audit study of generative AI systems (ChatGPT, Bing Chat, and Perplexity) that investigates how these new search engines construct responses and establish authority on topics of public importance. We collected system responses to a set of 48 authentic queries on 4 topics over a 7-day period and analyzed the data using sentiment analysis, inductive coding, and source classification. Results provide an overview of the nature of system responses across these systems and provide evidence of sentiment bias based on the queries and topics, as well as commercial and geographic bias in sources. The quality of sources used to support claims is uneven, relying heavily on News and Media, Business, and Digital Media websites. Implications for system users emphasize the need to critically examine generative AI system outputs when making decisions related to the public interest and personal well-being.
What problem does this paper attempt to address?
This paper evaluates how generative artificial intelligence (GenAI) search engines construct responses and establish authority when handling topics of public importance. The research focuses on two core questions:
1. **Are generative AI responses influenced by the sentiment of the query and the topic?** (RQ1)
   - The study compares responses to queries with different sentiment to see whether the systems mirror that sentiment: for example, whether positively phrased queries elicit more positive responses and negatively phrased queries elicit more negative ones.
2. **How do generative AI search engines establish authority in their responses?** (RQ2)
   - The study examines the citation sources, language style, and rhetorical techniques in the responses to see how the systems establish authority with users, for example by citing high-quality sources, using professional language, or adopting specific rhetorical strategies.
### Research Methods
To answer these questions, the study uses an algorithm audit. The steps are:
- **Selecting Systems**: Three freely available generative AI systems were selected: ChatGPT (based on GPT-3.5), Bing Chat (based on GPT-4), and Perplexity.
- **Selecting Topics and Queries**: Four topics representing current global issues were selected: climate change, vaccination, alternative energy, and media trust. For each topic, 12 authentic queries varying in sentiment were collected.
- **Data Collection**: Every day for one week in June 2023, all 48 queries (12 per topic) were submitted to each of the three systems, and the responses and cited sources were recorded.
- **Data Analysis**: In the first stage, Python was used to compute sentiment polarity scores for the queries and responses, along with response lengths. In the second stage, the responses from Bing Chat and Perplexity were analyzed manually for readability, rhetorical features, number of citations, and source types.
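The automated first stage can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the summary does not name the sentiment tool they used, so the small `LEXICON` and the `polarity`, `collection_plan`, and `Record` names are hypothetical. The sketch shows how the design (3 systems, 4 topics, 12 queries per topic, 7 days) yields one record per submission, each scored for query polarity, response polarity, and response length.

```python
import re
from dataclasses import dataclass
from itertools import product

# Audit design as described: 3 systems x 4 topics x 12 queries, submitted daily for 7 days.
SYSTEMS = ["ChatGPT", "Bing Chat", "Perplexity"]
TOPICS = ["climate change", "vaccination", "alternative energy", "media trust"]
QUERIES_PER_TOPIC = 12
DAYS = 7

# Hypothetical toy lexicon standing in for a real sentiment tool (the summary does not name one).
LEXICON = {
    "good": 1.0, "safe": 0.8, "effective": 0.7, "benefit": 0.6,
    "bad": -1.0, "dangerous": -0.8, "harmful": -0.7, "risk": -0.5,
}
WORD_RE = re.compile(r"[a-z']+")


def polarity(text: str) -> float:
    """Mean lexicon score of matched words, in [-1, 1]; 0.0 if no word matches."""
    scores = [LEXICON[w] for w in WORD_RE.findall(text.lower()) if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def collection_plan(queries_by_topic: dict[str, list[str]]):
    """Yield one (system, day, topic, query) tuple per planned submission."""
    for system, day in product(SYSTEMS, range(1, DAYS + 1)):
        for topic in TOPICS:
            for query in queries_by_topic[topic]:
                yield system, day, topic, query


@dataclass
class Record:
    """One collected response; field names are hypothetical."""
    system: str
    day: int
    topic: str
    query: str
    response: str

    @property
    def query_polarity(self) -> float:
        return polarity(self.query)

    @property
    def response_polarity(self) -> float:
        return polarity(self.response)

    @property
    def response_length(self) -> int:
        # Whitespace-delimited words; the summary does not say how length was measured.
        return len(self.response.split())


if __name__ == "__main__":
    # Placeholder query texts; the real study used 12 authentic queries per topic.
    placeholder = {t: [f"{t} query {i}" for i in range(QUERIES_PER_TOPIC)] for t in TOPICS}
    print(sum(1 for _ in collection_plan(placeholder)))  # 1008
```

Running it prints 1008, the number of planned submissions (3 systems × 7 days × 48 queries). In a real audit the toy lexicon would be replaced by an established sentiment analyzer; the record structure would stay the same.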
### Main Findings
- **Sentiment Bias**: The study found a moderate positive correlation (Pearson's r = 0.46) between the sentiment polarity of a query and that of the system's response. All three systems (ChatGPT, Bing Chat, and Perplexity) exhibited similar sentiment biases.
- **Sentiment Differences by System and Topic**: Response sentiment polarity differed significantly across systems and topics. Perplexity's responses had significantly lower polarity than those of ChatGPT and Bing Chat, and responses on climate change and media trust had significantly lower polarity than those on vaccination and alternative energy.
- **Establishing Authority**: Bing Chat tended to respond in the first person, while Perplexity more often used the third person and passive voice, foregrounding the search results as the authoritative source. Most responses supported the explicit or implicit claims in the query, but when faced with counterfactual queries, the systems used different strategies to push back.
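The statistics behind these findings can be sketched as below, again as a hypothetical illustration rather than the authors' analysis. `pearson_r` computes the kind of correlation coefficient reported between query and response polarity (Python 3.10+ also provides `statistics.correlation` for this). `mean_by_group` compares average response polarity across systems or topics. `first_person_rate` is a crude, assumed proxy for the first-person style noted for Bing Chat; the study itself coded rhetorical features manually. All function names and sample values are made up for illustration and are not the study's data.

```python
import re
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def mean_by_group(rows, key):
    """Mean response polarity per value of `key` (e.g. "system" or "topic")."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["response_polarity"])
    return {g: sum(v) / len(v) for g, v in groups.items()}


# Hypothetical heuristic: first-person singular/plural pronouns, matched case-insensitively.
FIRST_PERSON = re.compile(r"\b(i|me|my|mine|we|our|ours)\b", re.IGNORECASE)


def first_person_rate(text):
    """Share of whitespace tokens that are first-person pronouns (0.0 for empty text)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return len(FIRST_PERSON.findall(text)) / len(tokens)


if __name__ == "__main__":
    # Toy values only; they do not reproduce the study's r = 0.46.
    query_pol = [-0.6, -0.1, 0.2, 0.7]
    response_pol = [-0.3, 0.1, 0.1, 0.5]
    print(round(pearson_r(query_pol, response_pol), 2))
    rows = [
        {"system": "Bing Chat", "response_polarity": 0.2},
        {"system": "Perplexity", "response_polarity": 0.05},
    ]
    print(mean_by_group(rows, "system"))
    print(first_person_rate("I found several sources on this topic."))
```

Group means alone do not establish the significant differences the study reports; a significance test would be applied on top of them, and the summary does not say which test the authors used.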
### Conclusion
This research reveals potential sentiment biases in generative AI search engines, and the mechanisms by which they establish authority, when handling topics of public importance. The findings can raise public awareness of how these systems behave, remind users to think critically when using them, and point system designers toward areas for improvement.