LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains

Raphael Hernandes, Giulio Corsi
2024-10-23
Abstract: This research investigates whether OpenAI's GPT-4, a state-of-the-art large language model, can accurately classify the political bias of news sources based solely on their URLs. Given the subjective nature of political labels, third-party bias ratings like those from Ad Fontes Media, AllSides, and Media Bias/Fact Check (MBFC) are often used in research to analyze news source diversity. This study aims to determine whether GPT-4 can replicate these human ratings on a seven-degree scale ("far-left" to "far-right"). The analysis compares GPT-4's classifications against MBFC's and controls for website popularity using Open PageRank scores. Findings reveal a high correlation ($\text{Spearman's } \rho = .89$, $n = 5{,}877$, $p < 0.001$) between GPT-4's and MBFC's ratings, indicating the model's potential reliability. However, GPT-4 abstained from classifying approximately $\frac{2}{3}$ of the dataset. It is more likely to abstain from rating unpopular websites, which also receive less accurate assessments. The LLM tends to avoid classifying sources that MBFC considers centrist, resulting in more polarized outputs. Finally, this analysis shows a slight leftward skew in GPT-4's classifications compared to MBFC's. Therefore, this paper suggests that while GPT-4 can be a scalable, cost-effective tool for political bias classification of news websites, it should be used as a complement to human judgment to mitigate biases.
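The core comparison in the abstract is a rank correlation between two ordinal ratings, computed only over the URLs GPT-4 actually labeled. The sketch below illustrates that step under stated assumptions: the seven label strings and their integer codes (-3 to +3) are hypothetical, since the exact label wording is not given here; an abstention is represented as `None`; and Spearman's ρ is computed as the Pearson correlation of tie-averaged ranks, which matters because a seven-point scale produces many ties. The function names are illustrative and are not the authors' code.

```python
from math import sqrt
from typing import List, Optional, Sequence, Tuple

# Hypothetical label wording for the seven-degree scale, coded -3 (far-left) .. +3 (far-right).
SCALE = ["far-left", "left", "left-center", "center", "right-center", "right", "far-right"]
LABEL_TO_SCORE = {label: i - 3 for i, label in enumerate(SCALE)}


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare(gpt: Sequence[Optional[str]], mbfc: Sequence[str]) -> Tuple[float, int, float]:
    """Return (rho, number of rated URLs, abstention rate); None marks an abstention."""
    pairs = [(LABEL_TO_SCORE[g], LABEL_TO_SCORE[m]) for g, m in zip(gpt, mbfc) if g is not None]
    abstention_rate = 1 - len(pairs) / len(gpt)
    rho = spearman([g for g, _ in pairs], [m for _, m in pairs])
    return rho, len(pairs), abstention_rate
```

For example, `compare(["left", None, "far-right", "center", None, "right"], ["left-center", "center", "right", "center", "left", "right"])` drops the two abstentions and returns ρ ≈ 0.949 with n = 4 and an abstention rate of 1/3. In practice, `scipy.stats.spearmanr` computes the same tie-aware statistic and also reports a p-value.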
Computation and Language, Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores whether OpenAI's GPT-4 can accurately classify the political bias of news sources based solely on their URLs. Specifically, the researchers evaluate GPT-4's capabilities through two research questions:

1. **RQ1**: Can large language models (LLMs) classify the political bias of news sources from their URLs consistently with human ratings, relying only on their built-in knowledge (without internet access)?
2. **RQ2**: Does classification accuracy vary with the popularity of the news source?

### Background and Motivation

Political labels are inherently somewhat subjective, so research on news source diversity often relies on third-party bias ratings such as those from Ad Fontes Media, AllSides, and Media Bias/Fact Check (MBFC). These ratings support a more balanced information ecosystem by helping people understand the content they consume. However, producing such ratings is time-consuming and resource-intensive, so the researchers investigate whether LLMs like GPT-4 can automate the process.

### Methods

The researchers collected bias ratings from MBFC and website popularity scores from Open PageRank. They asked GPT-4 to classify the political bias of each URL on a seven-degree scale ("far-left" to "far-right") and compared its output with MBFC's ratings. They measured agreement with Spearman's rank correlation coefficient and used linear and logistic regression to analyze how website popularity affects classification accuracy.

### Main Findings

1. **High Correlation**: GPT-4's and MBFC's political bias ratings are highly correlated (Spearman's ρ = 0.89, n = 5,877, p < 0.001), suggesting that GPT-4 is a reliable classifier for the sources it does rate.
2. **Unclassified Cases**: GPT-4 abstained from classifying about two-thirds of the dataset, particularly less popular websites. This may be because these websites appear less frequently in the training data, leaving the model without enough information to classify them.
3. **Political Stance Distribution**: GPT-4 tends to avoid classifying sources that MBFC rates as centrist, which makes its outputs more polarized. Its classifications also skew slightly to the left of MBFC's.
4. **Impact of Popularity**: Website popularity significantly affects classification accuracy. Popular websites are more likely to be classified accurately, while unpopular ones are more likely to be misclassified or left unclassified.

### Conclusion

Although GPT-4 shows promise as a reliable and efficient tool for political bias classification, its outputs should still be treated with caution. The authors recommend using it as a complement to human judgment to mitigate potential biases. Future research could explore how to improve the model's performance on less popular websites.
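The popularity analysis described under Methods can be illustrated with a logistic regression of GPT-4 abstention on a popularity score. This is a minimal sketch, not the authors' specification: the summary does not state their outcome coding, covariates, or software, so here `abstained` is a hypothetical 0/1 indicator, the popularity values are made-up Open PageRank-style scores, and the single-predictor model is fitted by Newton-Raphson in plain Python.

```python
from math import exp
from typing import Sequence, Tuple


def predict(b0: float, b1: float, x: float) -> float:
    """Logistic model: probability that GPT-4 abstains at popularity x."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * x)))


def fit_logistic(x: Sequence[float], y: Sequence[int], iters: int = 50) -> Tuple[float, float]:
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    Assumes the data are not perfectly separable; otherwise the
    maximum-likelihood estimate does not exist and b1 diverges.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood (g*) and the Fisher information matrix (h*).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = predict(b0, b1, xi)
            g0 += yi - p
            g1 += (yi - p) * xi
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: beta <- beta + H^{-1} g, inverting the 2x2 matrix directly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    # Hypothetical data: popularity score per site, and 1 if GPT-4 abstained.
    popularity = [1, 2, 3, 4, 5, 6, 7, 8]
    abstained = [1, 1, 0, 1, 0, 1, 0, 0]
    b0, b1 = fit_logistic(popularity, abstained)
    print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

A negative fitted slope `b1` means that more popular sites are less likely to go unrated, which is the direction the findings above report. A statistics package such as statsmodels (`statsmodels.api.Logit`) would additionally provide standard errors and p-values for testing whether the effect is significant.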