Abstract: This research investigates whether OpenAI's GPT-4, a state-of-the-art large language model, can accurately classify the political bias of news sources based solely on their URLs. Given the subjective nature of political labels, third-party bias ratings like those from Ad Fontes Media, AllSides, and Media Bias/Fact Check (MBFC) are often used in research to analyze news source diversity. This study aims to determine whether GPT-4 can replicate these human ratings on a seven-degree scale ("far-left" to "far-right"). The analysis compares GPT-4's classifications against MBFC's and controls for website popularity using Open PageRank scores. Findings reveal a high correlation (Spearman's $\rho = .89$, $n = 5{,}877$, $p < 0.001$) between GPT-4's and MBFC's ratings, indicating the model's potential reliability. However, GPT-4 abstained from classifying approximately two-thirds of the dataset. It is more likely to abstain from rating unpopular websites, which also receive less accurate assessments. The LLM tends to avoid classifying sources that MBFC considers centrist, resulting in more polarized outputs. Finally, the analysis shows a slight leftward skew in GPT-4's classifications compared to MBFC's. Therefore, while this paper suggests that GPT-4 can be a scalable, cost-effective tool for classifying the political bias of news websites, it should be used as a complement to human judgment to mitigate biases.
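The core comparison in the abstract reduces to an ordinal correlation between two raters. The sketch below is one hedged way to compute it: map the seven labels to integer scores, treat a missing or off-scale answer as an abstention, keep only URLs rated by both sources, and take the Pearson correlation of tie-averaged ranks (which is Spearman's rho). The −3 to +3 encoding, the exact label strings, the function names, and the toy data are assumptions for illustration, not the paper's preprocessing. In practice `scipy.stats.spearmanr` computes the same statistic together with a p-value.

```python
from typing import List, Optional, Tuple

# Hypothetical ordinal encoding of the seven-degree scale
# (assumption: negative = left, 0 = center, positive = right).
SCALE = {
    "far-left": -3, "left": -2, "center-left": -1, "center": 0,
    "center-right": 1, "right": 2, "far-right": 3,
}


def encode(label: Optional[str]) -> Optional[int]:
    """Map a label to its ordinal score; None marks an abstention or an off-scale answer."""
    if label is None:
        return None
    return SCALE.get(label.strip().lower())


def average_ranks(values: List[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_on_rated(
    model_labels: List[Optional[str]], reference_labels: List[Optional[str]]
) -> Tuple[float, int, float]:
    """Spearman's rho on URLs rated by both sources, plus n and the model's abstention rate."""
    pairs = []
    abstained = 0
    for m, r in zip(model_labels, reference_labels):
        m_score, r_score = encode(m), encode(r)
        if m_score is None:
            abstained += 1
        elif r_score is not None:
            pairs.append((m_score, r_score))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    # Spearman's rho is the Pearson correlation of the tie-averaged ranks.
    rho = pearson(average_ranks(xs), average_ranks(ys))
    return rho, len(pairs), abstained / len(model_labels)


if __name__ == "__main__":
    # Invented toy labels; None means the model declined to rate that URL.
    gpt4 = ["left", None, "right", "far-left", None, "center-right"]
    mbfc = ["center-left", "center", "right", "left", "left", "center"]
    rho, n, abstain_rate = spearman_on_rated(gpt4, mbfc)
    # Here rho is 1.0 on n = 4 pairs, and 2 of 6 URLs were abstentions.
    print(rho, n, abstain_rate)
```

Note that the correlation is computed only on rated pairs, so a high rho says nothing about the abstained URLs; reporting the abstention rate alongside rho keeps that visible.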

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper explores whether OpenAI's GPT-4 can accurately classify the political bias of news sources based solely on their URLs. Specifically, the researchers evaluate GPT-4's capabilities through two research questions:

1. **RQ1**: Can large language models (LLMs) classify the political bias of news sources from their URLs consistently with human ratings, relying only on their built-in knowledge (without internet access)?
2. **RQ2**: Does classification accuracy vary with the popularity of the news source?

### Background and Motivation

Political labels are partly subjective, so research on the diversity of news sources often relies on third-party bias ratings such as those from Ad Fontes Media, AllSides, and Media Bias/Fact Check (MBFC). These ratings help balance the information ecosystem by letting people better understand the content they consume. However, producing such ratings is usually time-consuming and resource-intensive, which motivates the researchers to explore whether LLMs like GPT-4 can automate the process.

### Methods

The researchers collected political bias ratings from MBFC and website popularity scores from Open PageRank. They asked GPT-4 to classify the political bias of each URL on a seven-degree scale ("far-left" to "far-right") and compared its answers with MBFC's ratings. They measured agreement between the two with Spearman's rank correlation coefficient and used linear and logistic regression to analyze how website popularity affects classification accuracy.

### Main Findings

1. **High Correlation**: GPT-4's and MBFC's political bias ratings are highly correlated (Spearman's ρ = 0.89, n = 5,877, p < 0.001), suggesting that GPT-4 performs this task reliably for the sources it does rate.
2. **Unclassified Cases**: GPT-4 did not classify about two-thirds of the dataset, especially unpopular websites. This may be because such websites appear less frequently in the training data, leaving the model too little information to classify them accurately.
3. **Political Stance Distribution**: GPT-4 tends to avoid classifying sources that MBFC considers centrist, which makes its outputs more polarized. Its classifications also lean slightly left compared with MBFC's.
4. **Impact of Popularity**: Website popularity significantly affects classification accuracy. Popular websites are more likely to be classified accurately, while unpopular websites are more likely to be misclassified or left unclassified.

### Conclusion

Although GPT-4 is reasonably reliable and efficient at classifying political bias, its results should still be treated with caution. It is best used as an auxiliary tool alongside human judgment to reduce potential biases. Future research could explore how to improve the model's performance on unpopular websites.
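The summary names linear and logistic regression but not their exact specifications. One plausible reading, sketched below in plain Python, fits a logistic model of a binary "model abstained" indicator on a site's Open PageRank score and an ordinary least-squares line of absolute rating error on the same score; a third helper computes the mean signed difference between the two raters' ordinal scores, one simple way to quantify a leftward skew. The function names, the −3 to +3 encoding, the assumed 0 to 10 PageRank range, and the toy data are all hypothetical, and these hand-rolled fits return coefficients only, with no standard errors or p-values (a package such as statsmodels would normally be used for inference).

```python
import math
from typing import List, Tuple


def fit_logistic(x: List[float], y: List[int], lr: float = 0.1, steps: int = 5000) -> Tuple[float, float]:
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by batch gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - yi          # gradient w.r.t. the intercept
            g1 += (p - yi) * xi   # gradient w.r.t. the slope
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def fit_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form simple linear regression y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def mean_signed_shift(model_scores: List[int], reference_scores: List[int]) -> float:
    """Mean of (model - reference) on the ordinal scale; negative means the model rates further left."""
    diffs = [m - r for m, r in zip(model_scores, reference_scores)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Invented toy data; PageRank-style scores are assumed to lie between 0 and 10.
    pagerank = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.0]
    abstained = [1, 1, 1, 0, 1, 0, 0, 0]  # 1 = model declined to rate the URL
    b0, b1 = fit_logistic(pagerank, abstained)
    print("logistic slope (negative: popular sites see fewer abstentions):", b1)

    rated_pagerank = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    abs_error = [2, 1, 1, 1, 0, 0]  # |model score - MBFC score| on the -3..+3 scale
    a, b = fit_ols(rated_pagerank, abs_error)
    print("OLS slope (negative: popular sites are rated more accurately):", b)

    print("mean signed shift:", mean_signed_shift([-2, 1, 0], [-1, 1, 1]))
```

Gradient descent is used here only to keep the sketch dependency-free; with unscaled predictors and a small learning rate it converges slowly, so a library solver is the better choice on real data.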

LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains