What does AI consider praiseworthy?

Andrew J. Peterson
2024-11-27
Abstract: As large language models (LLMs) are increasingly used for work, personal, and therapeutic purposes, researchers have begun to investigate these models' implicit and explicit moral views. Previous work, however, focuses on asking LLMs to state opinions, or on other technical evaluations that do not reflect common user interactions. We propose a novel evaluation of LLM behavior that analyzes responses to user-stated intentions, such as "I'm thinking of campaigning for {candidate}." LLMs frequently respond with critiques or praise, often beginning responses with phrases such as "That's great to hear!..." While this makes them friendly, these praise responses are not universal and thus reflect a normative stance by the LLM. We map out the moral landscape of LLMs in how they respond to user statements in different domains including politics and everyday ethical actions. In particular, although a naive analysis might suggest LLMs are biased against right-leaning politics, our findings indicate that the bias is primarily against untrustworthy sources. Second, we find strong alignment across models for a range of ethical actions, but that doing so requires them to engage in high levels of praise and critique of users. Finally, our experiment on statements about world leaders finds no evidence of bias favoring the country of origin of the models. We conclude that as AI systems become more integrated into society, their use of praise, criticism, and neutrality must be carefully monitored to mitigate unintended psychological or societal impacts.
Computers and Society, Human-Computer Interaction
What problem does this paper attempt to address?
This paper explores and evaluates the moral stances that large language models (LLMs) exhibit when responding to users' stated intentions. Specifically, the researchers focus on the following core questions:

1. **How LLMs respond to users' intention statements**: For example, when a user says "I plan to campaign for a certain candidate" or "I have decided to lose weight", how does the LLM respond? Does the response praise the user's intention, criticize it, or remain neutral?
2. **The moral stances implied by LLMs' responses**: Do LLMs' responses in different domains (such as politics, ethics, and personal actions) reflect specific moral views? How are these views formed?
3. **Differences between LLMs**: Do different models show different tendencies when responding to users' intentions? For example, some models may be more inclined to praise certain behaviors, while others may stay more neutral.
4. **Whether LLMs' praise is consistent with human moral evaluations**: Does the praise given by LLMs match the moral standards of most people? This includes the question of whether LLMs praise behaviors that are considered immoral.
5. **Whether LLMs have ideological biases**: When responding to statements involving political candidates, do LLMs favor or disfavor specific political stances? Can any such bias be explained by controlling for other factors, such as credibility?

### Research background

As LLMs are increasingly used for work, personal, and therapeutic purposes, researchers have begun to examine the implicit and explicit moral views of these models. Previous research has mainly asked LLMs to state opinions or has relied on other technical evaluations, and these methods do not reflect common user interactions. This paper therefore proposes a new evaluation method that reveals the moral stances of LLMs by analyzing their responses to users' intention statements.

### Experimental design

To answer the questions above, the researchers designed a series of experiments covering three areas: news, actions, and international political figures. They sent prompts containing declared intentions to multiple LLMs through their APIs and sorted each response into one of three categories: praise (+1), neutral (0), or criticism (-1). This let them quantify how LLMs react to different intentions and analyze the reasons behind those reactions.

### Main findings

1. **Trustworthiness is a key factor**: LLMs are more critical of less credible news sources; their criticism does not simply follow ideological lines.
2. **Different models behave differently**: Some models (such as GPT-3.5-turbo) give non-neutral responses more often, while others (such as Claude-3-sonnet) are more inclined to remain neutral.
3. **Ideological bias is not absolute**: Although a preliminary analysis suggests that LLMs lean negative towards right-wing politics, this bias is no longer evident after controlling for factors such as credibility.

### Conclusion

The researchers conclude that as AI systems become more widely used in society, their praise, criticism, and neutrality in response to users' intentions must be carefully monitored to avoid unintended psychological or societal impacts. The study also emphasizes that evaluating the moral stances of LLMs requires considering multiple factors, not just simple ideological classifications. By focusing on LLMs' responses to users' intention statements, the paper introduces a behavioral perspective for evaluating the moral stances of AI and how well they align with human ethical and social norms.
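
The +1 / 0 / -1 scoring described under the experimental design can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the cue phrases, the function names `score_response` and `mean_score_by`, the record layout, and the toy responses are all hypothetical. In particular, the keyword matching is only a simple stand-in, since this summary does not say how the researchers actually labeled responses.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cue phrases, chosen only for this illustration.
PRAISE_CUES = ("that's great", "that's wonderful", "good for you")
CRITICISM_CUES = ("i would caution", "be careful", "please reconsider")


def score_response(text: str) -> int:
    """Map a response to +1 (praise), 0 (neutral), or -1 (criticism).

    Only the opening of the response is inspected, because praise or
    critique often appears at the very start of the reply.
    """
    opening = text.lower()[:200]
    if any(cue in opening for cue in PRAISE_CUES):
        return 1
    if any(cue in opening for cue in CRITICISM_CUES):
        return -1
    return 0


def mean_score_by(records, key):
    """Average the scores of all records that share the same value of `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(score_response(record["response"]))
    return {group: mean(scores) for group, scores in groups.items()}


# Toy records invented for this example; they are not data from the paper.
RECORDS = [
    {"model": "model_a", "credibility": "high",
     "response": "That's great to hear! Staying informed matters."},
    {"model": "model_a", "credibility": "low",
     "response": "I would caution against relying on that outlet."},
    {"model": "model_b", "credibility": "high",
     "response": "Here is some background on that publication."},
    {"model": "model_b", "credibility": "low",
     "response": "Be careful: that source has a mixed record."},
]

if __name__ == "__main__":
    print(mean_score_by(RECORDS, "model"))
    print(mean_score_by(RECORDS, "credibility"))
```

Grouping the same scores by different keys (model, source credibility, or political leaning) is one simple way to check whether an apparent difference between groups persists once another factor is taken into account. A real analysis would need a validated labeling step and a proper statistical model.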