Doing the right thing for the right reason: Evaluating artificial moral cognition by probing cost insensitivity

Yiran Mao, Madeline G. Reinecke, Markus Kunesch, Edgar A. Duéñez-Guzmán, Ramona Comanescu, Julia Haas, Joel Z. Leibo
2023-05-30
Abstract: Is it possible to evaluate the moral cognition of complex artificial agents? In this work, we take a look at one aspect of morality: "doing the right thing for the right reasons." We propose a behavior-based analysis of artificial moral cognition which could also be applied to humans to facilitate like-for-like comparison. Morally-motivated behavior should persist despite mounting cost; by measuring an agent's sensitivity to this cost, we gain deeper insight into underlying motivations. We apply this evaluation to a particular set of deep reinforcement learning agents, trained by memory-based meta-reinforcement learning. Our results indicate that agents trained with a reward function that includes other-regarding preferences perform helping behavior in a way that is less sensitive to increasing cost than agents trained with more self-interested preferences.
Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
The paper addresses how to evaluate the moral cognition of complex artificial agents, such as deep reinforcement learning agents. Specifically, the authors ask how behavioral analysis can reveal whether an agent is "doing the right thing for the right reasons." To that end, they propose a behavior-based evaluation that applies to artificial agents and could also be applied to humans, enabling like-for-like comparison.

### Main Contributions of the Paper:

1. **Behavioral Evaluation Framework**: The authors propose a framework for assessing the moral cognition of artificial agents. Its core idea is to measure how an agent's behavior changes as the cost of that behavior increases, and to infer the underlying motivation from that change.
2. **Experimental Design**: A simulated environment is used to vary the cost of helping (e.g., by increasing the distance the helper must travel) and to observe the resulting change in behavior. This makes it possible to distinguish intrinsic motivations (such as moral motivations) from instrumental motivations (such as self-interest).
3. **Empirical Study**: Three deep reinforcement learning agents are trained with different other-regarding preference coefficients and then evaluated under increasing cost. Agents with stronger other-regarding preferences (i.e., more concern for others' welfare) are less cost-sensitive in their helping behavior.

### Main Conclusions:

- **Cost Sensitivity**: Agents with stronger other-regarding preferences keep helping as costs rise, which is consistent with their helping being driven by moral rather than self-interested motivation.
- **Moral Behavior Evaluation**: Comparing how different agents respond to increasing cost gives a way to assess their motivations. If one agent continues to help despite increasing cost while another quickly gives up, the former's helping is more plausibly morally motivated.

### Experimental Method:

- **Simulated Environment**: A 2D simulated environment contains two agents, a tall agent and a short agent. The tall agent can help the short agent obtain fruit that the short agent cannot reach on its own.
- **Cost Manipulation**: Cost is manipulated by changing how far the tall agent must deviate from its optimal path in order to help.
- **Evaluation Metrics**: The number of helping events by each agent is recorded under each cost condition and used to assess cost sensitivity.

### Discussion:

- **Evaluation of Morally Irrelevant Behavior**: In addition to the cost sensitivity of morally relevant behaviors (such as helping), the cost sensitivity of morally irrelevant behaviors should also be evaluated. This checks that persistence in helping reflects motivation rather than general behavioral rigidity.
- **Future Work**: Future research can extend the experimental design to more types of morally relevant and irrelevant behaviors, giving a more comprehensive assessment of the moral cognition of artificial agents.

In summary, this paper proposes a cost-sensitivity method for evaluating the moral cognition of artificial agents and demonstrates it in a set of experiments with deep reinforcement learning agents. This is significant for understanding and developing artificial intelligence systems with moral cognitive abilities.
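The evaluation metric described above (helping events counted at each cost level) can be turned into a single cost-sensitivity number in several ways. The following is a minimal sketch of one option, assuming cost sensitivity is summarized as the least-squares slope of helping counts against cost. The function name `cost_sensitivity`, the choice of a linear slope, and all numbers are illustrative assumptions, not the authors' method or results.

```python
from typing import Sequence


def cost_sensitivity(costs: Sequence[float], helping_counts: Sequence[float]) -> float:
    """Least-squares slope of helping events against cost (hypothetical summary statistic).

    A slope near zero means helping barely changes as cost rises;
    a strongly negative slope means helping drops off quickly.
    """
    if len(costs) != len(helping_counts) or len(costs) < 2:
        raise ValueError("need at least two matching (cost, count) pairs")
    n = len(costs)
    mean_cost = sum(costs) / n
    mean_help = sum(helping_counts) / n
    covariance = sum(
        (c - mean_cost) * (h - mean_help) for c, h in zip(costs, helping_counts)
    )
    variance = sum((c - mean_cost) ** 2 for c in costs)
    if variance == 0:
        raise ValueError("cost levels must not all be equal")
    return covariance / variance


# Made-up numbers for illustration only; these are NOT results from the paper.
detour_costs = [0, 1, 2, 3]          # e.g. extra distance the tall agent must travel
helping_prosocial = [10, 9, 9, 8]    # hypothetical agent with other-regarding preferences
helping_selfish = [10, 6, 3, 1]      # hypothetical more self-interested agent

slope_prosocial = cost_sensitivity(detour_costs, helping_prosocial)
slope_selfish = cost_sensitivity(detour_costs, helping_selfish)
print(slope_prosocial, slope_selfish)
```

Under this summary, the agent whose slope is closer to zero is the less cost-sensitive one. Running the same function on a morally irrelevant behavior would give a baseline for ruling out general behavioral rigidity, as the discussion suggests.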