Moral Gridworlds: A Theoretical Proposal for Modeling Artificial Moral Cognition

Julia Haas
DOI: https://doi.org/10.1007/s11023-020-09524-9
IF: 5.339
2020-04-25
Minds and Machines
Abstract: I describe a suite of reinforcement learning environments in which artificial agents learn to value and respond to moral content and contexts. I illustrate the core principles of the framework by characterizing one such environment, or "gridworld," in which an agent learns to trade off monetary profit against fair dealing, as applied in a standard behavioral economic paradigm. I then highlight the core technical and philosophical advantages of the learning approach for modeling moral cognition and for addressing the so-called value alignment problem in AI.
computer science, artificial intelligence