Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment

Nicholas Kluge Corrêa
2024-06-18
Abstract: The critical inquiry pervading the realm of Philosophy, and perhaps extending its influence across all Humanities disciplines, revolves around the intricacies of morality and normativity. Surprisingly, in recent years, this thematic thread has woven its way into an unexpected domain, one not conventionally associated with pondering "what ought to be": the field of artificial intelligence (AI) research. Central to morality and AI, we find "alignment", a problem related to the challenges of expressing human goals and values in a manner that artificial systems can follow without leading to unwanted adversarial effects. More explicitly and with our current paradigm of AI development in mind, we can think of alignment as teaching human values to non-anthropomorphic entities trained through opaque, gradient-based learning techniques. This work addresses alignment as a technical-philosophical problem that requires solid philosophical foundations and practical implementations that bring normative theory to AI system development. To accomplish this, we propose two sets of necessary and sufficient conditions that, we argue, should be considered in any alignment process. While necessary conditions serve as metaphysical and metaethical roots that pertain to the permissibility of alignment, sufficient conditions establish a blueprint for aligning AI systems under a learning-based paradigm. After laying such foundations, we present implementations of this approach by using state-of-the-art techniques and methods for aligning general-purpose language systems. We call this framework Dynamic Normativity. Its central thesis is that any alignment process under a learning paradigm that cannot fulfill its necessary and sufficient conditions will fail in producing aligned systems.
Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
This paper addresses the alignment problem: how to express human goals and values so that AI systems can follow them without producing unwanted effects. More specifically, it asks how such values can be taught to non-anthropomorphic systems trained through opaque, gradient-based learning techniques. The author treats alignment as both a technical and a philosophical problem, one that needs solid philosophical foundations as well as practical implementations. To that end, the paper proposes a framework called "Dynamic Normativity," built on two sets of conditions that, it argues, any alignment process should consider. The necessary conditions are metaphysical and metaethical roots that pertain to the permissibility of alignment, while the sufficient conditions provide a blueprint for aligning AI systems under a learning-based paradigm. The paper then presents implementations of this approach for general-purpose language systems. Its central thesis is that an alignment process under a learning paradigm that cannot fulfill these conditions will fail to produce aligned systems.