Abstract: The critical inquiry pervading the realm of Philosophy, and perhaps extending its influence across all Humanities disciplines, revolves around the intricacies of morality and normativity. Surprisingly, in recent years, this thematic thread has woven its way into an unexpected domain, one not conventionally associated with pondering "what ought to be": the field of artificial intelligence (AI) research. Central to morality and AI, we find "alignment", a problem related to the challenges of expressing human goals and values in a manner that artificial systems can follow without leading to unwanted adversarial effects. More explicitly and with our current paradigm of AI development in mind, we can think of alignment as teaching human values to non-anthropomorphic entities trained through opaque, gradient-based learning techniques. This work addresses alignment as a technical-philosophical problem that requires solid philosophical foundations and practical implementations that bring normative theory to AI system development. To accomplish this, we propose two sets of necessary and sufficient conditions that, we argue, should be considered in any alignment process. While necessary conditions serve as metaphysical and metaethical roots that pertain to the permissibility of alignment, sufficient conditions establish a blueprint for aligning AI systems under a learning-based paradigm. After laying such foundations, we present implementations of this approach by using state-of-the-art techniques and methods for aligning general-purpose language systems. We call this framework Dynamic Normativity. Its central thesis is that any alignment process under a learning paradigm that cannot fulfill its necessary and sufficient conditions will fail to produce aligned systems.

Learning Norms from Stories: A Prior for Value Aligned Agents

Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior

Aligning to Social Norms and Values in Interactive Narratives

Machine Learning Approaches for Principle Prediction in Naturally Occurring Stories

Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences

STELA: a community-centred approach to norm elicitation for AI alignment

Legible Normativity for AI Alignment: The Value of Silly Rules

Learning Human-like Representations to Enable Learning Human Values

Value alignment: a formal approach

Pragmatic-Pedagogic Value Alignment

Norm Learning, Teaching, and Change

Agent Alignment in Evolving Social Norms

In Conversation with Artificial Intelligence: Aligning Language Models with Human Values

Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

Multi-Value Alignment in Normative Multi-Agent System: Evolutionary Optimisation Approach

Norm Violation Detection in Multi-Agent Systems using Large Language Models: A Pilot Study

Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment

Moral Alignment for LLM Agents

MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

Contextual Moral Value Alignment Through Context-Based Aggregation

Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning