Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI

Hadassah Harland, Richard Dazeley, Peter Vamplew, Hashini Senaratne, Bahareh Nakisa, Francisco Cruz

2024-10-31

Abstract: Emerging research in Pluralistic Artificial Intelligence (AI) alignment seeks to address how intelligent systems can be designed and deployed in accordance with diverse human needs and values. We contribute to this pursuit with a dynamic approach for aligning AI with diverse and shifting user preferences through Multi-Objective Reinforcement Learning (MORL), via post-learning policy selection adjustment. In this paper, we introduce the proposed framework for this approach, outline its anticipated advantages and assumptions, and discuss technical details about the implementation. We also examine the broader implications of adopting a retroactive alignment approach through the sociotechnical systems perspective.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses how a pluralistic AI system can stay aligned with user preferences that are diverse and change over time. The authors propose a dynamic adjustment method based on Multi-Objective Reinforcement Learning (MORL) that continually realigns the system with the user by adjusting which policy is selected after learning. The method targets a limitation of existing alignment approaches, which usually assume that user preferences are static and therefore adapt poorly to varied and shifting needs and values. The main contribution is a framework that adjusts the system's behavior during operation to better match the user's current preferences, without requiring direct and specific feedback from the user. This is intended to make the system more flexible and to reduce how often the user must interact with it, which lightens the user's burden. Through a continuous process of learning and self-review, the system is also expected to refine its understanding of, and response to, user preferences over time, and so sustain alignment for longer. In short, the paper explores how AI systems can adapt dynamically to users' diverse and changing preferences, with the aim of improving both their practicality and the user experience.
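The idea of post-learning policy selection adjustment can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a set of policies has already been learned, that each policy is summarised by a vector of expected returns (one per objective), that policies are ranked by a weighted sum of those returns, and that a signal arrives indicating which objective the user currently wants more of. The names `Policy`, `select_policy`, and `adjust_weights`, the step size, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    values: tuple  # expected return per objective (illustrative numbers only)


def select_policy(policies, weights):
    """Return the policy with the highest weighted sum of objective values.

    Linear scalarisation is an assumption made for this sketch.
    """
    return max(policies, key=lambda p: sum(w * v for w, v in zip(weights, p.values)))


def adjust_weights(weights, objective, step=0.2):
    """Shift preference weight toward one objective, then renormalise to sum to 1."""
    raised = [w + step if i == objective else w for i, w in enumerate(weights)]
    total = sum(raised)
    return [w / total for w in raised]


# Hypothetical pre-learned policies trading off two objectives.
policies = [
    Policy("fast", (0.9, 0.2)),
    Policy("balanced", (0.6, 0.6)),
    Policy("careful", (0.2, 0.9)),
]

weights = [0.8, 0.2]  # initial preference estimate (assumed)
history = [select_policy(policies, weights).name]

# Assume four successive signals that the user wants more of objective 1.
for _ in range(4):
    weights = adjust_weights(weights, objective=1)
    history.append(select_policy(policies, weights).name)

print(history)
```

No retraining happens in this loop: only the preference weights change, and the selected policy switches once another policy scores higher under the new weights. How the preference signal is obtained without explicit user feedback is left open here.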


Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment

A Roadmap to Pluralistic Alignment

AI Alignment with Changing and Influenceable Reward Functions

MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

Beyond Preferences in AI Alignment

Being Considerate as a Pathway Towards Pluralistic Alignment for Agentic AI

Social Choice for AI Alignment: Dealing with Diverse Human Feedback

Multi-objective Reinforcement learning from AI Feedback

Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Pareto-Optimal Learning from Preferences with Hidden Context

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

Dynamic value alignment through preference aggregation of multiple objectives

Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback

AI, Pluralism, and (Social) Compensation

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

Instilling moral value alignment by means of multi-objective reinforcement learning