Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems' objectives match those of humans) rather than an ongoing, mutual alignment problem. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Machine Learning (ML). We characterize, define, and scope human-AI alignment. From this, we present a conceptual framework of "Bidirectional Human-AI Alignment" to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans, which ensure that AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from literature analysis, including literature gaps and trends, human values, and interaction techniques. To pave the way for future studies, we envision three key challenges and give recommendations for future research.

What problem does this paper attempt to address?

The paper addresses how to achieve bidirectional human-AI alignment amid the rapid development of artificial intelligence (AI). Specifically, it focuses on the following aspects:

1. **Ambiguity in Definition and Scope**: The definition and scope of human-AI alignment remain unclear, which hinders collaboration across research fields and makes effective alignment harder to achieve.
2. **Limitations of One-Way Alignment**: Existing AI alignment research often treats alignment as a static, one-way process, that is, ensuring that the goals of AI systems match those of humans. This view ignores long-term interaction and dynamic change, and it does not fully consider how human values and goals may evolve as AI technology develops.
3. **Proposal of the Concept of Bidirectional Alignment**: To address these deficiencies, the paper proposes the conceptual framework of "bidirectional human-AI alignment". The framework covers the conventional direction, "aligning AI to humans", which ensures that AI produces the intended outcomes determined by humans. It also proposes a new direction, "aligning humans to AI", which aims to help individuals and society adapt to AI advancements both cognitively and behaviorally.
4. **Research Directions and Future Challenges**: Based on a systematic review of more than 400 papers, the paper identifies four key research questions (RQ1 to RQ4) and offers recommendations for three main challenges facing future research, with the goal of promoting interdisciplinary cooperation and advancing research on bidirectional human-AI alignment.

Through this work, the paper aims to provide a comprehensive perspective on the complex, dynamic interaction between humans and AI and to offer guidance for future research and development.

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

AI Alignment: A Comprehensive Survey

Beyond Preferences in AI Alignment

A Multi-Level Framework for the AI Alignment Problem

Adaptive AI Alignment: Established Resources for Aligning Machine Learning with Human Intentions and Values in Changing Environments

Aligning Artificial Intelligence with Humans through Public Policy

Artificial Intelligence, Values and Alignment

AI Alignment Dialogues: An Interactive Approach to AI Alignment in Support Agents

Methodological reflections for AI alignment research using human feedback

Enabling Human-Centered AI: A Methodological Perspective

Designing for Human-Agent Alignment: Understanding what humans want from their agents

AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations

Artificial Intelligence Value Alignment Principles: The State of Art Review from Information Systems Research

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models

Concept Alignment

Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment

Interactive AI Alignment: Specification, Process, and Evaluation Alignment

Challenges of Human-Aware AI Systems

AI Alignment with Changing and Influenceable Reward Functions

Foundational Moral Values for AI Alignment

There and Back Again: The AI Alignment Paradox