Abstract: A prerequisite for social coordination is bidirectional communication between teammates, each playing two roles simultaneously: as receptive listeners and expressive speakers. For robots working with humans in complex situations with multiple goals that differ in importance, failure to fulfill the expectation of either role could undermine group performance due to misalignment of values between humans and robots. Specifically, a robot needs to serve as an effective listener to infer human users' intents from instructions and feedback and as an expressive speaker to explain its decision processes to users. Here, we investigate how to foster effective bidirectional human-robot communication in the context of value alignment, in which collaborative robots and users form an aligned understanding of the importance of possible task goals. We propose an explainable artificial intelligence (XAI) system in which a group of robots predicts users' values by taking in situ feedback into consideration while communicating their decision processes to users through explanations. To learn from human feedback, our XAI system integrates a cooperative communication model for inferring human values associated with multiple desirable goals. To be interpretable to humans, the system simulates human mental dynamics and predicts optimal explanations using graphical models. We conducted psychological experiments to examine the core components of the proposed computational framework. Our results show that real-time human-robot mutual understanding in complex cooperative tasks is achievable with a learning model based on bidirectional communication. We believe that this interaction framework can shed light on bidirectional value alignment in communicative XAI systems and, more broadly, in future human-machine teaming systems.

User-Driven Value Alignment: Understanding Users' Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions

ValueCompass: A Framework of Fundamental Values for Human-AI Alignment

Democratizing Reward Design for Personal and Representative Value-Alignment

Towards an End-to-End Personal Fine-Tuning Framework for AI Value Alignment

AI Alignment Dialogues: An Interactive Approach to AI Alignment in Support Agents

Strong and weak alignment of large language models with human values

In situ bidirectional human-robot value alignment

Minion: A Technology Probe for Resolving Value Conflicts through Expert-Driven and User-Driven Strategies in AI Companion Applications

The Challenge of Value Alignment: from Fairer Algorithms to AI Safety

What are human values, and how do we align AI to them?

Training Socially Aligned Language Models on Simulated Social Interactions

A Roadmap to Pluralistic Alignment

Exploring the Impact of AI Value Alignment in Collaborative Ideation: Effects on Perception, Ownership, and Output

Beyond Preferences in AI Alignment

The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

Human-Centered Design to Address Biases in Artificial Intelligence

Consumer responses to human-AI collaboration at organizational frontlines: strategies to escape algorithm aversion in content creation

Evaluating and Improving Value Judgments in AI: A Scenario-Based Study on Large Language Models' Depiction of Social Conventions

Human/AI relationships: challenges, downsides, and impacts on human/human relationships