Ethical Alignment Decision-Making for Connected Autonomous Vehicle in Traffic Dilemmas Via Reinforcement Learning from Human Feedback
Xin Gao, Tian Luan, Xueyuan Li, Qi Liu, Zhaoyang Ma, Xiaoqiang Meng, Zirui Li
DOI: https://doi.org/10.1109/jiot.2024.3447070
IF: 10.6
2024-01-01
IEEE Internet of Things Journal
Abstract: Since the introduction of the trolley problem, the ethical decision-making conundrum has extended from autonomous vehicles (AVs) to connected autonomous vehicles (CAVs) and remains a prominent challenge. When confronted with ethical dilemmas, CAVs must align their responses not merely with value-neutral human preferences, but also with broader moral and ethical frameworks. Consequently, to ensure that CAVs do not engage in actions that contravene established human moral principles, ethical considerations must be carefully integrated into their decision-making systems. In this paper, we introduce a Multi-scale Multi-modal Ethical Network (M2ENet), which aims to align the autonomous vehicle decision-making system with human ethical feedback in ethical dilemma scenarios. First, we extract morphological and dynamic features from sensory information and signal data, respectively, using a Multi-scale Multi-modal Representation. Second, an Ethical Policy-based Network is devised to enable autonomous vehicles to comprehend ethical information; it includes an ethical alignment factor to ethically align the feature matrix from human feedback. Third, the accuracy of ethical interaction information is improved through a coupled ethical module informed by human feedback. Finally, the efficacy of the system is demonstrated on three representative ethical dilemmas in traffic scenarios, using both simulation experiments and hardware-in-the-loop testing. The simulation experiments show that the proposed model generates decision-making strategies more aligned with human preferences in ethical traffic scenarios. In the hardware-in-the-loop tests, the average percentage of ethical bias weights decreases by 45.06% after 150 episodes of training.