Fairness in Preference-based Reinforcement Learning

Umer Siddique, Abhinav Sinha, Yongcan Cao
2023-09-01
Abstract: In this paper, we address the issue of fairness in preference-based reinforcement learning (PbRL) in the presence of multiple objectives. The main objective is to design control policies that can optimize multiple objectives while treating each objective fairly. Toward this objective, we design a new fairness-induced preference-based reinforcement learning method, FPbRL. The main idea of FPbRL is to learn vector reward functions associated with multiple objectives via new welfare-based preferences rather than the reward-based preferences used in PbRL, coupled with policy learning via maximizing a generalized Gini welfare function. Finally, we provide experimental studies on three different environments to show that the proposed FPbRL approach can achieve both efficiency and equity for learning effective and fair policies.
Machine Learning, Artificial Intelligence, Computers and Society, Systems and Control
What problem does this paper attempt to address?
This paper addresses the lack of fairness in Preference-based Reinforcement Learning (PbRL) when multiple objectives are involved. Specifically:

- **Research Background**: Traditional preference-based reinforcement learning methods mainly focus on maximizing a single performance metric and neglect fairness among different objectives. Fairness matters particularly in real-world tasks involving diverse user preferences.
- **Contribution of the Paper**: The authors propose a new Fairness-Induced Preference-based Reinforcement Learning method (FPbRL). It learns vector reward functions, one component per objective, from welfare-based preferences rather than the reward-based preferences used in standard PbRL. The policy is then trained by maximizing a generalized Gini welfare function, which balances efficiency and fairness across objectives.
- **Experimental Validation**: The paper evaluates the method in three different environments (species conservation, resource collection, and traffic control). The results show that, compared to traditional PPO and PbRL methods, FPbRL maintains good learning performance while ensuring fairness among multiple objectives.

In summary, the paper proposes a method for multi-objective fairness in preference-based reinforcement learning that combines welfare-based preference learning with welfare-maximizing policy optimization.
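To make the welfare idea concrete, the following is a minimal sketch of a generalized Gini welfare function and of how a welfare-based preference between two outcomes could be derived from it. It is an illustration under stated assumptions, not the authors' implementation: the function names `ggf_welfare` and `welfare_preference` are hypothetical, the default weights `1 / 2**i` are chosen only for the example, and the way preferences are actually elicited in the paper is not described in this summary. The sketch uses the standard definition of the generalized Gini function: sort the per-objective utilities in ascending order and take a weighted sum with non-increasing weights, so the worst-off objective receives the largest weight.

```python
import numpy as np


def ggf_welfare(utilities, weights=None):
    """Generalized Gini welfare of a vector of per-objective utilities.

    Utilities are sorted in ascending order and combined with
    non-increasing weights, so the lowest utility counts the most.
    """
    u = np.sort(np.asarray(utilities, dtype=float))  # ascending order
    if weights is None:
        # Assumption for illustration only: w_i = 1 / 2**i (1, 0.5, 0.25, ...)
        weights = 1.0 / (2.0 ** np.arange(u.size))
    w = np.asarray(weights, dtype=float)
    if w.shape != u.shape:
        raise ValueError("weights and utilities must have the same length")
    if np.any(np.diff(w) > 0):
        raise ValueError("weights must be non-increasing")
    return float(np.dot(w, u))


def welfare_preference(returns_a, returns_b, weights=None):
    """Hypothetical welfare-based preference label between two outcomes.

    Returns 1.0 if A has higher welfare, 0.0 if B does, and 0.5 on a tie.
    """
    wa = ggf_welfare(returns_a, weights)
    wb = ggf_welfare(returns_b, weights)
    if wa > wb:
        return 1.0
    if wa < wb:
        return 0.0
    return 0.5


if __name__ == "__main__":
    balanced = [3.0, 3.0]    # both objectives served equally
    unbalanced = [6.0, 0.0]  # same total return, one objective ignored
    print(ggf_welfare(balanced))
    print(ggf_welfare(unbalanced))
    print(welfare_preference(balanced, unbalanced))
```

In this example both outcomes have the same total return, so a preference based on the summed reward would be indifferent between them, while the welfare-based label favors the balanced outcome. That is the sense in which a welfare-based comparison rewards equity in addition to efficiency.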