Risk-Averse Biased Human Policies in Assistive Multi-Armed Bandit Settings

Michael Koller, Timothy Patten, Markus Vincze
DOI: https://doi.org/10.48550/arXiv.2104.05334
2021-04-12
Robotics
Abstract: Assistive multi-armed bandit problems can be used to model team situations between a human and an autonomous system such as a domestic service robot. To account for human biases such as the risk aversion described by Cumulative Prospect Theory, the setting is extended to observable rewards. When robots leverage knowledge of the risk-averse human model, they eliminate the bias and make more rational choices. We present an algorithm that increases the utility value of such human-robot teams. A brief evaluation indicates that arbitrary reward functions can be handled.
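The abstract does not state which form of the Cumulative Prospect Theory model is used. The minimal Python sketch below assumes the standard Tversky and Kahneman (1992) value and probability-weighting functions with their published parameter estimates, plus two hypothetical bandit arms that do not come from the paper. It only illustrates the kind of bias at stake: a CPT-biased human can prefer a sure, lower-mean arm over a gamble with a higher expected reward. It is not the authors' algorithm.

```python
"""Hypothetical illustration: a CPT-biased human vs. an expected-value chooser.

Assumptions (not from the paper): Tversky & Kahneman (1992) functional forms and
parameters, and the two example arms defined below.
"""
from typing import List, Tuple

Prospect = List[Tuple[float, float]]  # (outcome, probability) pairs

# Assumed Tversky & Kahneman (1992) parameter estimates.
ALPHA, BETA, LAMBDA = 0.88, 0.88, 2.25  # curvature for gains/losses, loss aversion
GAMMA_GAIN, GAMMA_LOSS = 0.61, 0.69     # probability-weighting curvature


def value(x: float) -> float:
    """CPT value function: concave for gains, convex and steeper for losses."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * (-x) ** BETA


def weight(p: float, gamma: float) -> float:
    """Inverse-S probability weighting function w(p)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1 / gamma)


def cpt_utility(prospect: Prospect) -> float:
    """Rank-dependent CPT evaluation of a discrete prospect."""
    gains = sorted((o for o in prospect if o[0] >= 0), key=lambda o: o[0], reverse=True)
    losses = sorted((o for o in prospect if o[0] < 0), key=lambda o: o[0])
    total = 0.0
    cum = 0.0  # probability of outcomes at least as good as the current gain
    for x, p in gains:
        total += (weight(cum + p, GAMMA_GAIN) - weight(cum, GAMMA_GAIN)) * value(x)
        cum += p
    cum = 0.0  # probability of outcomes at least as bad as the current loss
    for x, p in losses:
        total += (weight(cum + p, GAMMA_LOSS) - weight(cum, GAMMA_LOSS)) * value(x)
        cum += p
    return total


def expected_value(prospect: Prospect) -> float:
    """Risk-neutral expected reward of a discrete prospect."""
    return sum(x * p for x, p in prospect)


# Hypothetical arms (not from the paper): a sure reward vs. a higher-mean gamble.
ARMS = {
    "safe": [(1.0, 1.0)],
    "risky": [(3.0, 0.5), (-0.5, 0.5)],
}

human_choice = max(ARMS, key=lambda a: cpt_utility(ARMS[a]))
rational_choice = max(ARMS, key=lambda a: expected_value(ARMS[a]))

if __name__ == "__main__":
    for name, arm in ARMS.items():
        print(f"{name}: EV={expected_value(arm):.3f}, CPT={cpt_utility(arm):.3f}")
    print("CPT-biased human picks:", human_choice)
    print("expected-value chooser picks:", rational_choice)
```

With these assumed numbers the risky arm has the higher expected reward (1.25 vs. 1.0), but loss aversion and probability weighting lead the CPT evaluation to favor the sure arm. A robot that knows such a human model could, in principle, recognize this gap; how the paper's algorithm exploits it is not shown here.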