Robust Contextual Bandit via the Capped-ℓ2 Norm for Mobile Health Intervention

Feiyun Zhu, Xinliang Zhu, Sheng Wang, Jiawen Yao, Zhichun Xiao, Junzhou Huang
2018-01-01
Abstract: This paper considers the actor-critic contextual bandit for mobile health (mHealth) intervention. State-of-the-art decision-making methods in mHealth generally assume that the noise in the dynamic system follows a Gaussian distribution. These methods use least-squares-based algorithms to estimate the expected reward, which makes them sensitive to outliers. To deal with outliers, we propose the first robust actor-critic contextual bandit method for mHealth intervention. In the critic update, the capped-ℓ2 norm is used to measure the approximation error, which prevents outliers from dominating the objective. The critic update also yields a set of sample weights, and using them gives a weighted objective for the actor update: samples that are ineffective in the critic update receive zero weight in the actor update. As a result, the robustness of both …
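The capping idea in the abstract can be illustrated with a minimal sketch. It assumes that the capped norm is the capped-ℓ2 norm, min(|r|, eps), applied to scalar residuals, and that the critic is a ridge-regularized linear regression fitted by iterative reweighting. The function names, the threshold `eps`, the ridge parameter `lam`, and the reweighting rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def capped_l2_loss(residuals, eps):
    """Sum of min(|r_i|, eps): each sample contributes at most eps."""
    return np.minimum(np.abs(residuals), eps).sum()


def capped_weights(residuals, eps, tiny=1e-8):
    """Reweighting rule (an assumption): 1 / (2|r_i|) if |r_i| < eps, else 0.

    Samples whose residual reaches the cap get zero weight, so they
    no longer influence the fit.
    """
    abs_r = np.abs(residuals)
    return np.where(abs_r < eps, 1.0 / (2.0 * np.maximum(abs_r, tiny)), 0.0)


def robust_ridge(X, y, eps, lam=1e-3, n_iter=20):
    """Iteratively reweighted ridge regression under the capped loss."""
    d = X.shape[1]
    reg = lam * np.eye(d)
    # Start from the ordinary (non-robust) ridge solution.
    w = np.linalg.solve(X.T @ X + reg, X.T @ y)
    u = np.ones(len(y))
    for _ in range(n_iter):
        # Recompute sample weights from the current residuals.
        u = capped_weights(y - X @ w, eps)
        Xu = X * u[:, None]
        # Solve the weighted ridge problem with outliers zeroed out.
        w = np.linalg.solve(Xu.T @ X + reg, Xu.T @ y)
    return w, u
```

In this sketch the returned weights `u` play the role that the abstract assigns to the critic's weights: a sample with `u[i] == 0` would be dropped from a weighted actor objective.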