On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Tianhao Wu, Yunchang Yang, Simon Du, Liwei Wang
2021-01-01
Abstract: We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruptions, where some episodes can be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (CR-MVP), which achieves a regret bound of $\tilde{O}\big((\sqrt{SAK} + S^2 A + CSA)\,\mathrm{polylog}(H)\big)$, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $K$ is the number of episodes, and $C$ is the known corruption level. We also provide a novel lower bound, which indicates that our upper bound is nearly tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a $\sqrt{K}$-type regret in this setting and is oracle efficient.
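To get a feel for how the three terms in the regret bound trade off, the following minimal sketch evaluates them for given values of S, A, K, and C. It is only an illustration, not part of the paper: the function names `regret_terms` and `dominant_term` are hypothetical, and the constants and polylog(H) factors hidden by the tilde-O notation are ignored, so the numbers indicate relative scale only.

```python
import math


def regret_terms(S, A, K, C):
    """Return the three terms of the bound, ignoring constants and polylog(H).

    S: number of states, A: number of actions,
    K: number of episodes, C: known corruption level.
    """
    return {
        "sqrt(SAK)": math.sqrt(S * A * K),
        "S^2 A": S**2 * A,
        "CSA": C * S * A,
    }


def dominant_term(S, A, K, C):
    """Return the name of the largest term for the given problem sizes."""
    terms = regret_terms(S, A, K, C)
    return max(terms, key=terms.get)


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the paper).
    for C in (10, 1000):
        print(C, dominant_term(S=10, A=5, K=10**6, C=C))
```

Comparing the first and third terms directly, the corruption term CSA exceeds the sqrt(SAK) term exactly when C > sqrt(K / (SA)), so in this simplified view the corruption level starts to dominate once it grows faster than the square root of the number of episodes.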