Abstract: In linear bandits, how can a learner effectively learn when facing corrupted rewards? While significant work has explored this question, a holistic understanding across different adversarial models and corruption measures is lacking, as is a full characterization of the minimax regret bounds. In this work, we compare two types of corruptions commonly considered: strong corruption, where the corruption level depends on the action chosen by the learner, and weak corruption, where the corruption level does not depend on the action chosen by the learner. We provide a unified framework to analyze these corruptions. For stochastic linear bandits, we fully characterize the gap between the minimax regret under strong and weak corruptions. We also initiate the study of corrupted adversarial linear bandits, obtaining upper and lower bounds with matching dependencies on the corruption level. Next, we reveal a connection between corruption-robust learning and learning with gap-dependent misspecification, a setting first studied by Liu et al. (2023a), where the misspecification level of an action or policy is proportional to its suboptimality. We present a general reduction that enables any corruption-robust algorithm to handle gap-dependent misspecification. This allows us to recover the results of Liu et al. (2023a) in a black-box manner and significantly generalize them to settings like linear MDPs, yielding the first results for gap-dependent misspecification in reinforcement learning. However, this general reduction does not attain the optimal rate for gap-dependent misspecification. Motivated by this, we develop a specialized algorithm that achieves optimal bounds for gap-dependent misspecification in linear bandits, thus answering an open question posed by Liu et al. (2023a).

On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

Corruption-Robust Offline Reinforcement Learning with General Function Approximation

Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions

Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

Corruption Robust Offline Reinforcement Learning with Human Feedback

Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback

Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions

Robust Q-Learning under Corrupted Rewards

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Cooperative Online Learning in Stochastic and Adversarial MDPs

Scale-free Adversarial Reinforcement Learning

Stochastic Graphical Bandits with Adversarial Corruptions

Dynamic Regret of Adversarial MDPs with Unknown Transition and Linear Function Approximation

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Time-Constrained Robust MDPs

Robustifying Reinforcement Learning Agents via Action Space Adversarial Training