Abstract: In linear bandits, how can a learner effectively learn when facing corrupted rewards? While significant work has explored this question, a holistic understanding across different adversarial models and corruption measures is lacking, as is a full characterization of the minimax regret bounds. In this work, we compare two types of corruptions commonly considered: strong corruption, where the corruption level depends on the action chosen by the learner, and weak corruption, where the corruption level does not depend on the action chosen by the learner. We provide a unified framework to analyze these corruptions. For stochastic linear bandits, we fully characterize the gap between the minimax regret under strong and weak corruptions. We also initiate the study of corrupted adversarial linear bandits, obtaining upper and lower bounds with matching dependencies on the corruption level. Next, we reveal a connection between corruption-robust learning and learning with gap-dependent misspecification, a setting first studied by Liu et al. (2023a), where the misspecification level of an action or policy is proportional to its suboptimality. We present a general reduction that enables any corruption-robust algorithm to handle gap-dependent misspecification. This allows us to recover the results of Liu et al. (2023a) in a black-box manner and significantly generalize them to settings like linear MDPs, yielding the first results for gap-dependent misspecification in reinforcement learning. However, this general reduction does not attain the optimal rate for gap-dependent misspecification. Motivated by this, we develop a specialized algorithm that achieves optimal bounds for gap-dependent misspecification in linear bandits, thus answering an open question posed by Liu et al. (2023a).
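The two corruption measures and the gap-dependent misspecification condition can be made concrete with a small sketch. It assumes one common formalization that the abstract does not spell out: under strong corruption the budget is C = Σ_t |c_t(a_t)| (only the corruption on the chosen action counts), under weak corruption it is C_∞ = Σ_t max_a |c_t(a)|, and gap-dependent misspecification means |μ(a) − ⟨θ, x_a⟩| ≤ ρ·Δ_a for every action a, where Δ_a is the suboptimality gap. All function names and the toy rewards, features, and corruption values are hypothetical and only illustrate the definitions.

```python
from typing import List, Sequence

# Hypothetical helpers illustrating one common formalization of the
# corruption measures and of gap-dependent misspecification; not the paper's code.


def strong_corruption(corruptions: Sequence[Sequence[float]],
                      actions: Sequence[int]) -> float:
    """C = sum_t |c_t(a_t)|: only the corruption on the chosen action counts."""
    return sum(abs(c_t[a_t]) for c_t, a_t in zip(corruptions, actions))


def weak_corruption(corruptions: Sequence[Sequence[float]]) -> float:
    """C_inf = sum_t max_a |c_t(a)|: worst case over actions, independent of the learner."""
    return sum(max(abs(x) for x in c_t) for c_t in corruptions)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def misspecification(mu: Sequence[float], features: Sequence[Sequence[float]],
                     theta: Sequence[float]) -> List[float]:
    """Signed error mu(a) - <theta, x_a> of the linear model on each action."""
    return [m - dot(x, theta) for m, x in zip(mu, features)]


def gaps(mu: Sequence[float]) -> List[float]:
    """Suboptimality gap Delta_a = max_b mu(b) - mu(a)."""
    best = max(mu)
    return [best - m for m in mu]


def satisfies_gap_dependent(mu: Sequence[float], features: Sequence[Sequence[float]],
                            theta: Sequence[float], rho: float,
                            tol: float = 1e-12) -> bool:
    """Check |mu(a) - <theta, x_a>| <= rho * Delta_a for every action a."""
    eps = misspecification(mu, features, theta)
    return all(abs(e) <= rho * d + tol for e, d in zip(eps, gaps(mu)))


def regret(mu: Sequence[float], actions: Sequence[int]) -> float:
    """Pseudo-regret sum_t Delta_{a_t} of an action sequence."""
    delta = gaps(mu)
    return sum(delta[a] for a in actions)


if __name__ == "__main__":
    # Toy run: 3 actions, T = 4 rounds (all numbers are made up).
    corruptions = [[0.5, -1.0, 0.0], [0.0, 0.25, -0.75],
                   [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    actions = [0, 1, 1, 2]
    print(strong_corruption(corruptions, actions))  # 0.75
    print(weak_corruption(corruptions))             # 2.75

    mu = [1.0, 0.25, 0.75]
    features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    theta = [1.0, 0.5]
    print(satisfies_gap_dependent(mu, features, theta, rho=0.5))  # True

    # Misspecification acts like an action-dependent (strong) corruption.
    played = [1, 2, 0, 0]
    eps = misspecification(mu, features, theta)
    induced = strong_corruption([eps] * len(played), played)
    print(induced, regret(mu, played))  # 0.25 1.0
```

Because |c_t(a_t)| ≤ max_a |c_t(a)| in every round, C ≤ C_∞ always holds. Treating μ(a) − ⟨θ, x_a⟩ as an action-dependent corruption also gives an informal intuition for why a reduction is plausible: the induced corruption is at most ρ times the regret, so a guarantee of the form Reg ≤ R(T) + κC would give Reg ≤ R(T)/(1 − κρ) whenever κρ < 1. This is only a heuristic under the stated assumptions, not necessarily the reduction used in the paper.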

Corruption Robust Active Learning

Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification

Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions

Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions

Robust Q-Learning under Corrupted Rewards

Effective and Robust Adversarial Training against Data and Label Corruptions

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

Adaptive Robust Learning using Latent Bernoulli Variables

Corruption-Robust Offline Reinforcement Learning with General Function Approximation

Corruption-Robust Lipschitz Contextual Search

On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Optimal Robust Estimation under Local and Global Corruptions: Stronger Adversary and Smaller Error

Investigating the Corruption Robustness of Image Classifiers with Random Lp-norm Corruptions

Manipulating hidden-Markov-model inferences by corrupting batch data

Stochastic Graphical Bandits with Adversarial Corruptions

Robust Distribution Learning with Local and Global Adversarial Corruptions

Enhancing Adversarial Robustness in Low-Label Regime via Adaptively Weighted Regularization and Knowledge Distillation

Multiclass Learning with Partially Corrupted Labels

Online Corrupted User Detection and Regret Minimization