Abstract: We investigate the concept of algorithmic replicability introduced by Impagliazzo et al. 2022, Ghazi et al. 2021, and Ahn et al. 2024 in an online setting. In our model, the input sequence received by the online learner is generated from time-varying distributions chosen obliviously by an adversary. Our objective is to design low-regret online algorithms that, with high probability, produce the exact same sequence of actions when run on two independently sampled input sequences generated as described above. We refer to such algorithms as adversarially replicable. Previous works (such as Esfandiari et al. 2022) explored replicability in the online setting under inputs generated independently from a fixed distribution; we term this notion iid-replicability. Our model generalizes to capture both adversarial and iid input sequences, as well as their mixtures, which can be modeled by setting certain distributions as point-masses. We demonstrate adversarially replicable online learning algorithms for online linear optimization and the experts problem that achieve sub-linear regret. Additionally, we propose a general framework for converting an online learner into an adversarially replicable one within our setting, bounding the new regret in terms of the original algorithm's regret. We also present a nearly optimal (in terms of regret) iid-replicable online algorithm for the experts problem, highlighting the distinction between the iid and adversarial notions of replicability. Finally, we establish lower bounds on the regret (in terms of the replicability parameter and time) that any replicable online algorithm must incur.

What problem does this paper attempt to address?

The core problem this paper addresses is the replicability of online learning algorithms, particularly in the adversarial setting. Specifically, the goal is to design low-regret online learning algorithms that, with high probability, produce the same sequence of actions on two independently sampled input sequences generated by time-varying distributions chosen by an adversary. Such algorithms are called adversarially replicable.

### Detailed Interpretation

1. **Background and Motivation**:
   - **Replicability Crisis**: There is a widespread replicability crisis in science, and in AI in particular, that affects the reliability and integrity of multiple disciplines.
   - **Existing Research**: Previous work has mainly focused on replicability under independent and identically distributed (i.i.d.) inputs, while this paper extends replicability to the adversarial setting.

2. **Problem Definition**:
   - **Adversarial Replicability**: In the adversarial setting, the input sequence is generated by time-varying distributions chosen by the adversary. The goal is to design an algorithm that has low regret and that, when run on two independently sampled input sequences, produces exactly the same sequence of actions with high probability.
   - **Mathematical Representation**: An online learning algorithm \( \text{ALG} \) is called \( \rho \)-replicable if, when run on two independently sampled input sequences \( S \) and \( S' \) using the same internal randomness \( R \), it produces the same sequence of actions with probability at least \( 1 - \rho \). Formally, the following condition is satisfied:
     \[
     \Pr_{S \leftarrow D^{\otimes T}, S' \leftarrow D^{\otimes T}, R} \left[ \forall t: a_t(S, R) = a_t(S', R) \right] \geq 1 - \rho
     \]

3. **Main Contributions**:
   - **Online Linear Optimization**: Proposes an adversarially \( \rho \)-replicable algorithm for online linear optimization with sub-linear regret.
   - **Experts Problem**: Proposes an adversarially \( \rho \)-replicable algorithm for the experts problem, also with sub-linear regret.
   - **General Framework**: Provides a general framework for converting an ordinary online learning algorithm into an adversarially \( \rho \)-replicable one, with an upper bound on the new regret in terms of the original algorithm's regret.
   - **i.i.d. Replicability**: Designs an i.i.d. \( \rho \)-replicable algorithm for the experts problem with nearly optimal regret.
   - **Lower Bound Analysis**: Proves lower bounds on the regret, in terms of the replicability parameter and time, that any adversarially or i.i.d. \( \rho \)-replicable algorithm must incur.

4. **Key Techniques**:
   - **Time Blocking**: Divide the time horizon into fixed-size blocks and update actions only at the end of each block.
   - **Rounding Cumulative Cost Vectors to a Grid**: At the end of each block, round the cumulative cost vector to the nearest point in a random grid.
   - **Geometric Noise**: In the experts problem, add geometric noise to keep regret low while achieving replicability.

5. **Conclusions and Future Work**:
   - The paper shows how to achieve replicability in the adversarial setting and provides theoretical guarantees and algorithm designs.
   - Future research directions include extending these techniques to partial-information settings (such as the multi-armed bandit problem) and to other areas (such as clustering and reinforcement learning).

Through these contributions, the paper provides new perspectives and methods for studying the replicability of online learning algorithms, especially in the adversarial setting.
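The time-blocking and random-grid rounding ideas described above can be illustrated with a small Python sketch. This is not the paper's algorithm: it is a simplified follow-the-leader learner for the experts setting, written under the assumption that the random grid is an axis-aligned grid with a uniformly random offset drawn from the shared randomness \( R \), and it omits the geometric-noise step. All function names (`round_to_grid`, `run_learner`, `sample_sequence`, `agreement_rate`) and the uniform noise model are hypothetical choices made for illustration.

```python
import random

def round_to_grid(vector, width, offset):
    """Round each coordinate to the nearest point of the shifted grid offset + width * Z."""
    return [o + width * round((v - o) / width) for v, o in zip(vector, offset)]

def run_learner(costs, block_size, width, seed):
    """Play one expert index per round; the action changes only at block boundaries.

    costs: list of per-round cost vectors (one entry per expert).
    seed:  stands in for the shared internal randomness R (fixes the grid offset).
    """
    rng = random.Random(seed)
    n = len(costs[0])
    offset = [rng.uniform(0.0, width) for _ in range(n)]  # assumed form of the random grid
    cumulative = [0.0] * n
    action = 0                       # arbitrary fixed action during the first block
    actions = []
    for t, cost in enumerate(costs, start=1):
        actions.append(action)
        cumulative = [c + x for c, x in zip(cumulative, cost)]
        if t % block_size == 0:      # end of block: round, then pick the rounded leader
            rounded = round_to_grid(cumulative, width, offset)
            action = min(range(n), key=lambda i: (rounded[i], i))
    return actions

def sample_sequence(means, noise, rng):
    """Draw one input sequence: round t has cost vector means[t] plus uniform noise."""
    return [[m + rng.uniform(-noise, noise) for m in row] for row in means]

def agreement_rate(means, noise, block_size, width, trials=200, seed=0):
    """Estimate how often two independent samples give identical action sequences."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(trials):
        shared = rng.getrandbits(32)  # the same R is used for both runs
        s1 = sample_sequence(means, noise, rng)
        s2 = sample_sequence(means, noise, rng)
        same = run_learner(s1, block_size, width, shared) == run_learner(s2, block_size, width, shared)
        agree += int(same)
    return agree / trials
```

Calling `agreement_rate(means, noise, block_size, width)` gives an empirical estimate of the probability in the replicability condition for this toy learner. Setting `noise=0.0` makes every per-round distribution a point mass, so both sampled sequences coincide and the two runs always agree. With positive noise, a coarser grid (larger `width`) makes agreement more likely while making the rounded leader a cruder proxy for the true leader, which is the replicability-versus-regret tension the summary describes.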

Replicable Online Learning

On the Computational Landscape of Replicable Learning

Replicability and stability in learning

Replicability in Reinforcement Learning

Replicability is Asymptotically Free in Multi-armed Bandits

Reproducibility in Learning

Replicability in High Dimensional Statistics

The Interplay Between Stability and Regret in Online Learning

Replicable Learning of Large-Margin Halfspaces

List and Certificate Complexities in Replicable Learning

Selective Sampling and Imitation Learning via Online Regression

Online Learning: Sufficient Statistics and the Burkholder Method

Online Learning: Stochastic and Constrained Adversaries

Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds

Online Learning with Unknown Constraints

Adversaries in Online Learning Revisited: with applications in Robust Optimization and Adversarial training

Byzantine-Robust Distributed Online Learning: Taming Adversarial Participants in An Adversarial Environment

Online Learning with Bounded Recall

Nonstationary Nonparametric Online Learning: Balancing Dynamic Regret and Model Parsimony

LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization