Leading strategies in competitive on-line prediction

Vladimir Vovk
DOI: https://doi.org/10.48550/arXiv.cs/0607134
2006-07-28
Abstract: We start from a simple asymptotic result for the problem of on-line regression with the quadratic loss function: the class of continuous limited-memory prediction strategies admits a "leading prediction strategy", which not only asymptotically performs at least as well as any continuous limited-memory strategy but also satisfies the property that the excess loss of any continuous limited-memory strategy is determined by how closely it imitates the leading strategy. More specifically, for any class of prediction strategies constituting a reproducing kernel Hilbert space we construct a leading strategy, in the sense that the loss of any prediction strategy whose norm is not too large is determined by how closely it imitates the leading strategy. This result is extended to the loss functions given by Bregman divergences and by strictly proper scoring rules.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of competitive strategies in online prediction. Specifically:

1. **Asymptotic Results in Online Regression**: The paper starts from a simple asymptotic result for online regression with the squared-loss function. The class of continuous limited-memory prediction strategies admits a "leading prediction strategy", which asymptotically performs at least as well as any continuous limited-memory strategy. In addition, the excess loss of any other strategy in the class is determined by how closely it imitates the leading strategy.
2. **Leading Strategy in Reproducing Kernel Hilbert Space (RKHS)**: For any class of prediction strategies that forms a reproducing kernel Hilbert space, the paper constructs a leading strategy such that the loss of any prediction strategy whose norm is not too large is determined by how closely it imitates the leading strategy.
3. **Extension to a Wider Range of Loss Functions**: The result is generalized to loss functions given by Bregman divergences and by strictly proper scoring rules.
4. **Competitiveness in Online Prediction**: The paper emphasizes that online prediction usually avoids stochastic assumptions about how the observations are generated, although in some cases it also considers randomly generated observations.
5. **Application of Defensive Prediction**: The paper uses the defensive prediction method to construct master strategies, which automatically satisfy the stronger properties required of a leading strategy.
6. **Relationships among the Successes of Different Prediction Strategies**: The paper also explores how the successes of different prediction strategies are related, in the form of Jeffreys's law, which indicates that successful prediction strategies tend to make similar predictions.
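
The first point can be made concrete for the squared loss. As a hedged sketch, with notation assumed here rather than quoted from the paper (\(y_n\) are the observations, \(\mu_n\) the leading strategy's predictions, and \(\phi_n\) the predictions of any competing continuous limited-memory strategy), the statement that the excess loss is determined by how closely the competitor imitates the leading strategy can be read as:

\[ \frac{1}{N}\sum_{n = 1}^N (y_n - \phi_n)^2 - \frac{1}{N}\sum_{n = 1}^N (y_n - \mu_n)^2 - \frac{1}{N}\sum_{n = 1}^N (\phi_n - \mu_n)^2 \to 0 \quad (N \to \infty) \]

Because the last sum is non-negative, the leading strategy's average loss is asymptotically no larger than the competitor's, and the gap between them is the mean squared distance between the two prediction sequences.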

In summary, the core problem of this paper is to construct and analyze leading strategies, which asymptotically perform at least as well as every strategy in a given class, and to extend them to a wider range of prediction scenarios and loss functions. This supports accurate and robust online prediction, particularly when few assumptions are made about the data-generating mechanism.

### Key Formulas

- Squared-loss function: \[ \lambda(y, \mu)=(y - \mu)^2 \]
- Bregman divergence: \[ d_{\Psi, \Psi'}(y, z):=\Psi(y)-\Psi(z)-\Psi'(z)(y - z) \]
- Relative entropy (Kullback-Leibler divergence): \[ D(y \| z):=y \ln\frac{y}{z}+(1 - y)\ln\frac{1 - y}{1 - z} \]
- Key inequality in Jeffreys's law: \[ \left|\sum_{n = 1}^N\lambda(y_n, \mu_n)+\sum_{n = 1}^N d_\lambda(\mu_n, \phi_n)-\sum_{n = 1}^N\lambda(y_n, \phi_n)\right| \leq\sqrt{c_F^2 + 1}\left(\left\|\text{Exp}_\lambda(F)\right\|_F+\left\|\text{Exp}_\lambda\right\|_{C(P)}\right)\sqrt{N} \]

The paper uses these formulas to define the losses of prediction strategies and to compare their performance.
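
To see how the first three formulas relate, the following minimal Python sketch evaluates them numerically. The function names (`squared_loss`, `bregman`, `relative_entropy`, `neg_entropy`, `neg_entropy_grad`) are hypothetical and chosen only for this illustration; they do not come from the paper. The sketch checks two standard facts: taking Ψ(y) = y² in the Bregman divergence recovers the squared loss, and taking Ψ to be the negative binary entropy recovers the relative entropy.

```python
import math


def squared_loss(y, mu):
    """Squared loss lambda(y, mu) = (y - mu)^2."""
    return (y - mu) ** 2


def bregman(psi, dpsi, y, z):
    """Bregman divergence Psi(y) - Psi(z) - Psi'(z) * (y - z).

    psi is a convex function and dpsi its derivative (both supplied by the caller).
    """
    return psi(y) - psi(z) - dpsi(z) * (y - z)


def relative_entropy(y, z):
    """Binary Kullback-Leibler divergence D(y || z), assuming 0 < y < 1 and 0 < z < 1."""
    return y * math.log(y / z) + (1 - y) * math.log((1 - y) / (1 - z))


def neg_entropy(p):
    """Negative binary entropy, convex on (0, 1)."""
    return p * math.log(p) + (1 - p) * math.log(1 - p)


def neg_entropy_grad(p):
    """Derivative of the negative binary entropy: the log-odds of p."""
    return math.log(p / (1 - p))


y, z = 0.3, 0.8

# Psi(t) = t^2 turns the Bregman divergence into the squared loss.
d_quadratic = bregman(lambda t: t * t, lambda t: 2 * t, y, z)

# Psi = negative binary entropy turns it into the relative entropy.
d_entropy = bregman(neg_entropy, neg_entropy_grad, y, z)

print(d_quadratic, squared_loss(y, z))
print(d_entropy, relative_entropy(y, z))
```

In each printed pair the two values should agree up to floating-point rounding. The entropy-based functions are only defined for arguments strictly between 0 and 1, so boundary values would need separate handling.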