Online-to-PAC generalization bounds under graph-mixing dependencies

Baptiste Abélès, Eugenio Clerico, Gergely Neu
2024-10-12
Abstract: Traditional generalization results in statistical learning require a training data set made of independently drawn examples. Most of the recent efforts to relax this independence assumption have considered either purely temporal (mixing) dependencies, or graph-dependencies, where non-adjacent vertices correspond to independent random variables. Both approaches have their own limitations, the former requiring a temporal ordered structure, and the latter lacking a way to quantify the strength of inter-dependencies. In this work, we bridge these two lines of work by proposing a framework where dependencies decay with graph distance. We derive generalization bounds leveraging the online-to-PAC framework, by deriving a concentration result and introducing an online learning framework incorporating the graph structure. The resulting high-probability generalization guarantees depend on both the mixing rate and the graph's chromatic number.
Machine Learning
What problem does this paper attempt to address?
The paper asks how to derive generalization bounds in statistical learning when the training samples are not independent and identically distributed (i.i.d.) but depend on one another. Specifically, the authors focus on graph-mixing dependencies, in which the dependence between two samples weakens as the distance between their nodes in a graph increases.

### Problem Background

Traditional generalization results usually assume that the training samples are i.i.d. In many practical applications this assumption does not hold. For example:

- In housing price prediction, the prices of neighboring houses influence each other.
- In social networks, connected users are more likely to hold similar views.

In these settings the data points are dependent, and the strength of the dependence weakens as the graph distance between them grows. Existing research mainly covers two situations:

1. **Temporal mixing dependency**: The dependence weakens as the time interval increases, but the data must have a clear temporal ordering.
2. **Graph dependency**: The dependence is described by a graph in which non-adjacent vertices correspond to independent random variables, but there is no way to quantify the strength of the dependence.

### Core Contributions of the Paper

The paper proposes a framework that bridges these two lines of work: dependencies decay with graph distance, so their strength is quantified by the mixing rate. The authors introduce a new online learning framework that incorporates the graph structure and use it, together with a concentration result, to derive generalization bounds. The resulting high-probability guarantees depend on both the mixing rate and the chromatic number of the graph.

### Main Technical Means

1. **(G, φ)-mixing process**: Defines a dependency structure in which the strength of the dependence weakens as the distance between nodes in the graph increases.
2. **Online-to-PAC conversion**: Uses regret analysis from online learning to reduce the generalization problem to an online learning problem, from which the generalization bound is derived.
3. **Sequential learning on graphs**: Defines a new class of online learning games in which the player may only use information from "sufficiently far" nodes when choosing actions, thereby reflecting the graph dependencies.

### Specific Problem Description

Suppose we have a training data set \( S_n=(Z_1,\dots,Z_n) \) drawn from a joint distribution \( \mu_n \), where each \( Z_i \) has the same marginal distribution \( \mu \). We assume that there exist a graph \( G \), a bijection \( \iota: V(G)\rightarrow [n] \), and a non-negative decreasing sequence \( \phi = (\phi_d)_{d > 0} \) such that, for any hypothesis \( w\in W \), the graph-labeled process \( X_G(w)=(X_v(w))_{v\in V(G)} \) is a (G, φ)-mixing process, where:

\[ X_v(w)=L(w)-\ell(w,Z_{\iota(v)}) \]

Here, \( L(w) \) is the population (expected) loss of hypothesis \( w \), and \( \ell(w,Z_{\iota(v)}) \) is the loss on the instance \( Z_{\iota(v)} \).

### Generalization Bound

Under the above assumptions, the authors derive the following generalization bound, which holds with probability at least \( 1-\delta \):

\[ L(\hat{P}_n)\leq\hat{L}_n(\hat{P}_n)+\min_{d = 1,\dots,n}\left(\phi_d+\sqrt{\frac{\Delta^2\chi_f^{(d)}\log\frac{1}{\delta}}{2n}}\right) \]

where:

- \( L(\hat{P}_n) \) is the expected population loss,
- \( \hat{L}_n(\hat{P}_n) \) is the empirical loss,
- \( \phi_d \) is the mixing coefficient at graph distance \( d \),
- \( \Delta \) is the range of loss values,
- \( \chi_f^{(d)} \) is the fractional d-chromatic number of the graph,
- \( \delta \) is the confidence parameter (the allowed failure probability).
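
The minimum over \( d \) in the bound is a trade-off: a larger distance \( d \) lowers the mixing term \( \phi_d \) but typically raises the chromatic term. The following is a minimal sketch of that trade-off, assuming the deviation term scales as \( \sqrt{\Delta^2\chi_f^{(d)}\log(1/\delta)/(2n)} \). The function name `graph_mixing_penalty` and the example sequences for `phi` and `chi_f` are hypothetical illustrations, not values or code from the paper.

```python
import math


def graph_mixing_penalty(phi, chi_f, n, delta, loss_range):
    """Minimize phi[d] + sqrt(loss_range^2 * chi_f[d] * log(1/delta) / (2n)) over d.

    phi and chi_f are dicts mapping a graph distance d to the mixing
    coefficient and to the fractional d-chromatic number, respectively.
    Returns the minimizing d and the value of the penalty at that d.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")

    best_d, best_value = None, math.inf
    # Only distances for which both quantities are known can be compared.
    for d in sorted(set(phi) & set(chi_f)):
        deviation = math.sqrt(
            loss_range ** 2 * chi_f[d] * math.log(1.0 / delta) / (2.0 * n)
        )
        value = phi[d] + deviation
        if value < best_value:
            best_d, best_value = d, value
    return best_d, best_value


# Hypothetical inputs, chosen only to illustrate the trade-off:
# geometric mixing and a chromatic term that grows linearly in d.
n = 10_000
phi = {d: 0.5 ** d for d in range(1, 11)}
chi_f = {d: float(d + 1) for d in range(1, 11)}

best_d, penalty = graph_mixing_penalty(phi, chi_f, n, delta=0.05, loss_range=1.0)
print(f"best d = {best_d}, penalty = {penalty:.4f}")
```

With faster mixing the minimizer moves toward small \( d \), while with a larger sample size the deviation term shrinks and a larger \( d \) becomes affordable.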