Abstract: Traditional generalization results in statistical learning require a training data set made of independently drawn examples. Most of the recent efforts to relax this independence assumption have considered either purely temporal (mixing) dependencies, or graph-dependencies, where non-adjacent vertices correspond to independent random variables. Both approaches have their own limitations, the former requiring a temporal ordered structure, and the latter lacking a way to quantify the strength of inter-dependencies. In this work, we bridge these two lines of work by proposing a framework where dependencies decay with graph distance. We derive generalization bounds leveraging the online-to-PAC framework, by deriving a concentration result and introducing an online learning framework incorporating the graph structure. The resulting high-probability generalization guarantees depend on both the mixing rate and the graph's chromatic number.

What problem does this paper attempt to address?

The paper addresses the following problem: in statistical learning, how can generalization bounds be derived when the samples in the training data set are not independent and identically distributed (i.i.d.) but depend on one another? Specifically, the authors focus on graph-mixing dependencies, where the dependence between two samples weakens as the distance between their nodes in a graph increases.

### Problem Background

Traditional generalization results usually assume that the samples in the training data set are i.i.d. In many practical applications this assumption does not hold. For example:

- In housing price prediction, the prices of neighboring houses influence each other.
- In social networks, connected users are more likely to hold similar views.

In these scenarios the data points are dependent, and the strength of the dependence weakens as the distance between nodes in the graph increases. Existing research mainly covers two settings:

1. **Temporal mixing dependency**: The dependence weakens as the time interval increases, but the data must have a clear temporal order.
2. **Graph dependency**: The dependence is described by a graph structure in which non-adjacent vertices correspond to independent random variables, but there is no way to quantify the strength of the dependence.

### Core Contributions of the Paper

The paper bridges these two lines of work with a framework in which the strength of the dependence is quantified by graph distance. Specifically, the authors introduce a new online learning framework that incorporates the graph structure and use it, together with a concentration result, to derive generalization bounds. The resulting high-probability guarantees depend on both the mixing rate and the chromatic number of the graph.

### Main Technical Means

1. **(G, φ)-mixing process**: Defines a dependency structure in which the strength of the dependence weakens as the distance between nodes in the graph increases.
2. **Online-to-PAC conversion**: Using regret analysis from online learning, the generalization problem is recast as an online learning problem, from which the generalization bound is derived.
3. **Sequential learning on graphs**: Defines a new class of online learning games in which the player may only use information from "sufficiently far" nodes when choosing actions, thereby reflecting the graph dependencies.

### Specific Problem Description

Suppose we have a training data set \( S_n=(Z_1,\dots,Z_n) \) drawn from a joint distribution \( \mu_n \), where each \( Z_i \) has the same marginal distribution \( \mu \). We assume that there exist a graph \( G \), a bijection \( \iota: V(G)\rightarrow [n] \), and a non-negative decreasing sequence \( \phi = (\phi_d)_{d > 0} \), such that for any hypothesis \( w\in W \), the graph-labeled process \( X_G(w)=(X_v(w))_{v\in V(G)} \) is a (G, φ)-mixing process, where:

\[ X_v(w)=L(w)-\ell(w,Z_{\iota(v)}) \]

Here, \( L(w) \) is the population (expected) loss of \( w \), and \( \ell(w,Z_{\iota(v)}) \) is the loss on the instance \( Z_{\iota(v)} \).

### Generalization Bound

Based on the above assumptions, the authors derive the following generalization bound:

\[ L(\hat{P}_n)\leq\hat{L}_n(\hat{P}_n)+\min_{d = 1,\dots,n}\left(\phi_d+\sqrt{\frac{\Delta^2\chi_f^{(d)}\log\frac{1}{\delta}}{2n}}\right) \]

where:

- \( L(\hat{P}_n) \) is the expected (population) loss,
- \( \hat{L}_n(\hat{P}_n) \) is the empirical loss,
- \( \phi_d \) is the mixing coefficient, which measures the remaining dependence at graph distance \( d \),
- \( \Delta \) is the range of the loss values,
- \( \chi_f^{(d)} \) is the fractional d-chromatic number of the graph,
- \( \delta \) is the confidence parameter: the bound holds with probability at least \( 1-\delta \).
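The minimum over \( d \) in the bound expresses a trade-off: a larger \( d \) shrinks the mixing term \( \phi_d \) but typically enlarges the chromatic-number term. The sketch below illustrates this numerically. It assumes the confidence term has the usual form \( \sqrt{\Delta^2\chi_f^{(d)}\log(1/\delta)/(2n)} \), with the logarithm in the numerator. The function name `best_tradeoff` and the toy sequences `phi` and `chi` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def best_tradeoff(phi, chi, n, delta, loss_range=1.0):
    """Minimize phi[d] + sqrt(loss_range^2 * chi[d] * log(1/delta) / (2n)) over d.

    phi, chi: dicts mapping a distance d to the mixing coefficient and to the
              (assumed known) fractional d-chromatic number, respectively.
    Returns the pair (best_d, best_value).
    """
    best_d, best_value = None, math.inf
    for d in sorted(phi):
        # Concentration term: grows with the chromatic number, shrinks with n.
        concentration = math.sqrt(
            loss_range ** 2 * chi[d] * math.log(1.0 / delta) / (2.0 * n)
        )
        value = phi[d] + concentration
        if value < best_value:
            best_d, best_value = d, value
    return best_d, best_value


# Toy inputs (assumptions, not from the paper): geometric mixing and a
# chromatic number that grows linearly with the distance d.
distances = range(1, 21)
phi = {d: 2.0 ** (-d) for d in distances}
chi = {d: float(d) for d in distances}

d_star, slack = best_tradeoff(phi, chi, n=10_000, delta=0.05)
print(f"best d = {d_star}, additive term = {slack:.4f}")
```

With faster-decaying mixing coefficients the minimizer moves toward small \( d \); with slowly decaying ones it moves toward larger \( d \), at the cost of a larger chromatic-number term.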

Online-to-PAC generalization bounds under graph-mixing dependencies
