Data Augmentation vs. Equivariant Networks: A Theory of Generalization on Dynamics Forecasting

Rui Wang, Robin Walters, Rose Yu
DOI: https://doi.org/10.48550/arXiv.2206.09450
2022-06-20
Abstract: Exploiting symmetry in dynamical systems is a powerful way to improve the generalization of deep learning. The model learns to be invariant to transformation and hence is more robust to distribution shift. Data augmentation and equivariant networks are two major approaches to injecting symmetry into learning. However, their exact role in improving generalization is not well understood. In this work, we derive the generalization bounds for data augmentation and equivariant networks, characterizing their effect on learning in a unified framework. Unlike most prior theories for the i.i.d. setting, we focus on non-stationary dynamics forecasting with complex temporal dependencies.
Machine Learning
What problem does this paper attempt to address?
This paper asks how data augmentation and equivariant networks, the two main ways of exploiting symmetry, improve the generalization of deep learning models that forecast non-stationary dynamical systems. Specifically, the authors study both methods on non-stationary time-series data and address the following key questions:

1. **The roles of data augmentation and equivariant networks**: How does each method improve generalization by injecting symmetry, and what are their relative advantages under different conditions?
2. **Lack of theoretical analysis**: Data augmentation and equivariant networks have been studied extensively in empirical work, but theoretical explanations and comparisons of their behavior are lacking.
3. **Generalization bounds**: What generalization bounds hold for data augmentation and equivariant networks on non-stationary, non-mixing time-series data?

### Main contributions of the paper

1. **Formal description of symmetry in dynamics forecasting**: The authors assume that the underlying dynamical system retains some symmetry and formalize the dynamics forecasting problem on this basis.
2. **Derivation of generalization bounds**: The authors derive generalization bounds for data augmentation and for equivariant networks (both strictly and approximately equivariant) that apply to non-stationary, non-mixing time-series data.
3. **Proof of the advantages of equivariant networks**: When the underlying dynamical system is symmetric, the generalization bound of a strictly equivariant network is tighter than that of data augmentation. When the data are only approximately symmetric, an approximately equivariant network obtains a further improved generalization bound.
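The two ways of injecting symmetry can be contrasted in a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the symmetry group is taken to be the four 90-degree rotations (C4) acting on square scalar fields, and `augment_c4`, `symmetrize_c4`, and `shift_model` are hypothetical names introduced here. Data augmentation enlarges the training set with rotated input/target pairs, while averaging a model over the group yields a predictor that is equivariant by construction.

```python
import numpy as np

def augment_c4(inputs, targets):
    """Extend a dataset with all four 90-degree rotations of each pair.

    inputs, targets: arrays of shape (n, H, H) holding square scalar fields.
    """
    aug_x = [np.rot90(inputs, k, axes=(1, 2)) for k in range(4)]
    aug_y = [np.rot90(targets, k, axes=(1, 2)) for k in range(4)]
    return np.concatenate(aug_x), np.concatenate(aug_y)

def symmetrize_c4(f):
    """Average f over C4; the returned function commutes with rotations."""
    def f_sym(x):
        # Rotate the input, apply f, rotate the output back, then average.
        outs = [np.rot90(f(np.rot90(x, k)), -k) for k in range(4)]
        return np.mean(outs, axis=0)
    return f_sym

def shift_model(x):
    # Hypothetical stand-in for a forecaster that is NOT equivariant:
    # it shifts the field by one row (periodic boundary).
    return np.roll(x, 1, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
f_sym = symmetrize_c4(shift_model)

# The symmetrized model commutes with a 90-degree rotation; the raw one does not.
print(np.allclose(f_sym(np.rot90(x)), np.rot90(f_sym(x))))
print(np.allclose(shift_model(np.rot90(x)), np.rot90(shift_model(x))))
```

Group averaging is used here only because it is the shortest way to obtain an exactly equivariant function; equivariant network architectures usually build the constraint into each layer instead.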
### Key formulas

- **Sequential Rademacher complexity**:
\[ R_{\text{sq}}^T(G) = \mathbb{E}_z \mathbb{E}_\sigma \left[ \sup_{g \in G} \sum_{t=1}^T \sigma_t q_t \, g(z_t(\sigma)) \right] \]
- **Equivariant error**:
\[ \|f\|_{\text{EE}} = \sup_{x, g} \| f(\rho_{\text{in}}(g)(x)) - \rho_{\text{out}}(g) f(x) \| \]
- **Generalization bound theorem**:
\[ \mathbb{E}[L(\hat{\theta}, Z_{T+1})] - \mathbb{E}[L(\theta^*, Z_{T+1})] \leq 2\,\text{disc}_T(q) + 6M \sqrt{\frac{4\pi \log T}{N}} R_{\text{sq}}^T(L \circ \Theta) + \sqrt{\frac{2 \log(2/\sigma)}{N}} + \|q\|_2 \left( M \sqrt{8 \log \frac{1}{\delta}} + 1 \right) \]

### Summary

Through rigorous theoretical analysis, this paper characterizes how data augmentation and equivariant networks differ in generalization when forecasting non-stationary dynamical systems. In particular, it proves that when the dynamics are symmetric, strictly equivariant networks have a tighter generalization bound than data augmentation, and that when the symmetry is only approximate, approximately equivariant networks improve the bound further. These results provide a theoretical basis for choosing between the two approaches.
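The equivariant error listed under Key formulas can be probed numerically. The sketch below is a rough illustration under assumptions that are not in the paper: the group is C4 (90-degree rotations) acting on square scalar fields, so both representations reduce to `np.rot90`, and `equivariance_error`, `diffusion_step`, and `advection_step` are hypothetical names. Because only finitely many inputs are sampled, the result is a lower estimate of the supremum rather than its exact value.

```python
import numpy as np

def equivariance_error(f, samples):
    """Largest observed ||f(g x) - g f(x)|| over the samples and g in C4."""
    worst = 0.0
    for x in samples:
        fx = f(x)
        for k in range(4):
            # Compare "rotate then predict" with "predict then rotate".
            gap = f(np.rot90(x, k)) - np.rot90(fx, k)
            worst = max(worst, float(np.linalg.norm(gap)))
    return worst

def diffusion_step(x):
    # Isotropic 5-point stencil with periodic boundaries;
    # it commutes with 90-degree rotations of the grid.
    neighbours = sum(np.roll(x, s, axis=a) for s in (1, -1) for a in (0, 1))
    return x + 0.1 * (neighbours - 4.0 * x)

def advection_step(x):
    # Transport along one fixed axis, which breaks rotational symmetry.
    return np.roll(x, 1, axis=0)

rng = np.random.default_rng(0)
samples = [rng.normal(size=(8, 8)) for _ in range(16)]

err_equivariant = equivariance_error(diffusion_step, samples)
err_broken = equivariance_error(advection_step, samples)
print(err_equivariant, err_broken)
```

The first value should be zero up to floating-point rounding, while the second is clearly positive, matching the distinction between strictly equivariant maps and maps whose symmetry is broken.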