Abstract: Symmetries (transformations by group actions) are present in many datasets, and leveraging them holds significant promise for improving predictions in machine learning. In this work, we aim to understand when and how deep networks can learn symmetries from data. We focus on a supervised classification paradigm where data symmetries are only partially observed during training: some classes include all transformations of a cyclic group, while others include only a subset. We ask: can deep networks generalize symmetry invariance to the partially sampled classes? In the infinite-width limit, where kernel analogies apply, we derive a neural kernel theory of symmetry learning to address this question. The group-cyclic nature of the dataset allows us to analyze the spectrum of neural kernels in the Fourier domain; here we find a simple characterization of the generalization error as a function of the interaction between class separation (signal) and class-orbit density (noise). We observe that generalization can only be successful when the local structure of the data prevails over its non-local, symmetric, structure, in the kernel space defined by the architecture. This occurs when (1) classes are sufficiently distinct and (2) class orbits are sufficiently dense. Our framework also applies to equivariant architectures (e.g., CNNs), and recovers their success in the special case where the architecture matches the inherent symmetry of the data. Empirically, our theory reproduces the generalization failure of finite-width networks (MLP, CNN, ViT) trained on partially observed versions of rotated-MNIST. We conclude that conventional networks trained with supervision lack a mechanism to learn symmetries that have not been explicitly embedded in their architecture a priori. Our framework could be extended to guide the design of architectures and training procedures able to learn symmetries from data.
What problem does this paper attempt to address?
The core question of this paper is: **Can deep neural networks learn symmetries from data during training and generalize them to partially observed classes?**
Specifically, the authors study supervised classification tasks in which the symmetries of the data are only partially observed: some classes contain all transformations of a cyclic group, while other classes contain only a subset. They ask whether deep networks can generalize symmetry invariance to these partially sampled classes. This setting mimics real-world scenarios, such as learning the pose invariance of objects from a limited number of examples.
### Main problem decomposition:
1. **Definition and importance of symmetry**:
- A symmetry is a transformation, given by a group action, that does not change the identity of an object. For example, a chair remains a chair whether it is placed upright or upside down (a transformation in SO(3)).
- In deep learning, understanding and exploiting the symmetries of the data is crucial for effective prediction. For example, can a deep network recognize an object independently of the viewing angle?
2. **Partially observed symmetries**:
- The paper studies a supervised classification paradigm in which the symmetries of the data are only partially observed during training. For some classes, all transformations of the cyclic group are observed, while for other classes only a subset is observed.
- This setting mimics real-world learning problems, such as a child seeing all possible 3D poses of some objects (such as manipulable toys) during development, while seeing only some poses of many other objects (such as heavy furniture).
3. **Core problem**:
- After training on such a partially observed dataset, can deep networks generalize to the partially sampled classes? That is, can the network correctly extrapolate symmetry invariance to the transformations of these classes that it never saw?
- Under which specific conditions can the network successfully generalize symmetry invariance?
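To make the partially observed setting concrete, here is a minimal toy sketch in Python. It is not the paper's dataset (which is built from rotated-MNIST): each class is assumed to be the orbit of one random vector under cyclic shifts, and the names `make_partial_orbit_dataset`, `n_partial`, and `n_seen` are hypothetical, introduced only for illustration.

```python
import numpy as np

def make_partial_orbit_dataset(n_classes=4, group_order=8, n_partial=1,
                               n_seen=3, seed=0):
    """Toy dataset: each class is the orbit of one random seed vector under
    cyclic shifts (an assumed stand-in for the cyclic group action).

    The first `n_partial` classes are partially observed: only their first
    `n_seen` shifts go into the training set, the rest are held out.
    """
    rng = np.random.default_rng(seed)
    seeds = rng.normal(size=(n_classes, group_order))
    train, test = [], []
    for label in range(n_classes):
        partially_observed = label < n_partial
        for shift in range(group_order):
            x = np.roll(seeds[label], shift)  # apply group element `shift`
            if partially_observed and shift >= n_seen:
                test.append((x, label, shift))   # unseen transformation
            else:
                train.append((x, label, shift))  # observed during training
    return train, test

train, test = make_partial_orbit_dataset()
print(len(train), len(test))
```

The question posed in the paper then amounts to asking whether a model fit on `train` classifies the held-out transformations in `test` correctly.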
### Research methods:
To answer these questions, the authors use the following methods:
- **Neural kernel theory in the infinite-width limit**: In the infinite-width limit, neural networks are equivalent to kernel machines. By applying Gaussian kernel regression to datasets generated by cyclic-group actions, the authors analyze the behavior of the kernel in the frequency (Fourier) domain.
- **Spectral analysis**: Through spectral analysis of the kernel matrix, the authors find that generalization behavior can be predicted by a simple ratio of kernel-frequency powers. In particular, generalization succeeds when the local structure of the data dominates its non-local, symmetric structure in the kernel space.
- **Experimental verification**: The authors train common network architectures (MLP, CNN, ViT) on partially observed versions of the rotated-MNIST dataset and evaluate their generalization. The theory reproduces the generalization failure observed in these finite-width networks.
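The kernel and spectral steps above can be illustrated with a small NumPy sketch. It rests on toy assumptions that are not taken from the paper: the data are orbits of random vectors under cyclic shifts, and `sigma`, `ridge`, and `n_seen` are arbitrary illustrative values. It shows two things. First, the Gaussian Gram matrix of a cyclic orbit is circulant, so its eigenvalues are the discrete Fourier transform of its first row. Second, kernel regression can be queried on held-out transformations of a partially observed class. It does not reproduce the paper's generalization-error formula.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma**2))

n = 8  # order of the cyclic group (assumed)
rng = np.random.default_rng(0)
seeds = rng.normal(size=(2, n))  # one seed vector per class (toy data)
# orbits[c, k] is the seed of class c shifted by k positions: shape (2, n, n)
orbits = np.array([[np.roll(s, k) for k in range(n)] for s in seeds])

# Spectral view: the Gram matrix of a single orbit is circulant,
# so its eigenvalues are the DFT of its first row.
K_orbit = gaussian_kernel(orbits[0], orbits[0])
fourier_spectrum = np.fft.fft(K_orbit[0]).real
eigenvalues = np.linalg.eigvalsh(K_orbit)

# Kernel regression with a partially observed class:
# class 0 shows only shifts 0..2, class 1 shows its full orbit.
n_seen = 3
X_train = np.concatenate([orbits[0, :n_seen], orbits[1]])
y_train = np.concatenate([np.ones(n_seen), -np.ones(n)])
X_test = orbits[0, n_seen:]  # held-out transformations of class 0

ridge = 1e-6  # small regularizer for numerical stability
K_train = gaussian_kernel(X_train, X_train)
alpha = np.linalg.solve(K_train + ridge * np.eye(len(X_train)), y_train)
predictions = gaussian_kernel(X_test, X_train) @ alpha
```

In this sketch a positive entry of `predictions` would count as a correct classification of a held-out transformation of class 0. Whether that happens depends on the seeds and on `sigma`, which is the kind of dependence the paper's theory characterizes.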
### Conclusions:
The paper's main conclusion is that conventional deep networks trained with supervision lack a mechanism to learn symmetries that are not explicitly embedded in their architecture. Successful generalization requires the local structure of the data to dominate its non-local, symmetric structure in the kernel space, which happens when classes are sufficiently distinct and class orbits are sufficiently dense. The authors suggest that the framework could be extended to guide the design of architectures and training procedures able to learn symmetries from data.