A Novel Convolutional Neural Network Architecture with a Continuous Symmetry

Yao Liu, Hang Shao, Bing Bai
2024-05-20
Abstract: This paper introduces a new Convolutional Neural Network (ConvNet) architecture inspired by a class of partial differential equations (PDEs) called quasi-linear hyperbolic systems. With comparable performance on the image classification task, it allows for the modification of the weights via a continuous group of symmetry. This is a significant shift from traditional models where the architecture and weights are essentially fixed. We wish to promote the (internal) symmetry as a new desirable property for a neural network, and to draw attention to the PDE perspective in analyzing and interpreting ConvNets in the broader Deep Learning community.
Computer Vision and Pattern Recognition, Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
The main problem this paper addresses is that existing Convolutional Neural Network (ConvNet) architectures offer almost no way to modify their weights meaningfully without changing the output. Once a traditional ConvNet's architecture and weights are determined, they are essentially fixed: the only output-preserving modification available is the minor one of swapping neurons or units within the same layer. This limits the flexibility and interpretability of the model.

To address this, the authors propose a new ConvNet architecture inspired by a class of partial differential equations (PDEs), specifically quasilinear hyperbolic systems. The architecture achieves comparable performance on image classification while allowing the weights to be modified through a continuous symmetry group. It removes most activation functions and introduces a new form of nonlinearity, so the model can still mix channels while maintaining performance. This offers a new perspective on model design and may help in understanding the internal mechanisms of neural networks.

The main contributions of the paper are:

1. **Introducing continuous symmetry**: Model weights can be transformed by a continuous symmetry group, which gives the model greater flexibility and interpretability.
2. **Removing activation functions**: Most activation functions are removed without significantly reducing performance, which simplifies the model structure.
3. **New form of nonlinearity**: A new form of nonlinearity based on quasilinear hyperbolic systems is introduced, providing new ideas for the design of ConvNets.

These contributions are meant to encourage the deep learning community to re-examine and design ConvNets from the perspective of partial differential equations, with the goal of developing more efficient and flexible models.
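The limitation described above, that a traditional network admits only neuron swaps as output-preserving weight changes, can be illustrated with a small sketch. The code below is not the paper's architecture. It is a toy two-layer perceptron with a ReLU, and all names (`mlp`, `transform_hidden`, the layer sizes, the angle `theta`) are assumptions made for illustration. It applies a change of basis to the hidden units, once with a permutation (a discrete symmetry) and once with a small rotation (an element of a continuous group), and checks whether the output survives.

```python
import numpy as np


def mlp(x, w1, b1, w2):
    """Toy two-layer perceptron with a ReLU; x holds one input per column."""
    hidden = np.maximum(w1 @ x + b1[:, None], 0.0)
    return w2 @ hidden


def transform_hidden(w1, b1, w2, m):
    """Re-express the hidden layer in the basis given by an invertible matrix m.

    The first layer is multiplied by m and the second layer by m^{-1}, so the
    two changes cancel exactly whenever the nonlinearity commutes with m.
    """
    return m @ w1, m @ b1, w2 @ np.linalg.inv(m)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 256))   # 256 random inputs of dimension 4
w1 = rng.normal(size=(3, 4))    # 3 hidden units (hypothetical sizes)
b1 = rng.normal(size=3)
w2 = rng.normal(size=(2, 3))

# Discrete symmetry: reorder the three hidden units.
perm = np.eye(3)[[2, 0, 1]]

# Continuous candidate: rotate hidden units 0 and 1 by a small angle.
theta = 0.3
rot = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])

y = mlp(x, w1, b1, w2)
perm_ok = np.allclose(y, mlp(x, *transform_hidden(w1, b1, w2, perm)))
rot_ok = np.allclose(y, mlp(x, *transform_hidden(w1, b1, w2, rot)))

print("permutation preserves output:", perm_ok)
print("rotation preserves output:   ", rot_ok)
```

The permutation commutes with the elementwise ReLU, so the output is unchanged, whereas a generic rotation does not commute with it and the output changes. In this toy setting the ReLU is what blocks a continuous family of weight transformations, which gives some intuition for why the summarized architecture pairs the removal of most activation functions with a different form of nonlinearity.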