Bart M. N. Smets
Abstract: These are the lecture notes that accompanied the course of the same name that
I taught at the Eindhoven University of Technology from 2021 to 2023. The
course is intended as an introduction to neural networks for mathematics
students at the graduate level and aims to make mathematics students interested
in further researching neural networks. It consists of two parts: first a
general introduction to deep learning that focuses on introducing the field in
a formal mathematical way. The second part provides an introduction to the
theory of Lie groups and homogeneous spaces and how it can be applied to design
neural networks with desirable geometric equivariances. The lecture notes were
made to be as self-contained as possible so as to be accessible to any student
with a moderate mathematics background. The course also included coding
tutorials and assignments in the form of a set of Jupyter notebooks that are
publicly available at
https://gitlab.com/bsmetsjr/mathematics_of_neural_networks.
What problem does this paper attempt to address?
This paper mainly discusses the mathematical principles of neural networks, with an emphasis on the problems that arise in deep learning and their solutions. It starts with the basic concepts of supervised learning and explains how models are trained on data to accomplish specific tasks. It then covers deep neural networks (DNNs), including feedforward networks, vanishing and exploding gradients, high-dimensional data processing, and initialization methods (such as random initialization and Xavier initialization), followed by the details of convolutional neural networks (CNNs), such as discrete convolution, padding, max pooling, and convolutional layers. The paper also introduces automatic differentiation and the backpropagation algorithm, as well as adaptive learning rate algorithms such as Adagrad, RMSProp, and Adam.
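As an illustration of the adaptive learning rate algorithms mentioned above, the following is a minimal sketch of the standard Adam update rule applied to a one-dimensional problem. It is not taken from the lecture notes or the accompanying notebooks: the function names `adam_minimize` and `quadratic_grad`, the test objective f(x) = (x - 3)^2, and the hyperparameter values are all assumptions chosen for illustration.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with Adam, given its gradient `grad`.

    Hypothetical helper for illustration; the defaults are common choices,
    not values prescribed by the lecture notes.
    """
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        # Exponential moving averages of the gradient and its square.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for initializing m and v at zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled per parameter by the root of the second moment.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def quadratic_grad(x):
    """Gradient of the assumed test objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


x_final = adam_minimize(quadratic_grad, x0=0.0)
print(x_final)  # should be close to the minimizer x = 3
```

Note that the very first step has magnitude close to `lr` regardless of the size of the gradient, which is one way to see how Adam normalizes the step size.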
In Chapter 3, the paper turns to group theory and homogeneous spaces, discussing how to use these geometric theories to construct neural networks that respect symmetries such as rotation and translation, namely group convolutional networks. The author introduces concepts such as "uplift layer," "group convolutional layer," and "projection," and briefly introduces tropical operators and semirings.
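To make the idea of equivariance concrete, here is a small sketch of what is arguably the simplest group convolution: convolution over the cyclic group Z_n, which coincides with ordinary circular convolution. It is an illustrative assumption rather than code from the lecture notes; the names `group_conv_cyclic` and `translate` and the sample data are hypothetical. The check at the end verifies that translating the input and then convolving gives the same result as convolving and then translating.

```python
def group_conv_cyclic(kernel, signal):
    """Group convolution on Z_n: out[g] = sum_h kernel[(g - h) % n] * signal[h].

    Both `kernel` and `signal` are functions on Z_n, stored as lists of length n.
    """
    n = len(signal)
    return [
        sum(kernel[(g - h) % n] * signal[h] for h in range(n))
        for g in range(n)
    ]


def translate(signal, a):
    """Action of the group element a on a signal: (L_a f)[g] = f[(g - a) % n]."""
    n = len(signal)
    return [signal[(g - a) % n] for g in range(n)]


# Assumed sample data on Z_6 (integers, so the comparison below is exact).
signal = [1, 4, 0, 2, 3, 5]
kernel = [2, -1, 0, 0, 0, 1]
a = 2

# Equivariance: convolving a translated signal equals translating the convolution.
lhs = group_conv_cyclic(kernel, translate(signal, a))
rhs = translate(group_conv_cyclic(kernel, signal), a)
assert lhs == rhs
```

Group convolutional networks for rotations and translations follow the same pattern with a larger, non-commutative group in place of Z_n.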
In summary, the problem this paper attempts to address is how to understand the workings of neural networks from a mathematical perspective, especially the challenges encountered in deep learning such as vanishing gradients, parameter initialization, model symmetries, and optimization strategies. By introducing concepts from geometry and group theory, the paper aims to provide a stronger theoretical foundation for neural networks.