Mathematics of Neural Networks (Lecture Notes Graduate Course)

Bart M. N. Smets
2024-03-06
Abstract: These are the lecture notes that accompanied the course of the same name that I taught at the Eindhoven University of Technology from 2021 to 2023. The course is intended as an introduction to neural networks for mathematics students at the graduate level and aims to interest them in further research on neural networks. It consists of two parts. The first is a general introduction to deep learning that focuses on presenting the field in a formal mathematical way. The second provides an introduction to the theory of Lie groups and homogeneous spaces and shows how it can be applied to design neural networks with desirable geometric equivariances. The lecture notes were made to be as self-contained as possible so as to be accessible to any student with a moderate mathematics background. The course also included coding tutorials and assignments in the form of a set of Jupyter notebooks that are publicly available at https://gitlab.com/bsmetsjr/mathematics_of_neural_networks.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper discusses the mathematical principles of neural networks, with an emphasis on the problems that arise in deep learning and how they are addressed. It starts with the basic concepts of supervised learning and explains how models are trained on data to accomplish specific tasks. It then covers deep neural networks (DNNs): feedforward networks, vanishing and exploding gradients, high-dimensional data processing, initialization methods (such as random initialization and Xavier initialization), and the components of convolutional neural networks (CNNs), such as discrete convolution, padding, max pooling, and convolutional layers. The paper also introduces automatic differentiation and the backpropagation algorithm, as well as adaptive learning rate algorithms such as Adagrad, RMSProp, and Adam.

In Chapter 3, the paper turns to group theory and homogeneous spaces, discussing how these geometric theories can be used to construct neural networks that respect structural symmetries such as rotation and translation, namely group convolutional networks. The author introduces concepts such as the "uplift layer," the "group convolutional layer," and "projection," and briefly covers tropical operators and semirings.

In summary, the paper addresses how to understand the workings of neural networks from a mathematical perspective, in particular the challenges encountered in deep learning: vanishing gradients, parameter initialization, model symmetries, and optimization strategies. By introducing concepts from geometry and group theory, it aims to provide a stronger theoretical foundation for neural networks.