Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger
2023-10-31
Abstract: This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization) and different optimization algorithms (such as the basic stochastic gradient descent (SGD) method, accelerated methods, and adaptive methods). We also cover several theoretical aspects of deep learning algorithms such as approximation capacities of ANNs (including a calculus for ANNs), optimization theory (including Kurdyka-Łojasiewicz inequalities), and generalization errors. In the last part of the book some deep learning approximation methods for PDEs are reviewed including physics-informed neural networks (PINNs) and deep Galerkin methods. We hope that this book will be useful for students and scientists who do not yet have any background in deep learning at all and would like to gain a solid foundation as well as for practitioners who would like to obtain a firmer mathematical understanding of the objects and methods considered in deep learning.
Machine Learning, Artificial Intelligence, Numerical Analysis, Probability
What problem does this paper attempt to address?
The book introduces the mathematical foundations, implementations, and theory of deep learning algorithms. It is aimed mainly at students and scientists without a background in deep learning, as well as at practitioners who wish to deepen their mathematical understanding of deep learning methods. The material covers six main areas:
1. Types of neural networks, including fully connected feedforward networks, convolutional networks, recurrent networks, and residual networks, with detailed descriptions of their structures and activation functions.
2. Operations on fully connected feedforward networks, such as composition, parallelization, scalar multiplication, and summation.
3. The approximation ability of neural networks for one-dimensional and multi-dimensional functions, with constructive approximation results and an analysis of convergence rates.
4. Optimization problems, in particular optimization through gradient flow equations and optimization algorithms such as gradient descent and stochastic gradient descent.
5. Optimization errors, including the optimization process under random initialization.
6. Generalization errors, that is, the errors that arise because the probability distribution of a learning problem cannot be accessed directly, with estimates for both probabilistic and strong generalization errors.
Additionally, the book discusses applications of deep learning to solving partial differential equations, including Physics-Informed Neural Networks (PINNs), Deep Galerkin Methods (DGMs), and Deep Kolmogorov Methods (DKMs). In summary, the book aims to give readers a comprehensive understanding of deep learning algorithms, from theory to practical application, reinforced through Python source code examples.
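To make the stochastic gradient descent method mentioned above concrete, here is a minimal sketch in plain Python. It is not taken from the book: the function name `sgd_linear_fit`, the toy data, the learning rate, and the step count are all assumptions chosen for illustration. The model is the simplest possible network, an affine map with no hidden layer, fitted to noise-free data by taking one gradient step per randomly drawn sample.

```python
import random


def sgd_linear_fit(data, learning_rate=0.1, steps=2000, seed=0):
    """Fit y ~ w * x + b by SGD on the squared error (hypothetical helper).

    Each step draws one sample at random and moves (w, b) against the
    gradient of 0.5 * (w * x + b - y) ** 2 for that single sample.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(data)
        residual = (w * x + b) - y
        # Partial derivatives of the per-sample loss: residual * x and residual.
        w -= learning_rate * residual * x
        b -= learning_rate * residual
    return w, b


# Toy data (an assumption for this sketch): samples of y = 2x + 1 on [-1, 1].
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear_fit(data)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Because the toy data contain no noise, the iterates should approach the generating parameters w = 2 and b = 1. With noisy data, a constant learning rate would typically leave the iterates fluctuating around the minimizer instead of converging to it.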