Abstract: Deep learning, despite its remarkable achievements, is still a young field. Like the early stages of many scientific disciplines, it is marked by the discovery of new phenomena, ad-hoc design decisions, and the lack of a uniform and compositional mathematical foundation. From the intricacies of the implementation of backpropagation, through a growing zoo of neural network architectures, to new and poorly understood phenomena such as double descent, scaling laws or in-context learning, there are few unifying principles in deep learning. This thesis develops a novel mathematical foundation for deep learning based on the language of category theory. We develop a new framework that is a) end-to-end, b) uniform, and c) not merely descriptive, but prescriptive, meaning it is amenable to direct implementation in programming languages with sufficient features. We also systematise many existing approaches, placing many existing constructions and concepts from the literature under the same umbrella. In Part I we identify and model two main properties of deep learning systems: parametricity and bidirectionality. We expand on the previously defined construction of actegories and Para to study the former, and define weighted optics to study the latter. Combining them yields parametric weighted optics, a categorical model of artificial neural networks, and more. Part II justifies the abstractions from Part I, applying them to model backpropagation, architectures, and supervised learning. We provide a lens-theoretic axiomatisation of differentiation, covering not just smooth spaces, but discrete settings of boolean circuits as well. We survey existing, and develop new, categorical models of neural network architectures. We formalise the notion of optimisers and, lastly, combine all of these concepts, providing a uniform and compositional framework for supervised learning.

The Modern Mathematics of Deep Learning

Mathematical theory of deep learning

A Study of the Mathematics of Deep Learning

Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

The Unreasonable Effectiveness of Deep Learning in Artificial Intelligence

Recent advances in deep learning theory

Deep learning: a statistical viewpoint

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Deep learning methods for partial differential equations and related parameter identification problems

Fundamental Components of Deep Learning: A category-theoretic approach

Towards a Mathematical Understanding of Neural Network-Based Machine Learning: What We Know and What We Don't

The many faces of deep learning

Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization

Deep Learning in Math Education

Deep Learning and Geometric Deep Learning: an introduction for mathematicians and physicists

Deep Learning: An Introduction for Applied Mathematicians

Mathematical Challenges in Deep Learning

Geometric Deep Learning: Going beyond Euclidean data

Machine Learning and Computational Mathematics

How deep learning works --The geometry of deep learning

A Survey of Deep Learning for Mathematical Reasoning