On the Hardness of Training Deep Neural Networks Discretely

Ilan Doron-Arad
2024-12-18
Abstract: We study neural network training (NNT): optimizing a neural network's parameters to minimize the training loss over a given dataset. NNT has been studied extensively under theoretic lenses, mainly on two-layer networks with linear or ReLU activation functions where the parameters can take any real value (here referred to as continuous NNT (C-NNT)). However, less is known about deeper neural networks, which exhibit substantially stronger capabilities in practice. In addition, the complexity of the discrete variant of the problem (D-NNT in short), in which the parameters are taken from a given finite set of options, has remained less explored despite its theoretical and practical significance. In this work, we show that the hardness of NNT is dramatically affected by the network depth. Specifically, we show that, under standard complexity assumptions, D-NNT is not in the complexity class NP even for instances with fixed dimensions and dataset size, having a deep architecture. This separates D-NNT from any NP-complete problem. Furthermore, using a polynomial reduction we show that the above result also holds for C-NNT, albeit with more structured instances. We complement these results with a comprehensive list of NP-hardness lower bounds for D-NNT on two-layer networks, showing that fixing the number of dimensions, the dataset size, or the number of neurons in the hidden layer leaves the problem challenging. Finally, we obtain a pseudo-polynomial algorithm for D-NNT on a two-layer network with a fixed dataset size.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the computational complexity of training deep neural networks, in particular when the parameters are restricted to a discrete set. Specifically:

1. **Difficulty of training deep neural networks**: The author studies neural network training (NNT), that is, optimizing the parameters of a neural network to minimize the training loss on a given dataset. The complexity of training two-layer networks with real-valued parameters (C-NNT) has been studied extensively, but much less is known about deeper networks and about the discrete variant (D-NNT).
2. **Discrete versus continuous parameter spaces**: In D-NNT the network parameters are taken from a given finite set of options, while in C-NNT they can take any real value. The paper studies the complexity of both variants.
3. **The impact of depth on training difficulty**: The author shows that network depth dramatically affects the hardness of training. Under standard complexity assumptions, D-NNT is not in the complexity class NP even for instances with fixed dimensions and dataset size, provided the architecture is deep. This separates D-NNT from any NP-complete problem.
4. **Theoretical lower bounds**: The paper gives a list of NP-hardness lower bounds for D-NNT on two-layer networks, showing that the problem remains challenging when the number of dimensions, the dataset size, or the number of hidden-layer neurons is fixed.
5. **Pseudo-polynomial algorithm**: The author also obtains a pseudo-polynomial algorithm for D-NNT on a two-layer network with a fixed dataset size.

### Main contributions

- **Complexity of training deep networks**: Shows that, under standard complexity assumptions, D-NNT on deep architectures is not in NP, even with fixed dimensions and dataset size.
- **Comparison between discrete and continuous parameter spaces**: Shows, via a polynomial reduction, that the same result also holds for C-NNT, albeit with more structured instances.
- **Theoretical lower bounds**: Provides multiple NP-hardness lower bounds for D-NNT on two-layer networks.
- **Pseudo-polynomial algorithm**: Gives a pseudo-polynomial algorithm for D-NNT on two-layer networks with a fixed dataset size.

### Formula representation

The two central formulas are:

- Application of the activation function at a node \(v\):
\[ z_v(x)=\sigma_v\left(\sum_{e = (u, v)\in E}w_e\cdot z_u(x)+b_e\right) \]
- Training objective, as a decision problem with loss threshold \(\gamma\):
\[ \sum_{i = 1}^nL(f_\theta(x_i),y_i)\leq\gamma \]

### Summary

The paper studies the complexity of deep neural network training, with a focus on the discrete parameter setting. It combines hardness results with an algorithmic result and shows that network depth has a major impact on training difficulty.
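
To make the D-NNT decision problem concrete, here is a minimal Python sketch that evaluates a network with the activation formula above and searches a finite parameter set for a choice whose total loss is at most \(\gamma\). It is a naive exhaustive search written for illustration, not the paper's pseudo-polynomial algorithm. The function names (`forward`, `dnnt_brute_force`), the use of one option set shared by all edges, and the reading of the bias \(b_e\) as a per-edge term inside the sum are all assumptions made for this example.

```python
from itertools import product


def relu(t):
    return max(0.0, t)


def identity(t):
    return t


def forward(x, nodes, edges, theta, sigma):
    """Compute z_v(x) for every node of a DAG network.

    x     : dict mapping input node -> value
    nodes : non-input nodes, assumed to be listed in topological order
    edges : list of (u, v) pairs
    theta : one (w_e, b_e) pair per edge, in the same order as `edges`
    sigma : dict mapping each non-input node to its activation function
    """
    z = dict(x)
    for v in nodes:
        total = 0.0
        for (u, target), (w, b) in zip(edges, theta):
            if target == v:
                # Assumption: the bias b_e is added once per incoming edge.
                total += w * z[u] + b
        z[v] = sigma[v](total)
    return z


def dnnt_brute_force(data, nodes, edges, sigma, out, options, gamma, loss):
    """Return some theta with total loss <= gamma, or None if none exists.

    Tries every assignment of an option to every edge, so the running
    time is len(options) ** len(edges).
    """
    for theta in product(options, repeat=len(edges)):
        total = sum(
            loss(forward(x, nodes, edges, theta, sigma)[out], y)
            for x, y in data
        )
        if total <= gamma:
            return theta
    return None


def squared_loss(prediction, target):
    return (prediction - target) ** 2


# Toy instance: input "x" -> hidden "h" (ReLU) -> output "o" (identity).
edges = [("x", "h"), ("h", "o")]
nodes = ["h", "o"]
sigma = {"h": relu, "o": identity}
data = [({"x": 1.0}, 2.0), ({"x": 2.0}, 4.0)]  # target function: y = 2x
options = [(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0)]  # allowed (w_e, b_e) pairs

result = dnnt_brute_force(
    data, nodes, edges, sigma, "o", options, gamma=0.0, loss=squared_loss
)
print(result)
```

The search space grows exponentially with the number of edges, so this sketch is only practical for toy instances. It is meant to pin down what a "yes" instance of the decision problem looks like, not to suggest an efficient method.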