Optimal Deep Neural Networks by Maximization of the Approximation Power

Hector F. Calvo-Pardo, Tullio Mancini, Jose Olmo
DOI: https://doi.org/10.2139/ssrn.3578850
2020-01-01
SSRN Electronic Journal
Abstract: We propose an optimal architecture for deep neural networks. The optimal architecture is obtained by maximizing the minimum number of linear regions approximated by a deep neural network with a ReLU activation function. The accuracy of the approximation function relies on the neural network structure, characterized by the number of nodes and by the dependence and hierarchy between them within and across layers. The optimization of the network architecture is performed before bringing the model to the data and improves over cross-validation methods for nonlinear prediction models. Our novel procedure is shown to outperform state-of-the-art machine learning models, as empirically illustrated on the Boston Housing dataset. As a byproduct, we also provide conditions under which a ReLU deep neural network underperforms relative to a shallow one of similar size.
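To make the idea of maximizing a lower bound on the number of linear regions concrete, the following is a minimal sketch and not the authors' procedure. It assumes the widely cited lower bound of Montufar et al. (2014) for ReLU networks, which may differ from the exact quantity the paper optimizes. The function names, the fixed node budget, and the brute-force search are illustrative assumptions.

```python
from itertools import product
from math import comb, prod


def min_linear_regions(widths, n0):
    """Lower bound on the number of linear regions of a ReLU network.

    Assumed form (Montufar et al., 2014): for input dimension n0 and
    hidden widths n_1..n_L, each >= n0, the bound is
        prod_{i<L} floor(n_i / n0)**n0  *  sum_{j=0}^{n0} C(n_L, j).
    """
    *early, last = widths
    folding = prod((n // n0) ** n0 for n in early)  # empty product is 1
    final = sum(comb(last, j) for j in range(n0 + 1))
    return folding * final


def best_architecture(total_nodes, n_layers, n0):
    """Hypothetical helper: exhaustive search over width allocations
    that use exactly total_nodes hidden nodes across n_layers layers."""
    best = None
    for widths in product(range(n0, total_nodes + 1), repeat=n_layers):
        if sum(widths) != total_nodes:
            continue
        bound = min_linear_regions(widths, n0)
        if best is None or bound > best[1]:
            best = (widths, bound)
    return best


if __name__ == "__main__":
    # Two inputs, eight hidden nodes: compare one wide layer with two layers.
    print("shallow (8,):", min_linear_regions((8,), 2))
    print("best 2-layer:", best_architecture(8, 2, 2))
    print("unbalanced (2, 6):", min_linear_regions((2, 6), 2))
```

Under this assumed bound, the architecture is chosen from the input dimension and node budget alone, before any data are seen. The unbalanced two-layer allocation in the last line also shows how a deep network can have a smaller guaranteed region count than a shallow one of the same size.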