Measuring the Intrinsic Dimension of Objective Landscapes

Chunyuan Li, Heerad Farkhoor, Rosanne Liu, Jason Yosinski
DOI: https://doi.org/10.48550/arXiv.1804.08838
2018-04-24
Abstract: Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.
Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
This paper asks how many parameters a neural network actually needs to solve a given task, a quantity the authors call the "intrinsic dimension". The authors train the network in a randomly oriented low-dimensional subspace of its parameter space and gradually increase the dimension of that subspace until solutions first appear; that dimension is defined as the intrinsic dimension of the objective landscape. The measure helps gauge the difficulty of a task and allows quantitative comparison of difficulty across different types of learning, such as supervised learning and reinforcement learning. The main contributions of the paper include:

1. **Proposing a new method for measuring the intrinsic dimension of an objective landscape**: The network is trained in a random subspace whose dimension is gradually increased, and the dimension at which a solution first appears is defined as the intrinsic dimension of the problem.
2. **Revealing the relationship between the intrinsic dimension and the number of model parameters**: For a given dataset, the intrinsic dimension varies little across models of vastly different sizes. This means that once the parameter space is large enough to solve the problem, additional parameters serve to increase the dimensionality of the solution manifold.
3. **Providing a quantitative comparison of the difficulty of different tasks**: For example, solving the inverted pendulum problem is 100 times easier than classifying MNIST digits, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10.
4. **Proposing a simple network compression method**: Because a solution found in a low-dimensional subspace can be described by far fewer values than the full parameter vector, the approach compresses networks, in some cases by more than 100 times.
These findings not only help to understand the optimization process of neural networks, but also provide a new perspective for model design and compression.
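
The subspace training procedure summarized above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes the full parameter vector is written as `theta = theta0 + P @ z`, where `theta0` is a fixed random initialization, `P` is a fixed random matrix with `d` columns, and only `z` is optimized. The linear toy objective, the tolerance `tol`, the use of least squares in place of gradient-based training, and the function names `best_subspace_loss` and `measure_intrinsic_dimension` are all assumptions made for illustration.

```python
import numpy as np


def best_subspace_loss(A, b, theta0, d, rng):
    """Lowest loss ||A theta - b||^2 reachable inside a random d-dim subspace.

    Assumed parameterization: theta = theta0 + P @ z, with only z trainable.
    """
    D = theta0.size
    P = rng.standard_normal((D, d))       # random subspace directions
    P /= np.linalg.norm(P, axis=0)        # normalize each column to unit length
    # Least squares stands in for gradient-based training of z (an assumption).
    z, *_ = np.linalg.lstsq(A @ P, b - A @ theta0, rcond=None)
    theta = theta0 + P @ z
    return float(np.sum((A @ theta - b) ** 2))


def measure_intrinsic_dimension(D=100, n_constraints=10, tol=1e-8, seed=0):
    """Increase d until a solution (loss below tol) first appears."""
    rng = np.random.default_rng(seed)
    # Toy objective: n_constraints random linear constraints on D parameters,
    # so the solution set has dimension D - n_constraints.
    A = rng.standard_normal((n_constraints, D))
    b = rng.standard_normal(n_constraints)
    theta0 = rng.standard_normal(D)
    for d in range(1, D + 1):
        if best_subspace_loss(A, b, theta0, d, rng) < tol:
            return d
    return None


if __name__ == "__main__":
    print(measure_intrinsic_dimension())
```

On this toy objective the solution set has dimension `D - n_constraints`, so a random subspace first reaches it once `d` equals `n_constraints`, and the sweep returns that value regardless of how large `D` is. For a real network, the least-squares step would be replaced by training `z` with an optimizer, and "solution" would need a task-specific performance threshold.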