Experimental Exploration on Loss Surface of Deep Neural Network

Qunyong Yuan, Nanfeng Xiao
DOI: https://doi.org/10.1002/ima.22434
IF: 2.177
2020-01-01
International Journal of Imaging Systems and Technology
Abstract: The loss function of a deep neural network is high-dimensional, nonconvex, and complex. So far, the geometric properties of the loss surface of neural networks have not been well understood. Unlike most theoretical studies of the loss surface, this article explores the loss surface of deep neural networks experimentally, examining the trajectories of various adaptive optimization algorithms, the Hessian matrix of the loss function, and the curvature of the loss surface along those trajectories. It finds that the gradient direction of the adaptive optimization algorithms is almost perpendicular to the direction of maximum curvature of the loss surface, whereas the gradient directions of the stochastic gradient descent (SGD) algorithm follow no such rule. The Hessian matrix of the loss surface along the trajectory of the optimization algorithm is degenerate (singular), which is inconsistent with the assumption of a nonsingular Hessian matrix made in many theoretical studies of deep learning. In addition, this article proposes a new ensemble learning method for neural networks based on the scaling invariance of ReLU neural networks and on mode connectivity.
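The scaling invariance of ReLU networks mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' ensemble method; it only checks the well-known property that multiplying a hidden layer's weights and biases by a positive constant and dividing the next layer's weights by the same constant leaves the network output unchanged. The two-layer architecture, the layer sizes, and the names `forward` and `rescale` are assumptions chosen for illustration.

```python
import random

def relu(v):
    # Elementwise ReLU on a list of floats.
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, b1, W2, x):
    # Hypothetical two-layer ReLU network: W2 @ relu(W1 @ x + b1).
    h = relu([z + b for z, b in zip(matvec(W1, x), b1)])
    return matvec(W2, h)

def rescale(W1, b1, W2, alpha):
    # Scale the hidden layer by alpha > 0 and the output layer by 1/alpha.
    # Because relu(alpha * z) == alpha * relu(z) for alpha > 0,
    # the network function is unchanged.
    W1s = [[alpha * w for w in row] for row in W1]
    b1s = [alpha * b for b in b1]
    W2s = [[w / alpha for w in row] for row in W2]
    return W1s, b1s, W2s

rng = random.Random(0)
W1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
b1 = [rng.gauss(0, 1) for _ in range(4)]
W2 = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]
x = [rng.gauss(0, 1) for _ in range(3)]

y = forward(W1, b1, W2, x)
y_scaled = forward(*rescale(W1, b1, W2, alpha=5.0), x)

# The two parameter settings differ but compute the same outputs
# (up to floating-point rounding).
max_diff = max(abs(a - b) for a, b in zip(y, y_scaled))
print(max_diff)
```

Because distinct parameter vectors related by such a rescaling define the same function, they also share the same loss value, which is the property the abstract refers to.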