Differentiable homotopy methods for gradually reinforcing the training of fully connected neural networks

Peixuan Li, Yuanbo Li
DOI: https://doi.org/10.1016/j.neucom.2024.128374
IF: 6
2024-08-18
Neurocomputing
Abstract: Deep fully connected neural networks (FCNNs) are the workhorses of deep learning and are broadly applicable because of their "agnostic" structure. In general, the learning capability of an FCNN improves as the number of layers and the width of each layer increase, but this comes at a higher computational cost in training. To alleviate this difficulty, we develop a gradually reinforced differentiable homotopy (GRDH) method for training FCNNs. Specifically, by introducing an extra variable t ranging between zero and one, we design a series of auxiliary functions that are continuous and monotonically increasing in t. With these functions, we formulate an optimization problem for training an artificial FCNN that progressively incorporates more layers and nodes as t changes from one to zero and eventually becomes an FCNN with the target number of layers or layer width. We prove that the solution set of the artificial problem contains an everywhere differentiable path that starts from a uniquely given point at t=1 and ends at the weights and biases of the target FCNN as t goes to zero. The GRDH method is a novel approach that incorporates differentiable homotopy methods into the training of deep learning models and retains the satisfactory theoretical convergence property of classical homotopy methods. We implement the GRDH method and another efficient method, HTA, to train the same FCNNs and find that GRDH outperforms HTA in both computational time and the number of iterations needed to obtain a solution of similar (or even higher) accuracy. Numerical results further confirm the effectiveness of the GRDH method for solving classification problems.
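The path-following idea described in the abstract can be illustrated with a classical convex homotopy. The block below is a minimal sketch under stated assumptions and is not the authors' GRDH construction: it uses the textbook homotopy H(x, t) = t (x - x0) + (1 - t) ∇f(x), whose only solution at t=1 is the given start point x0 and whose solution at t=0 is a stationary point of f. The objective, the names `grad_f`, `hess_f`, and `trace_path`, and the step counts are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical strictly convex test objective (not from the paper):
# f(x) = 0.5 x^T A x - b^T x + 0.25 * sum(x^4)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad_f(x):
    """Gradient of the test objective."""
    return A @ x - b + x**3

def hess_f(x):
    """Hessian of the test objective (positive definite everywhere)."""
    return A + np.diag(3.0 * x**2)

def trace_path(x0, steps=20, newton_iters=5):
    """Follow the zero set of H(x, t) = t (x - x0) + (1 - t) grad_f(x)
    from t = 1 down to t = 0, correcting with Newton's method at each t."""
    x = x0.copy()
    identity = np.eye(len(x0))
    # Skip t = 1 itself: the unique solution there is x0.
    for t in np.linspace(1.0, 0.0, steps + 1)[1:]:
        for _ in range(newton_iters):
            H = t * (x - x0) + (1.0 - t) * grad_f(x)
            J = t * identity + (1.0 - t) * hess_f(x)
            x = x - np.linalg.solve(J, H)
    return x

x_star = trace_path(np.zeros(2))
print("end point:", x_star)
print("gradient norm at end point:", np.linalg.norm(grad_f(x_star)))
```

Each value of t reuses the previous solution as a warm start, which is what makes path following cheaper than solving the t=0 problem from scratch. According to the abstract, GRDH replaces this simple convex blend with auxiliary functions that gradually activate layers and nodes of the network.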
computer science, artificial intelligence