Adaptive multiple optimal learning factors for neural network training

Jeshwanth Challagundla
2024-06-05
Abstract: This thesis presents a novel approach to neural network training that addresses the challenge of determining the optimal number of learning factors. The proposed Adaptive Multiple Optimal Learning Factors (AMOLF) algorithm dynamically adjusts the number of learning factors based on the error change per multiply, leading to improved training efficiency and accuracy. The thesis also introduces techniques for grouping weights based on the curvature of the objective function and for compressing large Hessian matrices. Experimental results demonstrate the superior performance of AMOLF compared to existing methods like OWO-MOLF and Levenberg-Marquardt.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to choose and optimize the number of learning factors used when training a multilayer perceptron (MLP). The author proposes the Adaptive Multiple Optimal Learning Factors (AMOLF) algorithm, which aims to improve the training efficiency and performance of neural networks by adjusting the number of learning factors dynamically.

### Core Problems of the Paper

1. **Uncertainty of the Number of Learning Factors**: In traditional neural network training, it is difficult to determine how many learning factors are required. Too many or too few learning factors may lead to poor training results.
2. **Limitations of Existing Algorithms**: Existing training algorithms such as OWO-MOLF and Levenberg-Marquardt do not perform ideally on some datasets, especially on large-scale, ill-conditioned problems.

### Solutions

To solve these problems, the author introduces the Adaptive Multiple Optimal Learning Factors algorithm. Its main features are:

- **Adaptive Adjustment of the Number of Learning Factors**: The number of learning factors is adjusted dynamically according to the error change per multiply, that is, the reduction in error obtained for each multiplication spent.
- **Group-based Calculation of Learning Factors Based on the Curvature of the Objective Function**: The weights are grouped according to the curvature of the objective function, and an optimal learning factor is calculated for each group.
- **Linear Compression of the Hessian Matrix**: The large, ill-conditioned Newton Hessian matrix is linearly compressed into a smaller, well-conditioned matrix, which reduces the computational cost.

### Performance Improvement

The paper reports experiments in which AMOLF outperforms the OWO-MOLF and Levenberg-Marquardt algorithms on multiple datasets, especially in how quickly the error decreases.
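To make the grouping and linear compression steps listed under Solutions more concrete, the following is a generic sketch of how a Newton system for learning factors shrinks when the factors are tied together in groups. It is an illustration under stated assumptions, not necessarily the construction used in the thesis. The symbols \(\mathbf{z}\) (vector of learning factors), \(\mathbf{g}\) (gradient of \(E\) with respect to \(\mathbf{z}\) at \(\mathbf{z}=\mathbf{0}\)), \(\mathbf{H}\) (the corresponding Hessian), \(\mathbf{C}\) (a grouping matrix) and \(\mathbf{z}_c\) (the compressed learning factors) are introduced here for illustration only.

Assume the error is approximated by a second-order model around \(\mathbf{z}=\mathbf{0}\), and that the learning factors are constrained to \(\mathbf{z}=\mathbf{C}\,\mathbf{z}_c\), where \(\mathbf{C}\) has one column per group:

\[ E(\mathbf{z})\approx E_0+\mathbf{g}^{T}\mathbf{z}+\frac{1}{2}\mathbf{z}^{T}\mathbf{H}\,\mathbf{z} \quad\Longrightarrow\quad E(\mathbf{C}\mathbf{z}_c)\approx E_0+(\mathbf{C}^{T}\mathbf{g})^{T}\mathbf{z}_c+\frac{1}{2}\mathbf{z}_c^{T}(\mathbf{C}^{T}\mathbf{H}\,\mathbf{C})\,\mathbf{z}_c \]

Setting the gradient with respect to \(\mathbf{z}_c\) to zero gives the smaller linear system

\[ (\mathbf{C}^{T}\mathbf{H}\,\mathbf{C})\,\mathbf{z}_c=-\mathbf{C}^{T}\mathbf{g} \]

With a single group, \(\mathbf{C}\) is a column of ones and the system reduces to one scalar learning factor, \(z=-\sum_k g_k\,/\sum_{k,j}H_{k,j}\), that is, the first derivative of \(E\) with respect to the shared factor divided by the second. With one group per factor, \(\mathbf{C}\) is the identity and the full system is recovered. Under this reading, varying the number of columns of \(\mathbf{C}\) trades computation against the number of independent learning factors.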
### Formula Representation

The key formulas are as follows:

- **Error Function**:
  \[ E=\frac{1}{N_v}\sum_{p = 1}^{N_v}\sum_{i = 1}^{M}(y_p(i)-t_p(i))^2 \]
  where \(y_p(i)\) is the actual output, \(t_p(i)\) is the expected output, \(N_v\) is the number of training samples, and \(M\) is the number of output units.
- **Optimal Learning Factor (OLF)**:
  \[ z =-\frac{\left.\frac{\partial E}{\partial z}\right|_{z = 0}}{\left.\frac{\partial^2 E}{\partial z^2}\right|_{z = 0}} \]
- **Hessian Matrix Element**:
  \[ H_{k,j}=\sum_{m = 1}^{M}\sum_{n = 1}^{N}\sum_{p = 1}^{N_v}\frac{\partial y_p(m)}{\partial w(k,n)}\cdot\frac{\partial y_p(m)}{\partial w(j,n)} \]

With these improvements, AMOLF is reported to perform better across different datasets and to address the shortcomings of traditional algorithms on large-scale and complex problems.
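The error function and the OLF formula above can be tried out on a toy problem. The sketch below is a minimal illustration, assuming a purely linear model \(y = Wx\) (not the MLP used in the thesis) and an arbitrary update direction; the function names `forward`, `error`, `optimal_learning_factor` and `step` are hypothetical. Because the error of a linear model is exactly quadratic in \(z\), the single Newton step given by the OLF formula lands on the minimum along the chosen direction.

```python
def forward(weights, x):
    """Linear model output y = W x (assumption: no hidden layer)."""
    return [sum(w * xn for w, xn in zip(row, x)) for row in weights]


def error(weights, inputs, targets):
    """Mean squared error E, normalized by the number of training samples N_v."""
    n_v = len(inputs)
    total = 0.0
    for x, t in zip(inputs, targets):
        y = forward(weights, x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return total / n_v


def optimal_learning_factor(weights, direction, inputs, targets):
    """z = -(dE/dz) / (d2E/dz2), both evaluated at z = 0,
    for the update W + z * direction."""
    n_v = len(inputs)
    d1 = 0.0  # first derivative of E with respect to z at z = 0
    d2 = 0.0  # second derivative of E with respect to z at z = 0
    for x, t in zip(inputs, targets):
        y = forward(weights, x)
        dy = forward(direction, x)  # dy/dz for the linear model
        d1 += sum(2.0 * (yi - ti) * dyi for yi, ti, dyi in zip(y, t, dy))
        d2 += sum(2.0 * dyi * dyi for dyi in dy)
    d1 /= n_v
    d2 /= n_v
    if d2 == 0.0:
        raise ValueError("direction does not change the outputs")
    return -d1 / d2


def step(weights, direction, z):
    """Return W + z * direction."""
    return [[w + z * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, direction)]


# Toy data (made up for illustration): 3 samples, 2 inputs, 1 output.
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [[1.0], [2.0], [3.0]]
weights = [[0.0, 0.0]]
direction = [[1.0, 1.0]]

z = optimal_learning_factor(weights, direction, inputs, targets)
new_weights = step(weights, direction, z)
print("z =", z)
print("error before:", error(weights, inputs, targets))
print("error after: ", error(new_weights, inputs, targets))
```

For an MLP the error is not quadratic in \(z\), so the same formula gives only a second-order approximation of the best step, and it would typically be recomputed at every iteration.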