A Theoretical Study of The Relationship Between Whole An ELM Network and Its Subnetworks

Enmei Tu, Guanghao Zhang, Lily Rachmawati, Eshan Rajabally, Guang-Bin Huang
DOI: https://doi.org/10.48550/arXiv.1610.09608
2016-10-30
Abstract: A biological neural network is constituted by numerous subnetworks and modules with different functionalities. For an artificial neural network, the relationship between a network and its subnetworks is also important and useful for both theoretical and algorithmic research, e.g. it can be exploited to develop incremental or parallel network training algorithms. In this paper we explore the relationship between an ELM neural network and its subnetworks. To the best of our knowledge, we are the first to prove a theorem that shows an ELM neural network can be scattered into subnetworks and its optimal solution can be constructed recursively from the optimal solutions of these subnetworks. Based on the theorem we also present two algorithms to train a large ELM neural network efficiently: one is a parallel network training algorithm and the other is an incremental network training algorithm. The experimental results demonstrate the usefulness of the theorem and the validity of the developed algorithms.
Machine Learning, Neural and Evolutionary Computing
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper studies the relationship between an Extreme Learning Machine (ELM) network and its subnetworks. Specifically, the authors address the following key problems:

1. **Theoretical exploration**:
   - Investigate whether an ELM network can be decomposed into multiple subnetworks and whether its optimal solution can be recursively constructed from the optimal solutions of these subnetworks.
2. **Algorithm development**:
   - Based on the above theoretical findings, develop two efficient ELM network training algorithms:
     1. **Parallel network training algorithm**: Improve training efficiency by dividing a large-scale ELM network into multiple subnetworks and training them in parallel on multiple computing units.
     2. **Incremental network training algorithm**: Build a larger ELM network by gradually adding subnetworks, thereby achieving efficient incremental training.
3. **Practical verification**:
   - Verify the theorem and the algorithms experimentally to confirm that they are feasible and perform well in practice.

### Background and motivation

With the advent of the big data era, machine learning methods face challenges in efficiency and effectiveness when dealing with large-scale data. Traditional neural network training methods such as backpropagation (BP) are inefficient when dealing with complex patterns. Although ELM has received extensive attention for its fast training speed and good generalization ability, directly computing the Moore-Penrose pseudoinverse on large-scale data still runs into memory limits and high computational complexity.
Therefore, studying the relationship between an ELM network and its subnetworks not only deepens the theoretical understanding of ELM but also provides a theoretical basis for developing more efficient training algorithms for big data learning.

### Main contributions

1. **Theoretical proof**:
   - Prove, for the first time according to the authors, that the optimal output weight of an ELM network is a linear transformation of the optimal output weights of its subnetworks, which provides a theoretical basis for the subsequent algorithm design.
2. **Algorithm proposal**:
   - Propose two training algorithms, one parallel and one incremental, which improve the training efficiency of large-scale ELM networks.
3. **Experimental verification**:
   - Conduct experiments on four popular handwritten digit classification datasets to verify the effectiveness of the proposed theorem and algorithms.

### Formula representation

The key formulas involved in the paper include:

- Output of the ELM network:
  \[ F(x_i)=\sum_{j = 1}^{m}w_jh_j(x_i)=h(x_i)W \]
  where \(W = [w_1,w_2,\ldots,w_m]^T\in\mathbb{R}^{m\times c}\) and \(h(x_i)=[h_1(x_i),h_2(x_i),\ldots,h_m(x_i)]\) is the hidden-layer output of sample \(x_i\).
- Optimization objective of output weights:
  \[ W=\arg\min_{W\in\mathbb{R}^{m\times c}}\|F - Y\|^2+\alpha\|W\|^2 \]
  With \(F = HW\), where \(H\) stacks the rows \(h(x_i)\) and \(Y\) is the target matrix, the analytical solution is:
  \[ W=(H^TH + \alpha I)^{-1}H^TY \]
- Relationship in Theorem 1:
  \[ W = Z\begin{bmatrix}W_1\\W_2\end{bmatrix} \]
  or the equivalent form:
  \[ W=\begin{bmatrix}W_1\\W_2\end{bmatrix}-\Delta W \]
  where \(W_1\) and \(W_2\) are the optimal output weights of the two subnetworks, \(Z\) is a transformation matrix, and \(\Delta W\) is the corresponding correction term.
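
The Theorem 1 relationship can be checked numerically. The following is a minimal NumPy sketch, not the paper's algorithm. It assumes the hidden nodes are split column-wise as \(H = [H_1, H_2]\) and that every network uses the same regularization coefficient `reg` (the scalar multiplying \(I\) in the closed-form solution). Under those assumptions the normal equations give \(Z = (H^TH + \text{reg}\,I)^{-1}\,\mathrm{blockdiag}(A_1, A_2)\) with \(A_k = H_k^TH_k + \text{reg}\,I\). This form of \(Z\) is derived here and may differ from the paper's presentation. The function names, data sizes, and the `tanh` activation are illustrative choices.

```python
import numpy as np


def elm_output_weights(H, Y, reg):
    """Closed-form regularized least squares: solve (H^T H + reg*I) W = H^T Y."""
    m = H.shape[1]
    return np.linalg.solve(H.T @ H + reg * np.eye(m), H.T @ Y)


def combine_subnetworks(H1, H2, W1, W2, reg):
    """Rebuild the full-network W from the subnetwork solutions W1 and W2.

    Assumed form (derived from the normal equations, not quoted from the paper):
    W = (H^T H + reg*I)^{-1} blockdiag(A1, A2) [W1; W2], A_k = H_k^T H_k + reg*I.
    """
    m1, m2 = H1.shape[1], H2.shape[1]
    A1 = H1.T @ H1 + reg * np.eye(m1)
    A2 = H2.T @ H2 + reg * np.eye(m2)
    C = H1.T @ H2  # cross term coupling the two subnetworks
    A = np.block([[A1, C], [C.T, A2]])  # equals H^T H + reg*I for H = [H1, H2]
    rhs = np.vstack([A1 @ W1, A2 @ W2])  # equals H^T Y, since A_k W_k = H_k^T Y
    return np.linalg.solve(A, rhs)


# Synthetic data; all sizes are arbitrary illustrative values.
rng = np.random.default_rng(0)
n, d, m1, m2, c = 200, 10, 15, 25, 3
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, c))
P = rng.normal(size=(d, m1 + m2))  # random, untrained input weights (ELM style)
b = rng.normal(size=m1 + m2)       # random hidden biases
H = np.tanh(X @ P + b)             # hidden-layer output matrix
H1, H2 = H[:, :m1], H[:, m1:]      # two subnetworks = two groups of hidden nodes
reg = 0.1

W_full = elm_output_weights(H, Y, reg)
W1 = elm_output_weights(H1, Y, reg)
W2 = elm_output_weights(H2, Y, reg)
W_combined = combine_subnetworks(H1, H2, W1, W2, reg)
delta_W = np.vstack([W1, W2]) - W_combined  # the correction term in the second form

# Compare the directly trained weights with the recombined ones.
print(np.allclose(W_full, W_combined))
```

This sketch only verifies the identity: `combine_subnetworks` still solves a system of the full size, so it gives no speedup by itself. An efficient parallel or incremental scheme would have to exploit the block structure of the matrix instead of solving the full system, which is outside what this example shows.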