On the Generalization Error Bounds of Neural Networks under Diversity-Inducing Mutual Angular Regularization

Pengtao Xie, Yuntian Deng, Eric Xing
DOI: https://doi.org/10.48550/arXiv.1511.07110
2015-11-23
Abstract: Recently diversity-inducing regularization methods for latent variable models (LVMs), which encourage the components in LVMs to be diverse, have been studied to address several issues involved in latent variable modeling: (1) how to capture long-tail patterns underlying data; (2) how to reduce model complexity without sacrificing expressivity; (3) how to improve the interpretability of learned patterns. While the effectiveness of diversity-inducing regularizers such as the mutual angular regularizer has been demonstrated empirically, a rigorous theoretical analysis of them is still missing. In this paper, we aim to bridge this gap and analyze how the mutual angular regularizer (MAR) affects the generalization performance of supervised LVMs. We use neural network (NN) as a model instance to carry out the study and the analysis shows that increasing the diversity of hidden units in NN would reduce estimation error and increase approximation error. In addition to theoretical analysis, we also present empirical study which demonstrates that the MAR can greatly improve the performance of NN and the empirical observations are in accordance with the theoretical analysis.
Machine Learning
### What problem does this paper attempt to address?
This paper asks how diversity-inducing regularization, specifically the mutual angular regularizer (MAR), changes the generalization performance of neural networks. The work is motivated by three issues in latent variable models (LVMs):

1. **Capturing long-tail patterns**: When the popularity of patterns in the data follows a power-law distribution, standard LVMs have difficulty capturing the long-tail patterns that occur at low frequency, which leads to information loss.
2. **Trade-off between model complexity and expressiveness**: To cope with the rapid growth of pattern complexity in big data, it is usually necessary to increase the scale and capacity of LVMs, but this brings challenges in training, inference, storage, and maintenance. Reducing model complexity without sacrificing expressiveness is difficult.
3. **Interpretability of patterns**: The patterns that existing LVMs discover from large datasets contain a large amount of redundancy and overlap, which makes them hard to interpret.

To address these issues, the authors study the mutual angular regularizer and analyze its effect on the generalization performance of supervised LVMs, using neural networks as the model instance. The main contributions are:

- **Theoretical analysis**: By analyzing how MAR affects the estimation error and the approximation error, the paper shows that as the diversity of the hidden units increases, the estimation error decreases while the approximation error increases. An appropriately chosen level of diversity can therefore minimize the overall generalization error.
- **Experimental verification**: Experiments show that neural networks trained with MAR perform substantially better, and the empirical observations are consistent with the theoretical analysis.
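The trade-off described in the theoretical analysis can be written as the standard decomposition of excess risk. This is a sketch under assumed notation: \(\mathcal{F}\) (the hypothesis class of regularized networks), \(L^{*}\) (the smallest risk achievable by any predictor), and the reading of \(f^{*}\) as the best function in \(\mathcal{F}\) are assumptions added for illustration, not symbols defined in this summary.

\[
L(\hat{f})-L^{*}
=\underbrace{L(\hat{f})-L(f^{*})}_{\text{estimation error}}
+\underbrace{L(f^{*})-L^{*}}_{\text{approximation error}},
\qquad f^{*}=\arg\min_{f\in\mathcal{F}}L(f)
\]

Under this reading, requiring more diverse hidden units shrinks \(\mathcal{F}\): the first term becomes easier to control from \(n\) samples, while the second term can only stay the same or grow, because the best function in a smaller class cannot have lower risk.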
### Formula Summary

1. **Definition of the mutual angular regularizer (MAR)**:

   \[ \Omega(A)=\frac{1}{K(K-1)}\sum_{i=1}^{K}\sum_{j=1,j\neq i}^{K}\theta_{ij}-\gamma\frac{1}{K(K-1)}\sum_{i=1}^{K}\sum_{j=1,j\neq i}^{K}\left(\theta_{ij}-\frac{1}{K(K-1)}\sum_{p=1}^{K}\sum_{q=1,q\neq p}^{K}\theta_{pq}\right)^{2} \]

   where \(\theta_{ij}=\arccos\left(\frac{\vert a_{i}\cdot a_{j}\vert}{\|a_{i}\|\|a_{j}\|}\right)\) and \(\gamma>0\) is a trade-off parameter.

2. **Upper bound on the estimation error (single-layer neural network)**:

   \[ L(\hat{f})-L(f^{*})\leq 8(\sqrt{J}+C_{2})\left(2LC_{1}C_{3}C_{4}+C_{4}\vert h(0)\vert\right)\sqrt{\frac{m}{n}}+(\sqrt{J}+C_{2})^{2}\sqrt{\frac{2\log(2/\delta)}{n}} \]

   where

   \[ J=mC_{4}^{2}h^{2}(0)+L^{2}C_{1}^{2}C_{3}^{2}C_{4}^{2}\left((m-1)\cos\theta+1\right)+2\sqrt{m}\,C_{1}C_{3}C_{4}L\vert h(0)\vert\sqrt{(m-1)\cos\theta+1} \]

3. **Upper bound on the estimation error of a multi-layer neural network**:

   \[ L(\hat{f})-L(f^{*})\leq 8(\sqrt{J_{p}}+C_{2})\Bigl((2L)^{P}C_{1}C_{3}^{0}\sqrt{n}\prod_{p=0}^{P-1}\sqrt{m_{p}C_{3}^{p}}+\vert h(0)\vert\sqrt{n}\sum_{p=0}^{P-1}(2L)^{P-1-p}\prod_{j=p}^{P-1}\cdots \]
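To make the first two formulas concrete, here is a minimal NumPy sketch. It evaluates the MAR definition, which is the mean of the pairwise angles minus \(\gamma\) times their variance, and the single-layer estimation-error bound as a function of \(\theta\). The function names (`mutual_angles`, `mar`, `J_term`, `estimation_bound`) are hypothetical, the rows of `A` are assumed to be the vectors \(a_{i}\), and the constants \(C_{1},\dots,C_{4}\), \(L\), and \(h(0)\) are passed in as plain numbers because this summary does not define them.

```python
import numpy as np


def mutual_angles(A):
    """Return the K(K-1) angles theta_ij (i != j) between the rows of A.

    Assumes each row of A is one vector a_i with nonzero norm.
    """
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    # |a_i . a_j| / (||a_i|| ||a_j||); clip guards against rounding above 1.
    cos = np.clip(np.abs(A @ A.T) / np.outer(norms, norms), 0.0, 1.0)
    theta = np.arccos(cos)
    off_diagonal = ~np.eye(A.shape[0], dtype=bool)
    return theta[off_diagonal]


def mar(A, gamma=1.0):
    """Omega(A): mean of the pairwise angles minus gamma times their variance."""
    theta = mutual_angles(A)
    # ndarray.var() uses the population variance (ddof=0), matching the formula.
    return theta.mean() - gamma * theta.var()


def J_term(theta, m, L_const, C1, C3, C4, h0):
    """The quantity J from the single-layer bound, transcribed term by term."""
    s = (m - 1) * np.cos(theta) + 1
    return (m * C4**2 * h0**2
            + L_const**2 * C1**2 * C3**2 * C4**2 * s
            + 2 * np.sqrt(m) * C1 * C3 * C4 * L_const * abs(h0) * np.sqrt(s))


def estimation_bound(theta, m, n, delta, L_const, C1, C2, C3, C4, h0):
    """Right-hand side of the single-layer estimation-error bound."""
    root_j = np.sqrt(J_term(theta, m, L_const, C1, C3, C4, h0))
    first = (8 * (root_j + C2)
             * (2 * L_const * C1 * C3 * C4 + C4 * abs(h0))
             * np.sqrt(m / n))
    second = (root_j + C2) ** 2 * np.sqrt(2 * np.log(2 / delta) / n)
    return first + second


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 8))  # 5 illustrative vectors in 8 dimensions
    print("MAR of random vectors:", mar(A, gamma=0.5))
    # Placeholder constants chosen only for illustration.
    consts = dict(m=10, n=1000, delta=0.05, L_const=1.0,
                  C1=1.0, C2=1.0, C3=1.0, C4=1.0, h0=0.5)
    for theta in (0.2, 0.7, 1.2):
        print(theta, estimation_bound(theta, **consts))
```

Because \(\cos\theta\) decreases on \([0,\pi/2]\), the bound evaluated this way shrinks as \(\theta\) grows when all constants are positive, which is the "more diversity, less estimation error" half of the trade-off. The approximation-error half is not captured by these two formulas.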