Optimisation of Overparametrized Sum-Product Networks

Martin Trapp, Robert Peharz, Franz Pernkopf
DOI: https://doi.org/10.48550/arXiv.1905.08196
2019-05-29
Abstract: It seems to be a pearl of conventional wisdom that parameter learning in deep sum-product networks is surprisingly fast compared to shallow mixture models. This paper examines the effects of overparameterization in sum-product networks on the speed of parameter optimisation. Using theoretical analysis and empirical experiments, we show that deep sum-product networks exhibit an implicit acceleration compared to their shallow counterparts. In fact, gradient-based optimisation in deep tree-structured sum-product networks is equal to gradient ascent with adaptive and time-varying learning rates and additional momentum terms.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is **how overparameterization affects the speed of parameter optimization in sum-product networks (SPNs)**. Specifically, the authors ask whether overparameterization accelerates parameter optimization in deep sum-product networks as it does in other models (such as linear neural networks). Through theoretical analysis and experiments, they show that gradient-based optimization in deep sum-product networks has an implicit acceleration effect, which makes them converge faster than shallow mixture models.

### Main research contents:

1. **Background and related work**:
   - The effect of overparameterization in linear neural networks is introduced, in particular that increasing the depth can accelerate optimization.
   - The basic concepts of sum-product networks (SPNs) and their applications in different tasks are introduced.
2. **The influence of overparameterization in sum-product networks**:
   - An overparameterized sum-product network structure is proposed, and its gradient update rule is derived.
   - Gradient-based optimization of the overparameterized network is shown to be equivalent to gradient ascent with an adaptive learning rate and a momentum term.
   - Tree-structured sum-product networks that are naturally deep are shown to have a similar acceleration effect.
3. **Experimental results**:
   - Experiments show that increasing the depth of a sum-product network (i.e., adding consecutive sum layers) can significantly accelerate parameter optimization.
   - The sum-product network with three consecutive sum layers reaches a better training log-likelihood (train LLH) within the same number of iterations, which indicates the implicit acceleration effect.
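The adaptive learning rate mentioned in item 2 can be illustrated on a toy case. The sketch below is not the paper's SPN derivation: it assumes a single effective weight written as a product of two factors, `w = a * b`, ignores the non-negativity and normalization constraints on SPN weights, and uses hypothetical names throughout. One gradient-ascent step on the factors then moves `w` by `eta * (a**2 + b**2) * grad_w` to first order, so the step size seen by `w` depends on the current factor values.

```python
def factored_step(a, b, grad_w, eta):
    """One gradient-ascent step on the factors of an effective weight w = a * b.

    grad_w is dL/dw at the current w (a hypothetical, externally supplied value).
    """
    # Chain rule: dL/da = grad_w * b and dL/db = grad_w * a.
    a_new = a + eta * grad_w * b
    b_new = b + eta * grad_w * a
    return a_new, b_new


def effective_learning_rate(a, b, eta):
    """First-order step size seen by w = a * b after stepping a and b."""
    return eta * (a * a + b * b)


# Toy values chosen only for illustration.
a, b, eta, grad_w = 0.8, 0.5, 0.01, 1.5

a_new, b_new = factored_step(a, b, grad_w, eta)
w_old = a * b
w_new = a_new * b_new

# First-order prediction: plain gradient ascent on w with an adaptive rate.
predicted = w_old + effective_learning_rate(a, b, eta) * grad_w
# The only remaining term is second order in eta.
second_order = (eta * grad_w) ** 2 * w_old

print(f"w_new = {w_new:.8f}, first-order prediction = {predicted:.8f}")
```

Because the factors change at every step, the effective rate `eta * (a**2 + b**2)` is time-varying as well. The momentum-like term discussed in the paper is not reproduced by this two-factor sketch.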
### Conclusion:

This paper shows that the optimization dynamics of overparameterized sum-product networks (SPNs) are similar to those of linear neural networks: gradient-based optimization behaves like gradient ascent with adaptive, time-varying learning rates and momentum terms, which yields an implicit acceleration effect. Tree-structured sum-product networks that are naturally deep show the same acceleration effect. This finding helps explain the advantages of deep sum-product networks in nonlinear classification tasks.

### Formula summary:

- The log-likelihood function of a sum-product network:
  \[ L(\theta \mid X) = \sum_{n=1}^{N} \log f(x_n \mid \theta) - \log f(* \mid \theta) \]
- The gradient update rule:
  \[ w_k^{(t+1)} \approx w_k^{(t)} + \rho(t) \nabla w_k^{(t)} + \lambda(t) w_k^{(t)} \]
  where:
  \[ \rho(t) := \eta \left( w_0^{\phi(k,0)} \right)^2 \]
  \[ \lambda(t) := \sum_{l=0}^{L-1} \eta \, \nabla w_l^{\phi(k,l)} \left( w_l^{\phi(k,l)} \right)^{-1} \]

These formulas show how overparameterization changes the gradient update and where the implicit acceleration comes from.
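As a reading aid, the update rule above can be evaluated literally for one weight. This is a minimal sketch under stated assumptions: η is taken to be a scalar base learning rate, the weights and gradients along the path of component k are passed in as plain lists, and all function and argument names are hypothetical rather than taken from the paper.

```python
def overparameterized_update(w_k, grad_w_k, path_weights, path_grads, eta):
    """Evaluate w_k + rho * grad_w_k + lam * w_k as written in the formula summary.

    path_weights[l] stands for w_l^{phi(k,l)} and path_grads[l] for its gradient,
    for l = 0 .. L-1 (hypothetical argument names, not from the paper).
    """
    if len(path_weights) != len(path_grads):
        raise ValueError("need exactly one gradient per path weight")
    # rho(t) := eta * (w_0^{phi(k,0)})^2, the adaptive learning rate.
    rho = eta * path_weights[0] ** 2
    # lambda(t) := sum over l of eta * grad(w_l^{phi(k,l)}) / w_l^{phi(k,l)}.
    lam = sum(eta * g / w for w, g in zip(path_weights, path_grads))
    return w_k + rho * grad_w_k + lam * w_k


# Toy numbers chosen only for illustration (L = 2 weights on the path).
w_next = overparameterized_update(
    w_k=0.5,
    grad_w_k=2.0,
    path_weights=[0.5, 1.0],
    path_grads=[1.0, 0.5],
    eta=0.1,
)
print(w_next)
```

With these inputs, `rho` is `0.025` and `lam` is `0.25`, so the gradient term contributes `0.05` and the weight-proportional term contributes `0.125`.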