Abstract: It seems to be a pearl of conventional wisdom that parameter learning in deep sum-product networks is surprisingly fast compared to shallow mixture models. This paper examines the effects of overparameterization in sum-product networks on the speed of parameter optimisation. Using theoretical analysis and empirical experiments, we show that deep sum-product networks exhibit an implicit acceleration compared to their shallow counterparts. In fact, gradient-based optimisation in deep tree-structured sum-product networks is equivalent to gradient ascent with adaptive and time-varying learning rates and additional momentum terms.

What problem does this paper attempt to address?

The problem this paper addresses is **the influence of overparameterization on the speed of parameter optimization in Sum-Product Networks (SPNs)**. Specifically, the authors ask whether overparameterization accelerates parameter optimization in deep SPNs as it does in other models, such as linear neural networks. Through theoretical analysis and experiments, they show that gradient-based optimization in deep SPNs has an implicit acceleration effect, so that deep SPNs converge faster than shallow mixture models.

### Main research contents:

1. **Background and related work**:
   - The effect of overparameterization in linear neural networks is introduced, in particular that increasing the depth can accelerate optimization.
   - The basic concepts of SPNs and their applications in different tasks are introduced.
2. **The influence of overparameterization in Sum-Product Networks**:
   - An overparameterized SPN structure is proposed, and its gradient update rule is derived.
   - The gradient-based optimization of the overparameterized network is shown to be equivalent to gradient ascent with an adaptive learning rate and a momentum term.
   - Tree-structured SPNs, which are naturally deep, are shown to exhibit a similar acceleration effect.
3. **Experimental results**:
   - Experiments show that increasing the depth of an SPN (i.e., adding consecutive sum layers) can significantly accelerate parameter optimization.
   - An SPN with three consecutive sum layers reaches a better training log-likelihood (train LLH) within the same number of iterations, which indicates the implicit acceleration effect.
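The idea of overparameterizing with consecutive sum layers can be illustrated with a small sketch. It assumes a tree-structured model with univariate Gaussian leaves; the function names and all numeric parameters are hypothetical and not taken from the paper. The point is only that a sum node over sum nodes represents the same density as a single shallow mixture whose weights are products of the weights along each path, so the added depth changes the parameterization, not the model class.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian leaf."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def shallow_mixture(x, weights, leaves):
    """A single sum node over all leaves (a plain mixture model)."""
    return sum(w * gaussian_pdf(x, m, s) for w, (m, s) in zip(weights, leaves))


def deep_mixture(x, top_weights, child_weights, leaves_per_child):
    """A sum node whose children are sum nodes over disjoint groups of leaves."""
    total = 0.0
    for w_top, w_child, leaves in zip(top_weights, child_weights, leaves_per_child):
        total += w_top * shallow_mixture(x, w_child, leaves)
    return total


# Hypothetical parameters: two child sum nodes with two Gaussian leaves each.
leaves_per_child = [[(-2.0, 1.0), (0.0, 0.5)], [(1.0, 1.0), (3.0, 2.0)]]
top_weights = [0.3, 0.7]
child_weights = [[0.4, 0.6], [0.9, 0.1]]

# Collapse the two sum layers: each shallow weight is the product of the
# weights along the path from the root to the corresponding leaf.
collapsed_weights = [
    w_top * w_leaf
    for w_top, group in zip(top_weights, child_weights)
    for w_leaf in group
]
flat_leaves = [leaf for group in leaves_per_child for leaf in group]

x = 0.25
deep_value = deep_mixture(x, top_weights, child_weights, leaves_per_child)
shallow_value = shallow_mixture(x, collapsed_weights, flat_leaves)
print(abs(deep_value - shallow_value))  # difference is at floating-point level
```

Both networks define the same function; what differs is how gradient steps on the individual factors translate into changes of the collapsed weights.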
### Conclusion:

This paper shows that the optimization dynamics of overparameterized Sum-Product Networks (SPNs) are similar to those of linear neural networks: gradient-based optimization behaves like gradient ascent with adaptive, time-varying learning rates and momentum terms, which yields an implicit acceleration effect. Tree-structured SPNs, which are naturally deep, show the same acceleration effect. This finding helps explain the advantages of deep SPNs in nonlinear classification tasks.

### Formula summary:

- The log-likelihood function of a Sum-Product Network:
  \[ L(\theta|X)=\sum_{n = 1}^{N}\log f(x_n|\theta)-\log f(*|\theta) \]
- The gradient update rule:
  \[ w_k^{(t + 1)}\approx w_k^{(t)}+\rho(t)\nabla w_k^{(t)}+\lambda(t)w_k^{(t)} \]
  where:
  \[ \rho(t):=\eta(w_0^{\phi(k,0)})^2 \]
  \[ \lambda(t):=\sum_{l = 0}^{L - 1}\eta\nabla w_l^{\phi(k,l)}(w_l^{\phi(k,l)})^{-1} \]

These formulas illustrate how overparameterization affects the gradient updates and where the implicit acceleration comes from: \(\rho(t)\) acts as an adaptive, time-varying learning rate, and the \(\lambda(t)\) term plays the role of momentum.
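The update rule can be checked numerically in the simplest case. The sketch below assumes a single scalar weight factored as `w = w0 * w1` (that is, L = 1), a given gradient value `grad_w` of the objective with respect to `w`, and no renormalization or projection of the weights. All names and numbers are hypothetical. It compares one plain gradient-ascent step on the two factors with the first-order rule, using `rho = eta * w0**2` and `lam = eta * grad_w0 / w0`.

```python
def factored_step(w0, w1, grad_w, eta):
    """One gradient-ascent step on the factors of w = w0 * w1.

    Returns the collapsed weight after the step.
    """
    grad_w0 = grad_w * w1  # chain rule: dL/dw0 = dL/dw * w1
    grad_w1 = grad_w * w0  # chain rule: dL/dw1 = dL/dw * w0
    new_w0 = w0 + eta * grad_w0
    new_w1 = w1 + eta * grad_w1
    return new_w0 * new_w1


def first_order_step(w0, w1, grad_w, eta):
    """First-order rule: w + rho * grad_w + lam * w (the L = 1 case)."""
    w = w0 * w1
    grad_w0 = grad_w * w1
    rho = eta * w0 ** 2        # adaptive learning rate
    lam = eta * grad_w0 / w0   # coefficient of the momentum-like term
    return w + rho * grad_w + lam * w


# Hypothetical values, chosen only for illustration.
w0, w1, grad_w, eta = 0.8, 0.5, 1.5, 0.01

exact = factored_step(w0, w1, grad_w, eta)
approx = first_order_step(w0, w1, grad_w, eta)

# The two differ only by the second-order term eta^2 * grad_w^2 * w.
gap = exact - approx
second_order_term = eta ** 2 * grad_w ** 2 * (w0 * w1)
print(gap, second_order_term)
```

Under these assumptions the difference between the exact and the approximate update is exactly the term that is quadratic in the step size, which is what the approximation sign in the update rule discards.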

Optimisation of Overparametrized Sum-Product Networks

The Sum-Product Theorem: A Foundation for Learning Tractable Models

Deep Compression of Sum-Product Networks on Tensor Networks

Top-Down Bayesian Posterior Sampling for Sum-Product Networks

Training Overparametrized Neural Networks in Sublinear Time

On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions

Exploiting Problem Structure in Deep Declarative Networks: Two Case Studies

The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent

Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space

Sum-Product-Set Networks: Deep Tractable Models for Tree-Structured Graphs

A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks

Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks

Provable Acceleration of Nesterov's Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks

Provable convergence of Nesterov’s accelerated gradient method for over-parameterized neural networks

Tractable Probabilistic Graph Representation Learning with Graph-Induced Sum-Product Networks

A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks

The Deep Parametric PDE Method: Application to Option Pricing

Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks

Hyperparameter Optimization with Neural Network Pruning

Generalization and Expressivity for Deep Nets

Sparse Double Descent: Where Network Pruning Aggravates Overfitting