Abstract: Squared tensor networks (TNs) and their generalization as parameterized computational graphs -- squared circuits -- have been recently used as expressive distribution estimators in high dimensions. However, the squaring operation introduces additional complexity when marginalizing variables or computing the partition function, which hinders their usage in machine learning applications. Canonical forms of popular TNs are parameterized via unitary matrices so as to simplify the computation of particular marginals, but cannot be mapped to general circuits since these might not correspond to a known TN. Inspired by TN canonical forms, we show how to parameterize squared circuits to ensure they encode already normalized distributions. We then use this parameterization to devise an algorithm to compute any marginal of squared circuits that is more efficient than a previously known one. We conclude by formally showing the proposed parameterization comes with no expressiveness loss for many circuit classes.
### What problem does this paper attempt to solve?
This paper addresses the cost of computing marginal probabilities and partition functions in squared circuits. Because the circuit output is squared, marginalizing variables or computing the partition function becomes more complex, which hinders the use of squared circuits in machine learning applications.
#### Main problems
1. **High computational complexity**: The squaring operation increases the cost of computing marginal probabilities and partition functions, which limits the use of squared circuits as distribution estimators in high dimensions.
2. **Lack of a canonical form**: Popular tensor networks (TNs) admit canonical forms, parameterized via unitary matrices, that simplify the computation of particular marginals. These forms cannot be mapped directly to general squared circuits, because a circuit might not correspond to any known tensor network structure.
#### Solutions
To solve these problems, the authors propose the following:
1. **Orthonormal parameterization**: The authors introduce orthonormal circuits, which ensure that the squared circuit encodes an already-normalized distribution. Specifically, the input layers encode orthonormal functions, and the sum layers are parameterized by (semi-)unitary matrices.
   - **Definition 3 (Orthonormal circuit)**:
     - Each input layer encodes a set of orthonormal functions, that is, \(\int_{\text{dom}(X)} f_i(x) f_j^*(x) dx=\delta_{ij}\), where \(\delta_{ij}\) is the Kronecker delta.
     - Each sum layer is parameterized by a (semi-)unitary matrix \(W\in\mathbb{C}^{K_1\times K_2}\) that satisfies \(WW^\dagger = I_{K_1}\), that is, the rows of \(W\) are orthonormal.
2. **A more efficient marginalization algorithm**: Exploiting the properties of orthonormal circuits, the authors propose an algorithm that computes any marginal probability with lower time complexity than the previously known method.
   - **Theorem 1**: For a structured-decomposable orthonormal circuit \(c\), computing the marginal likelihood \(p(y)=\int_{\text{dom}(Z)} |c(y, z)|^2 dz\) takes time \(O(|\phi_Y|S + |\phi_{Y,Z}|S^2)\), where \(\phi_Y\) is the set of layers that depend only on \(Y\) and \(\phi_{Y,Z}\) is the set of layers that depend on both \(Y\) and \(Z\).
3. **Maintaining expressiveness**: The authors prove that the orthonormal parameterization loses no expressiveness for many circuit classes: a circuit whose input layers encode orthonormal functions can be converted into an equivalent orthonormal circuit by a polynomial-time algorithm.
   - **Theorem 2**: For a tensorized circuit \(c\), if each input layer encodes a set of orthonormal functions, then there exists a polynomial-time algorithm that returns an orthonormal circuit \(c'\) such that \(c'(X) = Z^{-\frac{1}{2}} c(X)\), where \(Z=\int_{\text{dom}(X)} |c(x)|^2 dx\). The factor \(Z^{-\frac{1}{2}}\) makes \(|c'(X)|^2 = Z^{-1}|c(X)|^2\) integrate to one.
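
The two conditions of Definition 3 can be checked numerically on a toy case. The following is a minimal sketch in pure Python, not the paper's implementation: it uses a small discrete domain, so integrals become sums, and builds a single sum layer on top of one input layer. The Fourier input functions, the Gram-Schmidt construction of \(W\), and all names (`fourier_inputs`, `random_semi_unitary`, `K1`, `K2`, `D`) are assumptions chosen for illustration.

```python
import cmath
import math
import random


def dagger(m):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[m[i][j].conjugate() for i in range(len(m))] for j in range(len(m[0]))]


def matmul(a, b):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def fourier_inputs(k, d):
    """k <= d orthonormal functions on {0, ..., d-1}: f_j(x) = exp(2*pi*i*j*x/d) / sqrt(d)."""
    return [[cmath.exp(2j * math.pi * j * x / d) / math.sqrt(d) for x in range(d)]
            for j in range(k)]


def random_semi_unitary(k1, k2, rng):
    """Return W (k1 x k2, k1 <= k2) with orthonormal rows, built by Gram-Schmidt."""
    rows = []
    for _ in range(k1):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(k2)]
        for u in rows:
            # Remove the component of v along the already accepted row u.
            coeff = sum(vi * ui.conjugate() for vi, ui in zip(v, u))
            v = [vi - coeff * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        rows.append([vi / norm for vi in v])
    return rows


rng = random.Random(0)
K1, K2, D = 2, 3, 5                    # hypothetical layer sizes and domain size

F = fourier_inputs(K2, D)              # input layer: K2 x D, rows are orthonormal functions
W = random_semi_unitary(K1, K2, rng)   # sum layer weights: W W^dagger = I_{K1}
C = matmul(W, F)                       # sum layer outputs c_i(x), shape K1 x D

gram = matmul(C, dagger(C))            # entries are sum_x c_i(x) c_j(x)^*
Z = sum(abs(c) ** 2 for c in C[0])     # partition function of |c_0(x)|^2

print("Gram matrix of the outputs:", gram)
print("Partition function of the first squared output:", Z)
```

In this sketch the outputs of the sum layer are again orthonormal, because \(W F F^\dagger W^\dagger = W W^\dagger = I_{K_1}\), so the squared output needs no separate normalization step.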
Through these improvements, the paper provides an effective method to accelerate the marginal calculation of squared circuits and ensures that the expressiveness of the model is not affected. This makes squared circuits more practical in tasks that require fast marginal calculation, such as lossless compression and sampling.
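
To see why orthonormality helps with marginalization, consider a toy circuit \(c(y,z)=\sum_k w_k\, g_k(y)\, h_k(z)\) with orthonormal \(h_k\). The cross terms \(\int h_k(z) h_{k'}^*(z) dz\) vanish for \(k \neq k'\), so \(p(y)=\sum_k |w_k g_k(y)|^2\) and the variable \(Z\) never has to be summed over. The sketch below illustrates this idea only and is not the algorithm of Theorem 1. The discrete domains, the basis functions in `G` and `H`, the weights `w`, and the function names are all assumptions made for the example.

```python
import math

# Orthonormal real-valued input functions on small discrete domains (illustrative choices).
# Rows of G are functions of y in {0, 1, 2}; rows of H are functions of z in {0, 1, 2, 3}.
G = [
    [1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)],
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)],
]
H = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
]

# Sum-layer weights: a single row of unit norm, i.e. a 1 x 3 semi-unitary matrix.
w = [0.5, 0.5j, 1 / math.sqrt(2)]


def circuit(y, z):
    """Toy circuit c(y, z) = sum_k w_k * g_k(y) * h_k(z)."""
    return sum(w[k] * G[k][y] * H[k][z] for k in range(len(w)))


def marginal_brute_force(y):
    """p(y) by explicitly summing |c(y, z)|^2 over every value of z."""
    return sum(abs(circuit(y, z)) ** 2 for z in range(len(H[0])))


def marginal_orthonormal(y):
    """p(y) using orthonormality of the h_k: no sum over z is needed."""
    return sum(abs(w[k] * G[k][y]) ** 2 for k in range(len(w)))


marginals_slow = [marginal_brute_force(y) for y in range(len(G[0]))]
marginals_fast = [marginal_orthonormal(y) for y in range(len(G[0]))]

print("brute force :", marginals_slow)
print("orthonormal :", marginals_fast)
print("total mass  :", sum(marginals_fast))
```

Both routines agree, and the marginals sum to one because the `G` functions are orthonormal as well and `w` has unit norm.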