On the Universality of Coupling-based Normalizing Flows

Felix Draxler, Stefan Wahl, Christoph Schnörr, Ullrich Köthe
2024-06-06
Abstract: We present a novel theoretical framework for understanding the expressive power of normalizing flows. Despite their prevalence in scientific applications, a comprehensive understanding of flows remains elusive due to their restricted architectures. Existing theorems fall short as they require the use of arbitrarily ill-conditioned neural networks, limiting practical applicability. We propose a distributional universality theorem for well-conditioned coupling-based normalizing flows such as RealNVP. In addition, we show that volume-preserving normalizing flows are not universal, what distribution they learn instead, and how to fix their expressivity. Our results support the general wisdom that affine and related couplings are expressive and in general outperform volume-preserving flows, bridging a gap between empirical results and theoretical understanding.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the expressive power of normalizing flows, in particular whether volume-preserving flows and coupling-based normalizing flows are universal. Specifically:

1. **Non-universality of volume-preserving flows**: The paper first explores the limitations of volume-preserving flows: in general they cannot approximate an arbitrary target distribution, in particular as measured by the Kullback-Leibler (KL) divergence. The authors confirm the poor performance of volume-preserving flows in practice through theoretical analysis and experiments, and propose a simple fix that restores universality: adding a single one-dimensional non-volume-preserving layer.
2. **Universality of coupling-based normalizing flows**: The paper then strengthens the universality theory of coupling-based normalizing flows found in the existing literature. Existing proofs usually rely on ill-conditioned neural networks, which are not feasible in practice. The authors propose a new proof technique that avoids ill-conditioned networks and also accounts for the global support of the distribution, making the result stronger in theory and more relevant in practice.

### Main contributions of the paper

- **Revealing the non-universality of volume-preserving flows**: The paper proves for the first time that volume-preserving flows are not universal and derives the form of the distribution to which they actually converge. In addition, the authors provide a simple fix that restores universality by adding a one-dimensional non-volume-preserving layer.
- **Improving the universality theory of coupling-based normalizing flows**: The paper proposes a new universality proof that does not rely on ill-conditioned neural networks and instead achieves universality by training coupling blocks one at a time. The proof is constructive: it shows that training a sequence of affine coupling blocks gradually converges to the correct target distribution.
- **Verifying empirical observations**: The results confirm insights previously supported only by empirical observation, namely that affine coupling blocks are an effective basis for normalizing flows and that the expressive power of volume-preserving flows is limited.

### Formulas and theories

- **KL-divergence lower bound of volume-preserving flows**:
  \[ D_{\text{KL}}(p(x)\|p_{\theta}(x))\geq D_{\text{KL}}(p^*(z)\|\mathcal{N}(0,|\Sigma_{p^*(z)}|^{1/D}I)), \]
  where \(p^*(z)\) is constructed by sorting the density values of \(p(x)\) in decreasing order outward from the origin, with unit volume change, and \(\Sigma_{p^*(z)}\) is its covariance matrix.
- **Definition of loss improvement**:
  \[ \Delta_{\text{affine}}(p_{\theta}(z)) := D_{\text{KL}}(p_{\theta}(z)\|p(z))-\min_{\theta^+}D_{\text{KL}}(p_{\theta\cup\theta^+}(z)\|p(z)), \]
  where \(\theta^+=(Q,\phi)\) parameterizes a single \(L\)-bi-Lipschitz affine coupling block whose conditioner neural network \(\psi_{\phi}\) has at least two hidden layers of finite width and ReLU activations.
- **Properties of loss improvement**:
  \[ p_{\theta}(z)=\mathcal{N}(z;0,I)\iff\Delta_{\text{affine}}(p_{\theta}(z)) = 0. \]
  In other words, adding one more affine coupling block can reduce the loss unless the current latent distribution is already standard normal.

### Conclusion

The paper proves the non-universality of volume-preserving flows through theoretical analysis and experiments, and proposes a simple fix. It also strengthens the universality theory of coupling-based normalizing flows so that it applies to well-conditioned networks used in practice. Together, these results provide a theoretical basis for understanding and improving normalizing flows.
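
To make the distinction between affine and volume-preserving couplings concrete, the following is a minimal NumPy sketch of a single coupling block parameterized by \(\theta^+=(Q,\phi)\). It is an illustration under stated assumptions, not the authors' implementation: `Q` is assumed to be an orthogonal matrix applied before the coupling, the conditioner is a small randomly initialized ReLU network with two hidden layers, and the `tanh` bound on the log-scale is a common trick for keeping scales bounded rather than the paper's \(L\)-bi-Lipschitz construction. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 16   # data dimension and hidden width (illustrative values)
d = D // 2     # size of the passive half that is left unchanged

# Hypothetical conditioner weights: two hidden ReLU layers, output = (log-scale, shift).
params = [
    (rng.normal(size=(d, H)) * 0.5, np.zeros(H)),
    (rng.normal(size=(H, H)) * 0.5, np.zeros(H)),
    (rng.normal(size=(H, 2 * (D - d))) * 0.5, np.zeros(2 * (D - d))),
]

def conditioner(a):
    """Map the passive half `a` to a bounded log-scale and a shift."""
    h = a
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers
    W, b = params[-1]
    log_s, t = np.split(h @ W + b, 2, axis=-1)
    return np.tanh(log_s), t  # assumption: tanh keeps the scale in [1/e, e]

def coupling_forward(x, Q, volume_preserving=False):
    """Rotate by Q, then transform the active half conditioned on the passive half."""
    y = x @ Q.T
    a, b = y[:, :d], y[:, d:]
    log_s, t = conditioner(a)
    if volume_preserving:
        log_s = np.zeros_like(log_s)  # additive coupling: shift only
    z = np.concatenate([a, b * np.exp(log_s) + t], axis=1)
    return z, log_s.sum(axis=1)  # log|det J|; the rotation contributes 0

def coupling_inverse(z, Q, volume_preserving=False):
    """Exact inverse of coupling_forward."""
    a, b_out = z[:, :d], z[:, d:]
    log_s, t = conditioner(a)
    if volume_preserving:
        log_s = np.zeros_like(log_s)
    b = (b_out - t) * np.exp(-log_s)
    return np.concatenate([a, b], axis=1) @ Q  # Q orthogonal, so Q^{-1} = Q^T

# Usage: a random orthogonal Q and a small batch of inputs.
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))
x = rng.normal(size=(5, D))
z, log_det = coupling_forward(x, Q)
x_rec = coupling_inverse(z, Q)
z_vp, log_det_vp = coupling_forward(x, Q, volume_preserving=True)
```

With `volume_preserving=True` the log-determinant is identically zero, so the block can move probability mass but never rescale density values. This is the intuition behind the KL lower bound above and behind the proposed fix of adding a single non-volume-preserving layer.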