Abstract: We present a novel theoretical framework for understanding the expressive power of normalizing flows. Despite their prevalence in scientific applications, a comprehensive understanding of flows remains elusive due to their restricted architectures. Existing theorems fall short as they require the use of arbitrarily ill-conditioned neural networks, limiting practical applicability. We propose a distributional universality theorem for well-conditioned coupling-based normalizing flows such as RealNVP. In addition, we show that volume-preserving normalizing flows are not universal, what distribution they learn instead, and how to fix their expressivity. Our results support the general wisdom that affine and related couplings are expressive and in general outperform volume-preserving flows, bridging a gap between empirical results and theoretical understanding.

What problem does this paper attempt to address?

This paper addresses the expressive power of normalizing flows, specifically whether volume-preserving flows and coupling-based normalizing flows are universal.

1. **Non-universality of volume-preserving flows**: The paper first examines the limitations of volume-preserving flows: in general, they cannot approximate an arbitrary target distribution under the Kullback-Leibler (KL) divergence. The authors support this with theoretical analysis and with experiments showing the poor performance of volume-preserving flows in practice, and they propose a simple fix that restores universality: adding a single one-dimensional non-volume-preserving layer.
2. **Universality of coupling-based normalizing flows**: The paper then strengthens the existing universality theory for coupling-based normalizing flows. Existing proofs typically rely on ill-conditioned neural networks, which are not feasible in practice. The authors give a new proof that avoids ill-conditioned networks and accounts for the global support of the distribution, which makes the result more relevant to practical flows.

### Main contributions of the paper

- **Revealing the non-universality of volume-preserving flows**: The paper proves for the first time that volume-preserving flows are not universal and derives the form of the distribution to which they actually converge. The authors also provide a simple fix that restores universality by adding a one-dimensional non-volume-preserving layer.
- **Improving the universality theory of coupling-based normalizing flows**: The paper gives a new universality proof that does not rely on ill-conditioned neural networks and instead achieves universality by training coupling blocks one at a time. The proof is constructive: it shows that training a sequence of affine coupling blocks gradually converges to the correct target distribution.
- **Verifying empirical observations**: The results confirm insights previously supported only by empirical observation: affine coupling blocks are an effective basis for normalizing flows, and the expressive power of volume-preserving flows is limited.

### Formulas and theories

- **KL-divergence lower bound of volume-preserving flows**:
  \[ D_{\text{KL}}(p(x)\|p_{\theta}(x))\geq D_{\text{KL}}(p^*(z)\|\mathcal{N}(0,|\Sigma_{p^*(z)}|^{1/D}I)), \]
  where \(p^*(z)\) is constructed by sorting the values of the input density \(p(x)\) in decreasing order outward from the origin with unit volume change, \(\Sigma_{p^*(z)}\) is its covariance matrix, and \(D\) is the dimension.
- **Definition of loss improvement**:
  \[ \Delta_{\text{affine}}(p_{\theta}(z)) := D_{\text{KL}}(p_{\theta}(z)\|p(z))-\min_{\theta^+}D_{\text{KL}}(p_{\theta\cup\theta^+}(z)\|p(z)), \]
  where \(\theta^+=(Q,\phi)\) parameterizes a single \(L\)-bi-Lipschitz affine coupling block whose conditioner neural network \(\psi_{\phi}\) has at least two hidden layers of finite width and ReLU activations.
- **Properties of loss improvement**:
  \[ p_{\theta}(z)=\mathcal{N}(z;0,I)\iff\Delta_{\text{affine}}(p_{\theta}(z)) = 0. \]

### Conclusion

The paper proves the non-universality of volume-preserving flows through theoretical analysis and experiments, and proposes a simple fix. It also strengthens the universality theory of coupling-based normalizing flows so that it applies to well-conditioned networks of the kind used in practice. Together, these results provide a theoretical basis for understanding and improving normalizing flows.
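To make the distinction between affine and volume-preserving coupling blocks concrete, the following is a minimal NumPy sketch, not the construction used in the paper. The `conditioner` function, its random weights, the `tanh` bound on the log-scale, and the mean-subtraction used to force a zero log-determinant are all illustrative assumptions. The sketch shows that an affine coupling block has an input-dependent log-determinant, while the volume-preserving variant always has log-determinant zero and therefore can only rearrange density values, never rescale them.

```python
import numpy as np

def conditioner(x1, W1, W2):
    """Hypothetical ReLU conditioner returning (log-scale s, shift t)."""
    h = np.maximum(x1 @ W1, 0.0)          # hidden layer with ReLU
    s, t = np.split(h @ W2, 2, axis=-1)   # first half: log-scale, second half: shift
    return np.tanh(s), t                  # assumption: tanh keeps the scale bounded

def coupling_forward(x, W1, W2, volume_preserving=False):
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]       # passive and active halves
    s, t = conditioner(x1, W1, W2)
    if volume_preserving:
        # Subtracting the mean forces sum(s) = 0, i.e. |det J| = 1.
        s = s - s.mean(axis=-1, keepdims=True)
    y2 = x2 * np.exp(s) + t
    log_det = s.sum(axis=-1)              # log |det J| of the block
    return np.concatenate([x1, y2], axis=-1), log_det

def coupling_inverse(y, W1, W2, volume_preserving=False):
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    s, t = conditioner(y1, W1, W2)        # y1 == x1, so s and t are recoverable
    if volume_preserving:
        s = s - s.mean(axis=-1, keepdims=True)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2], axis=-1)

rng = np.random.default_rng(0)
D = 4
W1 = rng.normal(size=(D // 2, 16))
W2 = rng.normal(size=(16, D))             # emits s and t, each of size D // 2
x = rng.normal(size=(5, D))

y_aff, logdet_aff = coupling_forward(x, W1, W2)
y_vp, logdet_vp = coupling_forward(x, W1, W2, volume_preserving=True)
x_rec = coupling_inverse(y_aff, W1, W2)
```

In a full flow, the block's `log_det` is added to the Gaussian log-density of the latent code to obtain the model log-likelihood. For the volume-preserving variant that term vanishes, which is the structural restriction behind the lower bound stated above.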

On the Universality of Coupling-based Normalizing Flows
