RG-Flow: A hierarchical and explainable flow model based on renormalization group and sparse prior

Hong-Ye Hu, Dian Wu, Yi-Zhuang You, Bruno Olshausen, Yubei Chen
DOI: https://doi.org/10.1088/2632-2153/ac8393
2022-08-15
Abstract: Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key ideas of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, RG-Flow, which can separate information at different scales of images and extract disentangled representations at each scale. We demonstrate our method on synthetic multi-scale image datasets and the CelebA dataset, showing that the disentangled representations enable semantic manipulation and style mixing of the images at different scales. To visualize the latent representations, we introduce receptive fields for flow-based models and show that the receptive fields of RG-Flow are similar to those of convolutional neural networks. In addition, we replace the widely adopted isotropic Gaussian prior distribution by the sparse Laplacian distribution to further enhance the disentanglement of representations. From a theoretical perspective, our proposed method has $O(\log L)$ complexity for inpainting of an image with edge length $L$, compared to previous generative models with $O(L^2)$ complexity.
Machine Learning, Disordered Systems and Neural Networks, Artificial Intelligence, Computer Vision and Pattern Recognition
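The abstract claims $O(\log L)$ inpainting complexity, compared with $O(L^2)$ for earlier models. That scaling can be illustrated with a counting sketch, under a loudly simplified assumption. Suppose a hypothetical dyadic hierarchy in which level $h$ tiles the $L \times L$ image with non-overlapping $2^h \times 2^h$ blocks, each carrying a constant number of latent variables. A small patch then overlaps only a bounded number of blocks per level, and there are about $\log_2 L$ levels. By contrast, a model whose latent variables are all globally mixed may need to update all $L^2$ of them. RG-Flow's real receptive fields overlap, so this is not the paper's construction. The function name `blocks_touching_patch` and its parameters are hypothetical.

```python
import math


def blocks_touching_patch(L, x0, y0, k):
    """Count blocks, summed over all levels, that overlap a k x k patch.

    Hypothetical simplification: at level h the L x L image is tiled by
    non-overlapping blocks of side 2**h, each holding O(1) latent variables.
    """
    n_levels = int(math.log2(L))  # number of coarse-graining levels
    total = 0
    for h in range(1, n_levels + 1):
        s = 2 ** h  # block side length at level h
        # Number of block columns / rows spanned by pixels x0..x0+k-1, y0..y0+k-1
        nx = (x0 + k - 1) // s - x0 // s + 1
        ny = (y0 + k - 1) // s - y0 // s + 1
        total += nx * ny
    return total


for L in (64, 256, 1024):
    # Hierarchical count vs. the L*L latents of a globally mixed model
    print(L, blocks_touching_patch(L, x0=10, y0=10, k=4), L * L)
# 64 12 4096
# 256 14 65536
# 1024 16 1048576
```

In this toy, the hierarchical count grows by one for every doubling of $L$, that is, logarithmically. The globally mixed count grows quadratically.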
What problem does this paper attempt to address?
The main problem this paper addresses is how a generative model can separate and represent multi-scale information in image data while keeping the learned representation interpretable. Specifically:

1. **Multi-scale information separation**: Traditional flow-based generative models can learn complex distributions, but their latent variables are usually mixed globally and do not separate information at different scales. RG-Flow borrows the idea of the renormalization group (RG) to build a hierarchical architecture that extracts and separates image features scale by scale.
2. **Improving the interpretability of the representation**: To make the latent space more interpretable, RG-Flow adopts a sparse prior distribution, specifically the Laplacian distribution. An isotropic Gaussian prior is invariant under rotations of the latent space, so any rotated latent basis fits the data equally well. The sparse prior breaks this rotational symmetry, which makes each latent variable more likely to represent a specific semantic feature.
3. **Complexity reduction**: Theoretically, RG-Flow has \(O(\log L)\) complexity for image inpainting, whereas previous generative models have \(O(L^2)\) complexity, where \(L\) is the edge length of the image. The gap widens with image size, so the advantage matters most when inpainting large images.

### Formula summary

- **Negative log-likelihood loss function**:
\[ \mathcal{L} = -\mathbb{E}_{x \sim p_X(x)} \left( \log p_Z(R(x)) + \log \left| \det \frac{\partial R(x)}{\partial x} \right| \right) \]
where \(R \equiv G^{-1}\) is the forward RG transformation (the inverse of the generative map \(G\)), and \(z = R(x)\) is the latent variable obtained by transforming the input sample \(x\).
- **Sparse prior distribution**:
\[ p(z_l) = \frac{1}{2b} \exp\left(-\frac{|z_l|}{b}\right) \]
The Laplacian distribution with scale parameter \(b\) is chosen as the sparse prior for each latent variable \(z_l\). It breaks the rotational symmetry of the latent space and promotes disentanglement of the representation.

### Experimental verification

The authors verified the effectiveness of RG-Flow on the synthetic multi-scale image datasets MSDS1 and MSDS2; the abstract also reports results on CelebA. The experiments show that RG-Flow captures features at different scales. Adjusting the latent variables at different levels separates and mixes the content and style of images.

In conclusion, RG-Flow combines the renormalization group with a sparse prior. This lets it separate multi-scale information effectively and improves the interpretability of its latent representation. In theory, it also lowers the cost of inpainting from \(O(L^2)\) to \(O(\log L)\).
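The negative log-likelihood loss and the Laplacian prior can be combined in a toy setting to show why the choice of prior matters. The sketch below is a minimal illustration, not the paper's model. It assumes a hypothetical two-dimensional linear flow \(R(x) = Wx\) in place of RG-Flow's hierarchical network. It evaluates \(-\mathbb{E}[\log p_Z(R(x)) + \log|\det \partial R/\partial x|]\) on three made-up samples and compares the identity map with a 45° rotation. The helper names (`nll`, `matvec`, `laplace_logpdf`, `gauss_logpdf`) are hypothetical.

```python
import math


def laplace_logpdf(z, b=1.0):
    # Sparse prior: log p(z_l) = -log(2b) - |z_l| / b, summed over independent z_l
    return sum(-math.log(2 * b) - abs(zl) / b for zl in z)


def gauss_logpdf(z):
    # Isotropic standard Gaussian prior: depends on z only through its norm
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * zl * zl for zl in z)


def matvec(W, x):
    # 2x2 matrix times 2-vector
    return [W[0][0] * x[0] + W[0][1] * x[1], W[1][0] * x[0] + W[1][1] * x[1]]


def nll(xs, W, logpdf):
    """Sample estimate of the NLL loss for a toy linear flow R(x) = W x.

    The Jacobian dR/dx of a linear map is W itself, so the log-det term
    is log|det W| for every sample.
    """
    log_det = math.log(abs(W[0][0] * W[1][1] - W[0][1] * W[1][0]))
    return -sum(logpdf(matvec(W, x)) + log_det for x in xs) / len(xs)


t = math.pi / 4  # 45-degree rotation of the latent axes
identity = [[1.0, 0.0], [0.0, 1.0]]
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
xs = [[1.0, 0.0], [0.0, 2.0], [-1.5, 0.5]]  # made-up samples

for name, prior in (("Gaussian", gauss_logpdf), ("Laplacian", laplace_logpdf)):
    print(name, nll(xs, identity, prior), nll(xs, rotation, prior))
```

With the Gaussian prior, both maps give the same loss up to rounding. A rotation preserves the norm of \(z\) and has unit determinant, so the loss cannot prefer one latent basis over another. With the Laplacian prior, the rotated map scores worse on these samples, which lie mostly along the coordinate axes. The loss now favors a particular orientation of the latent axes, which is the symmetry breaking described above.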