Abstract:Autoencoding has achieved great empirical success as a framework for learning generative models for natural images. Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret, and the learned representations lack clear structure. In this work, we make the explicit assumption that the image distribution is generated from a multi-stage sparse deconvolution. The corresponding inverse map, which we use as an encoder, is a multi-stage convolution sparse coding (CSC), with each stage obtained from unrolling an optimization algorithm for solving the corresponding (convexified) sparse coding program. To avoid computational difficulties in minimizing distributional distance between the real and generated images, we utilize the recent closed-loop transcription (CTRL) framework that optimizes the rate reduction of the learned sparse representations. Conceptually, our method has high-level connections to score-matching methods such as diffusion models. Empirically, our framework demonstrates competitive performance on large-scale datasets, such as ImageNet-1K, compared to existing autoencoding and generative methods under fair conditions. Even with simpler networks and fewer computational resources, our method demonstrates high visual quality in regenerated images. More surprisingly, the learned autoencoder performs well on unseen datasets. Our method enjoys several side benefits, including more structured and interpretable representations, more stable convergence, and scalability to large datasets. Our method is arguably the first to demonstrate that a concatenation of multiple convolution sparse coding/decoding layers leads to an interpretable and effective autoencoder for modeling the distribution of large-scale natural image datasets.

A Theory of Generative ConvNet

Exploring Generative Perspective of Convolutional Neural Networks by Learning Random Field Models

Towards Understanding the Generative Capability of Adversarially Robust Classifiers

Learning Generative ConvNets Via Multi-grid Modeling and Sampling

Generative-Discriminative Complementary Learning

Generative convolution layer for image generation

On Unifying Deep Generative Models

A Predictive-Coding Network That Is Both Discriminative and Generative

Learning Multi-grid Generative ConvNets by Minimal Contrastive Divergence.

Learning FRAME Models Using CNN Filters

Theory of Generative Deep Learning : Probe Landscape of Empirical Error via Norm Based Capacity Control

Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet.

Cooperative Training of Descriptor and Generator Networks

Learning Energy-based Spatial-Temporal Generative ConvNets for Dynamic Patterns

How Well Generative Adversarial Networks Learn Distributions

Revisiting Discriminative Vs. Generative Classifiers: Theory and Implications

Closed-Loop Transcription via Convolutional Sparse Coding

Towards the Gradient Vanishing, Divergence Mismatching and Mode Collapse of Generative Adversarial Nets

Generator Born from Classifier

Learning to Generate Chairs, Tables and Cars with Convolutional Networks

Generative Adversarial Nets