Abstract: This paper challenges the common assumption that the weight β in β-VAE should be larger than 1 in order to effectively disentangle latent factors. We demonstrate that β-VAE with β<1 can not only attain good disentanglement but also significantly improve reconstruction accuracy via dynamic control. The paper removes the inherent trade-off between reconstruction accuracy and disentanglement for β-VAE. Existing methods, such as β-VAE and FactorVAE, assign a large weight to the KL-divergence term in the objective function, leading to high reconstruction errors for the sake of better disentanglement. To mitigate this problem, ControlVAE was recently developed; it dynamically tunes the KL-divergence weight in an attempt to control the trade-off and move it to a more favorable point. However, ControlVAE fails to eliminate the conflict between the need for a large β (for disentanglement) and the need for a small β (for smaller reconstruction error). Instead, we propose DynamicVAE, which maintains a different β at different stages of training, thereby decoupling disentanglement and reconstruction accuracy. In order to evolve the weight β along a trajectory that enables such decoupling, DynamicVAE leverages a modified incremental PI (proportional-integral) controller, a variant of the proportional-integral-derivative (PID) control algorithm, and employs a moving average as well as a hybrid annealing method to evolve the value of the KL-divergence smoothly in a tightly controlled fashion. We theoretically prove the stability of the proposed approach. Evaluation results on three benchmark datasets demonstrate that DynamicVAE significantly improves the reconstruction accuracy while achieving disentanglement comparable to the best of existing methods. The results verify that our method can separate disentangled representation learning and reconstruction, removing the inherent tension between the two.
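To make the control idea in the abstract concrete, the following is a minimal sketch of an incremental PI update for β driven by a moving average of the observed KL-divergence. It is an illustration under stated assumptions, not the authors' implementation: the class name `IncrementalPIBeta`, the gains `kp` and `ki`, the window size, the clamping range, and the sign convention are all hypothetical choices, and the paper's specific modifications and hybrid annealing schedule are not reproduced here.

```python
from collections import deque


class IncrementalPIBeta:
    """Hypothetical incremental PI controller for the KL weight beta.

    Assumption: error = kl_target - smoothed_kl. When the observed KL is
    below the target, beta is lowered so the KL term is penalized less;
    when it is above the target, beta is raised.
    """

    def __init__(self, kp=0.01, ki=0.005, beta_init=100.0,
                 beta_min=0.0, beta_max=100.0, window=5):
        self.kp = kp
        self.ki = ki
        self.beta = beta_init
        self.beta_min = beta_min
        self.beta_max = beta_max
        self._prev_error = 0.0
        # Moving-average buffer that smooths noisy per-batch KL values.
        self._recent = deque(maxlen=window)

    def step(self, kl_target, kl_observed):
        """Update beta from one observed KL value and return the new beta."""
        self._recent.append(kl_observed)
        smoothed_kl = sum(self._recent) / len(self._recent)

        error = kl_target - smoothed_kl
        # Incremental (velocity) form: compute the change in beta rather
        # than recomputing beta from an accumulated error sum.
        delta = self.kp * (error - self._prev_error) + self.ki * error
        self._prev_error = error

        # Positive error (KL too small) decreases beta; clamp to the range.
        self.beta = min(max(self.beta - delta, self.beta_min), self.beta_max)
        return self.beta


if __name__ == "__main__":
    controller = IncrementalPIBeta()
    # Made-up KL readings for illustration only; the target is raised in
    # stages, loosely mimicking an annealed set point.
    for target, observed in [(2.0, 0.5), (2.0, 1.2), (4.0, 1.9), (4.0, 3.5)]:
        print(f"target={target} observed={observed} "
              f"beta={controller.step(target, observed):.4f}")
```

In a training loop, `step` would be called once per iteration with the current KL set point and the batch KL, and the returned value would weight the KL term of the loss. Starting from a large β and letting the controller lower it as the set point rises is one way to read "a different β at different stages of training", though the exact schedule here is an assumption.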

Addressing Posterior Collapse by Splitting Decoders in Variational Recurrent Autoencoders

Controlling Posterior Collapse by an Inverse Lipschitz Constraint on the Decoder Network

Neighbor Embedding Variational Autoencoder

On the Encoder-Decoder Incompatibility in Variational Text Modeling and Beyond

A Hierarchical Latent Structure for Variational Conversation Modeling

Variational Autoregressive Decoder for Neural Response Generation

Flow-Based Variational Sequence Autoencoder

Preventing posterior collapse in variational autoencoders for text generation via decoder regularization

Variational Auto-Decoder: A Method for Neural Generative Modeling from Incomplete Data

VAE-Based Generic Decoding Via Subspace Partition and Priori Utilization

Beyond Vanilla Variational Autoencoders: Detecting Posterior Collapse in Conditional and Hierarchical Variational Autoencoders

DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning

VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild

Improve variational autoEncoder with auxiliary softmax multiclassifier

How to train your VAE

A Stable Variational Autoencoder for Text Modelling

Generated Loss, Augmented Training, and Multiscale VAE

Improved Variational Neural Machine Translation by Promoting Mutual Information

Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation

Model Order Selection with Variational Autoencoding

Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs