Abstract: This paper challenges the common assumption that the weight β in β-VAE should be larger than 1 in order to effectively disentangle latent factors. We demonstrate that β-VAE with β<1 can not only attain good disentanglement but also significantly improve reconstruction accuracy via dynamic control. The paper removes the inherent trade-off between reconstruction accuracy and disentanglement for β-VAE. Existing methods, such as β-VAE and FactorVAE, assign a large weight to the KL-divergence term in the objective function, leading to high reconstruction errors for the sake of better disentanglement. To mitigate this problem, ControlVAE was recently developed; it dynamically tunes the KL-divergence weight in an attempt to control the trade-off and move it to a more favorable point. However, ControlVAE fails to eliminate the conflict between the need for a large β (for disentanglement) and the need for a small β (for smaller reconstruction error). Instead, we propose DynamicVAE, which maintains a different β at different stages of training, thereby decoupling disentanglement and reconstruction accuracy. In order to evolve the weight β along a trajectory that enables such decoupling, DynamicVAE leverages a modified incremental PI (proportional-integral) controller, a variant of the proportional-integral-derivative (PID) control algorithm, and employs a moving average as well as a hybrid annealing method to evolve the value of the KL-divergence smoothly in a tightly controlled fashion. We theoretically prove the stability of the proposed approach. Evaluation results on three benchmark datasets demonstrate that DynamicVAE significantly improves the reconstruction accuracy while achieving disentanglement comparable to the best of existing methods. The results verify that our method can separate disentangled representation learning and reconstruction, removing the inherent tension between the two.
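
To make the control idea in the abstract concrete, the following is a minimal sketch of a generic incremental PI controller that nudges β so that a moving average of the observed KL-divergence tracks a set point. It is not the paper's exact update rule, which the abstract does not give. The class name `IncrementalPIBeta`, the gains `kp` and `ki`, the clipping bounds, the window size, and the sign convention (β increases when the smoothed KL exceeds the target) are all assumptions made for illustration, and the hybrid annealing of the set point is not shown.

```python
from collections import deque


class IncrementalPIBeta:
    """Hypothetical incremental PI controller for the KL weight beta.

    Textbook incremental (velocity-form) PI update, used here as an
    assumption; the paper's modified controller may differ.
    """

    def __init__(self, kp=0.01, ki=0.005, beta_init=1.0,
                 beta_min=0.0, beta_max=100.0, window=5):
        self.kp = kp                  # proportional gain (assumed value)
        self.ki = ki                  # integral gain (assumed value)
        self.beta = beta_init         # current KL weight
        self.beta_min = beta_min      # lower clipping bound (assumed)
        self.beta_max = beta_max      # upper clipping bound (assumed)
        self.prev_error = 0.0         # error from the previous step
        self.history = deque(maxlen=window)  # recent KL values

    def step(self, kl_observed, kl_target):
        """Update and return beta given the latest KL value and set point."""
        # Smooth the noisy per-batch KL with a simple moving average.
        self.history.append(kl_observed)
        kl_smoothed = sum(self.history) / len(self.history)

        # Positive error means the KL is above target, so beta should grow.
        error = kl_smoothed - kl_target

        # Incremental PI: the change in beta depends on the change in
        # error (P term) and on the current error (I term).
        delta = self.kp * (error - self.prev_error) + self.ki * error

        # Apply the increment and keep beta inside its allowed range.
        self.beta = min(self.beta_max, max(self.beta_min, self.beta + delta))
        self.prev_error = error
        return self.beta
```

In a training loop, one would call `step` once per iteration with the batch KL and the current set point, then weight the KL term of the loss by the returned β. The incremental form only stores the previous error rather than a running sum, which is one common reason to prefer it over the positional form.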

Taking a Closer Look at Factor Disentanglement: Dual-Path Variational Autoencoder Learning for Domain Generalization

Facial Landmark Disentangled Network with Variational Autoencoder

Disentangling Factors of Variation in Deep Representations Using Adversarial Training

Aggregation of Disentanglement: Reconsidering Domain Variations in Domain Generalization

Disentanglement with Factor Quantized Variational Autoencoders

Guided Variational Autoencoder for Disentanglement Learning

Disentangling Masked Autoencoders for Unsupervised Domain Generalization

Rethinking Controllable Variational Autoencoders

Towards Domain-Specific Features Disentanglement for Domain Generalization

DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning

Improving disentanglement in variational auto-encoders via feature imbalance-informed dimension weighting

Evolving Domain Generalization via Latent Structure-Aware Sequential Autoencoder

Preliminary theoretical troubleshooting in Variational Autoencoder

Disentangling shared and private latent factors in multimodal Variational Autoencoders

Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness

Challenging $\beta$-VAE with $\beta < 1$ for Disentanglement Via Dynamic Learning

DualVAE: Dual Disentangled Variational AutoEncoder for Recommendation

Learning Generalizable Models via Disentangling Spurious and Enhancing Potential Correlations

Unbiased Semantic Representation Learning Based on Causal Disentanglement for Domain Generalization

Variational Disentangled Graph Auto-Encoders for Link Prediction

INSURE: An Information Theory iNspired diSentanglement and pURification modEl for Domain Generalization