Abstract: Disentangled representation learning aims to represent the underlying generative factors of a dataset in a latent representation independently of one another. In our work, we propose a discrete variational autoencoder (VAE) based model where the ground truth information about the generative factors is not provided to the model. We demonstrate the advantages of learning discrete representations over learning continuous representations in facilitating disentanglement. Furthermore, we propose incorporating an inductive bias into the model to further enhance disentanglement. Precisely, we propose scalar quantization of the latent variables in a latent representation with scalar values from a global codebook, and we add a total correlation term to the optimization as an inductive bias. Our method, called FactorQVAE, is the first method that combines optimization-based disentanglement approaches with discrete representation learning, and it outperforms the former disentanglement methods in terms of two disentanglement metrics (DCI and InfoMEC) while improving the reconstruction performance. Our code can be found at https://github.com/ituvisionlab/FactorQVAE.

What problem does this paper attempt to address?

This paper aims to solve several key challenges in **Disentangled Representation Learning**. Specifically, the authors propose a new method, **FactorQVAE**, to achieve better disentangled representations, especially in the absence of ground-truth information about the generative factors. The following are the main problems that the paper attempts to solve:

1. **Challenges of disentangled representation**:
   - The goal of disentangled representation learning is to independently represent the latent generative factors of data in a low-dimensional latent space. However, since most datasets lack knowledge about the true generative factors, it becomes very difficult to learn disentangled representations.
   - Unsupervised disentangled representation learning has been proven to be challenging, and even impossible in some cases, unless certain restrictive assumptions or inductive biases are introduced.
2. **Discrete representation vs. continuous representation**:
   - The paper explores the advantages of using discrete representations over continuous ones. Discrete representations can better capture categorical information and, through quantization, can force the model to assign a constant meaning to each value, thus promoting disentanglement.
3. **Global codebook vs. per-dimension codebook**:
   - Traditional discrete VAE methods (such as QLAE) use a codebook for each latent variable, which may limit the representational ability. The paper proposes using a single global codebook and further enhancing disentanglement through regularization, instead of designing a separate codebook for each dimension.
4. **Introducing the total correlation term**:
   - To further enhance disentanglement, the paper introduces a "total correlation term" as an inductive bias in the optimization process. This term promotes disentanglement by encouraging the factorization of the marginal posterior distribution.
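For reference, the total correlation mentioned in item 4 is usually written as a KL divergence between the marginal (aggregate) posterior and the product of its per-dimension marginals. The notation below ($q(z)$, $q(z_j)$, $d$, $p_{\text{data}}$) is an assumption made for illustration; this summary does not state the paper's exact notation, estimator, or weighting of the term.

```latex
\mathrm{TC}(z) \;=\; D_{\mathrm{KL}}\!\left( q(z) \,\Big\|\, \prod_{j=1}^{d} q(z_j) \right),
\qquad
q(z) \;=\; \mathbb{E}_{x \sim p_{\text{data}}}\!\left[ q(z \mid x) \right]
```

The term is zero exactly when $q(z)$ factorizes across the $d$ latent variables, which is why penalizing it pushes the latent variables toward statistical independence.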
### Method overview

The paper proposes **FactorQVAE**, a variational autoencoder (VAE) that combines discrete representation learning and factorization. Specific improvements include:

- **Discrete representation learning**: Use a single global codebook for scalar quantization instead of vector quantization.
- **Factorization**: Introduce the total correlation term into the optimization objective to promote the independence between latent variables.
- **Training framework**: Redesign the training frameworks of two discrete VAE models (VQ-VAE and dVAE) to enhance the disentanglement performance.

Through these improvements, FactorQVAE outperforms existing methods in disentanglement metrics (DCI and InfoMEC) and also improves reconstruction performance.

### Summary

The paper solves multiple challenges in unsupervised disentangled representation learning by introducing discrete representation learning and factorization techniques, and demonstrates the superior performance of its method on multiple datasets.
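To make the idea of scalar quantization with a single global codebook from the method overview concrete, here is a minimal sketch. It is not the authors' implementation: the function name `scalar_quantize`, the codebook values, and the hard nearest-neighbour assignment are all assumptions chosen for illustration, and the training-time details (gradient estimation, relaxations, the total correlation term) are omitted.

```python
from typing import List, Sequence, Tuple


def scalar_quantize(
    latents: Sequence[float], codebook: Sequence[float]
) -> Tuple[List[int], List[float]]:
    """Map each latent scalar to the nearest entry of one shared codebook.

    Hypothetical helper for illustration only: every latent dimension
    looks up the same list of scalar values (a "global" codebook),
    rather than owning a separate per-dimension codebook.
    """
    indices: List[int] = []
    quantized: List[float] = []
    for z in latents:
        # Nearest codebook scalar by absolute distance;
        # min() returns the lowest index on ties.
        k = min(range(len(codebook)), key=lambda i: abs(codebook[i] - z))
        indices.append(k)
        quantized.append(codebook[k])
    return indices, quantized


# Assumed example values: five shared scalars, four latent dimensions.
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]
latents = [0.8, -0.3, 0.1, -0.9]
indices, quantized = scalar_quantize(latents, codebook)
print(indices)    # [4, 1, 2, 0]
print(quantized)  # [1.0, -0.5, 0.0, -1.0]
```

Because every dimension draws from the same set of scalars, a given codebook value is available to every latent variable, which is the contrast with per-dimension codebooks drawn in the summary above.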

Disentanglement with Factor Quantized Variational Autoencoders

Disentanglement via Latent Quantization

Improving disentanglement in variational auto-encoders via feature imbalance-informed dimension weighting

Rethinking Controllable Variational Autoencoders

Disentangling Factors of Variation in Deep Representations Using Adversarial Training.

DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning

Guided Variational Autoencoder for Disentanglement Learning

Challenging $\beta$-VAE with $\beta < 1$ for Disentanglement Via Dynamic Learning

Facial Landmark Disentangled Network with Variational Autoencoder

Bridging Disentanglement with Independence and Conditional Independence via Mutual Information for Representation Learning

Disentangling Factors of Variation by Mixing Them

Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders

Variantional autoencoder with decremental information bottleneck for disentanglement

Disentangling Generative Factors of Physical Fields Using Variational Autoencoders

Sequential Disentanglement by Extracting Static Information From A Single Sequence Element

mcVAE: disentangling by mean constraint

Disentanglement of Latent Representations via Causal Interventions

Multifactor Sequential Disentanglement via Structured Koopman Autoencoders

Learning Disentangled Discrete Representations

Disentangled VAE Representations for Multi-Aspect and Missing Data