Abstract: Representation learning is essential to modern machine learning techniques. Input representations have evolved from hand-crafted features to learned representations of multimedia data, and this transition has substantially improved algorithm performance. However, the representations of visual data are often highly entangled. Because all information components are encoded into the same feature space, such representations are hard to interpret. Disentangled representation learning (DRL) aims to learn a low-dimensional, interpretable, abstract representation that separates the multiple factors of variation in high-dimensional observations. In a disentangled representation, the information about a single factor of variation can be captured and manipulated through its corresponding latent subspace, which makes the representation more interpretable. DRL can improve sample efficiency and tolerance to nuisance variables, and it offers robust representations of complex variations. The semantic information it extracts benefits downstream artificial intelligence (AI) tasks such as recognition, classification, and domain adaptation. This review briefly introduces the definition, research development, and applications of DRL. Some research on nonlinear independent component analysis (nonlinear ICA) is also covered, because DRL is closely related to the identifiability problem of nonlinear ICA. The causal mechanism underlying DRL is that high-dimensional real-world data are generated by a set of unobserved factors of variation (generative factors). DRL models these factors of variation in a latent representation and thereby recovers the process that generated the observed data. We summarize the key properties that a well-defined disentangled representation should satisfy in three aspects: 1) modularity, 2) compactness, and 3) explicitness.
Explicitness in turn consists of two sub-requirements: completeness and informativeness. Current DRL methods are then categorized into four types according to their formulation, characteristics, and scope of application: 1) dimension-wise disentanglement, 2) semantic-based disentanglement, 3) hierarchical disentanglement, and 4) nonlinear ICA. Dimension-wise disentanglement assumes that the generative factors are mutually independent and that each factor can be separated and mapped to a single dimension of the latent vector; it is suitable for learning disentangled representations of simple synthetic visual data. Semantic-based disentanglement likewise hypothesizes that certain pieces of semantic information are independent of one another. The generative factors are disentangled in groups according to specific semantics and mapped to different latent spaces, which suits complex real-world data. Hierarchical disentanglement assumes that generative factors at different levels of abstraction are correlated. The factors are disentangled group by group from the bottom up and mapped to latent spaces at different levels of semantic abstraction, forming a hierarchical disentangled representation. Nonlinear ICA provides an identifiable method for disentangling the unknown generative factors that are mixed into the observed data by a nonlinear invertible generator.
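The generative assumption behind the dimension-wise and nonlinear ICA formulations can be written compactly. The notation below ($\mathbf{x}$, $\mathbf{z}$, $g$, $f_k$, $K$, $\pi$) is introduced here only for illustration and is not taken from the text; the factorized prior expresses the independence assumption, and the last line is one common way of stating what a disentangling encoder is expected to achieve.

```latex
\begin{align}
\mathbf{z} &= (z_1, \dots, z_K), \qquad p(\mathbf{z}) = \prod_{k=1}^{K} p(z_k), \\
\mathbf{x} &= g(\mathbf{z}), \qquad g \text{ nonlinear and invertible}, \\
\hat{z}_k &= f_k(\mathbf{x}) \quad \text{such that } \hat{z}_k \text{ depends only on } z_{\pi(k)} .
\end{align}
```

Under this reading, the generative factors can at best be recovered up to a permutation $\pi$ of the latent dimensions and a component-wise transformation of each factor.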
According to their motivation, the loss functions commonly used in disentangled representation learning are grouped into three categories: 1) modularity constraints, which restrict a single latent variable of the disentangled representation to capture only a single factor, or a single group of factors, of variation, thereby promoting the mutual separation of the factors of variation; 2) explicitness constraints, which encourage each latent variable to effectively encode the true value of its corresponding generative factor and the entire latent representation to contain complete information about all generative factors; and 3) multi-purpose constraints, which are losses that optimize several properties of the disentangled representation at the same time, including modularity, compactness, and explicitness. A given model can combine multiple loss terms to form its final hybrid objective function. We compare the scope of application and the limitations of each type of loss function, and further summarize classical disentangled representation works that use hybrid objective functions.
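As a minimal sketch of how several loss terms might be combined into a hybrid objective, the example below adds a squared reconstruction error to a weighted KL divergence toward a factorized standard normal prior, in the style of a weighted-KL variational autoencoder. This specific combination, the function names `gaussian_kl` and `hybrid_objective`, and the weight `beta` are illustrative assumptions, not losses named in the abstract; reading the reconstruction term as explicitness-oriented and the KL term as modularity-oriented is likewise an assumption made here.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) per sample, summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)


def hybrid_objective(x, x_recon, mu, logvar, beta=4.0):
    """Hypothetical hybrid loss: reconstruction error plus beta-weighted KL term.

    x, x_recon : arrays of shape (batch, data_dim)
    mu, logvar : arrays of shape (batch, latent_dim), the encoder's Gaussian parameters
    beta       : assumed weight trading reconstruction against the prior-matching term
    Returns (total, mean reconstruction error, mean KL) as Python floats.
    """
    # Reconstruction term: pushes the latent code to keep information about x.
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # KL term: pulls the posterior toward a prior with independent dimensions.
    kl = gaussian_kl(mu, logvar)
    total = np.mean(recon + beta * kl)
    return float(total), float(np.mean(recon)), float(np.mean(kl))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))
    x_recon = x + 0.1 * rng.normal(size=(8, 16))
    mu = rng.normal(size=(8, 4))
    logvar = np.zeros((8, 4))
    total, recon, kl = hybrid_objective(x, x_recon, mu, logvar)
    print(f"total={total:.3f} recon={recon:.3f} kl={kl:.3f}")
```

Raising `beta` in this sketch weights the prior-matching term more heavily at the expense of reconstruction quality, which is one concrete instance of the trade-offs between loss types that the comparison above refers to.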

Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling

Disentangling Factors of Variation in Deep Representations Using Adversarial Training

Disentangled Representations in Neural Models

Learning Disentangled Representation with Pairwise Independence

Bridging Disentanglement with Independence and Conditional Independence via Mutual Information for Representation Learning

APGVAE: Adaptive Disentangled Representation Learning with the Graph-Based Structure Information

Multifactor Sequential Disentanglement via Structured Koopman Autoencoders

Learning Network Representations with Disentangled Graph Auto-Encoder

DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images

DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning

Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models

Learning Disentangled Representation by Exploiting Pretrained Generative Models: A Contrastive Learning View

Disentangling Multi-view Representations Beyond Inductive Bias

Learning Disentangled Representations via Independent Subspaces

Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models

Independence Constrained Disentangled Representation Learning from Epistemological Perspective

Towards Better Understanding of Disentangled Representations via Mutual Information

Reconstruction of Hidden Representation for Robust Feature Extraction

A review of disentangled representation learning for visual data processing and analysis