Abstract: Representation learning is essential to modern machine learning. Input representations have evolved from hand-crafted features to learned representations of multimedia data, and this transition has driven substantial gains in algorithm performance. However, representations of visual data are often highly entangled: all information components are encoded in the same feature space, which makes them difficult to interpret. Disentangled representation learning (DRL) aims to learn a low-dimensional, interpretable, abstract representation that separates the multiple factors of variation underlying high-dimensional observations. In a disentangled representation, the information about a single factor of variation can be captured and manipulated through its corresponding latent subspace, which makes the representation more interpretable. DRL can improve sample efficiency and tolerance to nuisance variables, and it offers a robust representation of complex variations. The semantic information it extracts benefits downstream artificial intelligence (AI) tasks such as recognition, classification, and domain adaptation. This review gives a brief introduction to the definition, research development, and applications of DRL. Some work on nonlinear independent component analysis (nonlinear ICA) is also covered, because DRL is closely related to the identifiability problem of nonlinear ICA. DRL rests on a causal view of data generation: high-dimensional real-world data are assumed to be generated by a set of unobserved, varying factors (generative factors). DRL models these factors of variation in the latent representation and thereby recovers the generation process of the observed data. We summarize the properties that a well-defined disentangled representation should satisfy into three aspects: 1) modularity, 2) compactness, and 3) explicitness.
Explicitness itself comprises two sub-requirements: completeness and informativeness. We then categorize current DRL methods into four types according to their formulation, characteristics, and scope of application: 1) dimension-wise disentanglement, 2) semantic-based disentanglement, 3) hierarchical disentanglement, and 4) nonlinear ICA. Dimension-wise disentanglement assumes that the generative factors are mutually independent and that each can be separated and mapped to a single dimension of the latent vector; it is suited to learning disentangled representations of simple synthetic visual data. Semantic-based disentanglement likewise hypothesizes that certain pieces of semantic information are mutually independent. The generative factors are disentangled in groups according to specific semantics and mapped to different latent spaces, which suits complex real-world data. Hierarchical disentanglement is based on the assumption that generative factors at different levels of abstraction are correlated. The generative factors are disentangled group by group from the bottom up and mapped to latent spaces at different levels of semantic abstraction, forming a hierarchical disentangled representation. Nonlinear ICA provides an identifiable approach to disentangling the unknown generative factors that are mixed in the observed data through a nonlinear invertible generator.
We group the loss functions commonly used in DRL into three categories according to their motivation: 1) modularity constraints, which restrict each latent variable in the disentangled representation to capture only a single factor, or a single group of factors, of variation, and thereby promote the separation of the factors of variation from one another; 2) explicitness constraints, which encourage each latent variable to encode the true value of its corresponding generative factor effectively, and the latent representation as a whole to contain complete information about all generative factors; and 3) multi-purpose constraints, which optimize several properties of the disentangled representation at the same time, including modularity, compactness, and explicitness. Individual models can combine multiple loss terms to form a final hybrid objective function. We compare the scope of application and the limitations of each type of loss function, and we further summarize classical DRL works that use hybrid objective functions.
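As a rough illustration of how a hybrid objective can combine loss terms from different categories, the sketch below adds a reconstruction error (standing in for an explicitness-style term) to a penalty on the off-diagonal covariance of the latent codes (standing in for a modularity-style term). The function names, the weight `weight_modularity`, and the choice of these two particular terms are assumptions made for illustration and are not taken from any specific method covered by the review. Decorrelation is also only a weak proxy for the separation of factors.

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Explicitness-style term (assumed): mean squared reconstruction error."""
    return float(np.mean((x - x_hat) ** 2))

def decorrelation_penalty(z):
    """Modularity-style term (assumed): sum of squared off-diagonal
    entries of the sample covariance of the latent codes z (rows = samples)."""
    zc = z - z.mean(axis=0, keepdims=True)       # center each latent dimension
    cov = zc.T @ zc / max(z.shape[0] - 1, 1)     # sample covariance matrix
    off_diag = cov - np.diag(np.diag(cov))       # zero out the diagonal
    return float(np.sum(off_diag ** 2))

def hybrid_objective(x, x_hat, z, weight_modularity=1.0):
    """Hypothetical hybrid objective: weighted sum of the two terms above."""
    return reconstruction_loss(x, x_hat) + weight_modularity * decorrelation_penalty(z)

# Toy usage: a perfect reconstruction with uncorrelated vs. correlated latents.
x = np.ones((4, 3))
z_uncorrelated = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
z_correlated = np.array([[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]])
loss_uncorrelated = hybrid_objective(x, x, z_uncorrelated)
loss_correlated = hybrid_objective(x, x, z_correlated)
```

In this toy setting the reconstruction term is zero for both inputs, so any difference between the two losses comes from the decorrelation penalty alone. The weight controls the trade-off between the two constraints.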

Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning

On the Necessity of Disentangled Representations for Downstream Tasks

A review of disentangled representation learning for visual data processing and analysis

Where and What? Examining Interpretable Disentangled Representations

Disentangled Representations in Neural Models

DAReN: A Collaborative Approach Towards Reasoning And Disentangling

Bridging Disentanglement with Independence and Conditional Independence via Mutual Information for Representation Learning

HSDN: A High-Order Structural Semantic Disentangled Neural Network

A Review of Disentangled Representation Learning for Remote Sensing Data

Learning Disentangled Representation with Pairwise Independence

Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models

Towards Better Understanding of Disentangled Representations via Mutual Information

Independence Constrained Disentangled Representation Learning from Epistemological Perspective

Vector-based Representation is the Key: A Study on Disentanglement and Compositional Generalization

Retrieval-based Disentangled Representation Learning with Natural Language Supervision

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Towards Building A Group-based Unsupervised Representation Disentanglement Framework

Disentangling Representations through Multi-task Learning

Defining and Measuring Disentanglement for non-Independent Factors of Variation

Transferring disentangled representations: bridging the gap between synthetic and real images