Abstract: Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training approach in the vision domain. However, the mechanisms and properties of the representations learned by such a scheme, as well as how to further enhance them, remain under-explored. In this paper, we explore an interactive Masked Autoencoders (i-MAE) framework that enhances representation capability in two ways: (1) employing a two-way image reconstruction and a latent feature reconstruction with a distillation loss to learn better features; (2) proposing a semantics-enhanced sampling strategy to boost the semantics learned by MAE. Building on the proposed i-MAE architecture, we address two critical questions about the behavior of the representations learned by MAE: (1) Is the separability of latent representations in Masked Autoencoders helpful for model performance? We study this by forcing the input to be a mixture of two images instead of one. (2) Can we enhance the representations in the latent feature space by controlling the degree of semantics during sampling in Masked Autoencoders? To examine this, we propose a sampling strategy within a mini-batch based on the semantics of the training samples. Extensive experiments are conducted on CIFAR-10/100, Tiny-ImageNet and ImageNet-1K to verify our observations. Furthermore, in addition to qualitatively analyzing the characteristics of the latent representations, we propose two evaluation schemes to examine the existence of linear separability and the degree of semantics in the latent space. The surprising and consistent results demonstrate that i-MAE is a superior framework design for understanding MAE frameworks, as well as for achieving better representational ability. Code is available at

An Information Theoretic Approach to the Autoencoder

Information Potential Auto-Encoders

A New Modal Autoencoder for Functionally Independent Feature Extraction

Maximum Entropy and Minimal Mutual Information in a Nonlinear Model

Knowledge-integrated autoencoder model

Capacity-Approaching Autoencoders for Communications

Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning

Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization

Binary autoencoder with random binary weights

InfoVAEGAN: learning joint interpretable representations by information maximization and maximum likelihood

Analyzing Multimodal Integration in the Variational Autoencoder from an Information-Theoretic Perspective

TURBO: The Swiss Knife of Auto-Encoders

Information Theoretic-Learning auto-encoder

On a Mechanism Framework of Autoencoders

Supposed Maximum Mutual Information for Improving Generalization and Interpretation of Multi-Layered Neural Networks

Analyzing multimodal probability measures with autoencoders

i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable?

Maximal Information Divergence from Statistical Models defined by Neural Networks

Training Invertible Neural Networks as Autoencoders

Variational Graph Autoencoder with Adversarial Mutual Information Learning for Network Representation Learning

Cross-Modal Information Recovery and Enhancement Using Multiple-Input–Multiple-Output Variational Autoencoder