Abstract: How agents can learn internal models that veridically represent interactions with the real world is a largely open question. As machine learning moves towards representations containing not just observational but also interventional knowledge, we study this problem using tools from representation learning and group theory. We propose methods enabling an agent acting upon the world to learn internal representations of sensory information that are consistent with the actions that modify it. We use an autoencoder equipped with a group representation acting on its latent space, trained using an equivariance-derived loss in order to enforce a suitable homomorphism property on the group representation. In contrast to existing work, our approach does not require prior knowledge of the group and does not restrict the set of actions the agent can perform. We motivate our method theoretically, and show empirically that it can learn a group representation of the actions, thereby capturing the structure of the set of transformations applied to the environment. We further show that this allows agents to predict the effect of sequences of future actions with improved accuracy.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper explores how agents can learn internal models that faithfully reflect their interactions with the real world. Specifically, the authors address the following problems:

1. **Learning internal models**
   - How can agents learn internal representations that contain not only observational information but also interventional knowledge?
   - These internal models should accurately predict the effects of future actions and capture the structure of the transformations applied to the environment.
2. **Relationship between actions and perception**
   - How can agents learn the relationship between sensory information and actions by interacting with the environment?
   - The authors propose a method that lets agents learn a latent-space representation that is consistent with the actions they perform.
3. **Learning group structure**
   - How can the group structure of the actions be learned from data, without prior knowledge of the group and without restricting the set of actions the agent can perform?
   - The authors use the Homomorphism Autoencoder (HAE), an autoencoder that learns a group representation of the actions on its latent space.
4. **Disentangled representation**
   - How can the learned representation be made disentangled, that is, decomposable into multiple subspaces, each corresponding to a different attribute of the environment that can be modified independently?

### Method overview

To achieve these goals, the authors propose the following:

- **Homomorphism Autoencoder (HAE)**: An autoencoder equipped with a group representation acting on its latent space. An equivariance-derived loss enforces a suitable homomorphism property on that representation.
- **Group representation**: The authors assume that the agent's actions form a group, or a subset of one, and that these actions are composable. Representing the transformations as matrices acting on the latent space means that composing actions corresponds to multiplying matrices.
- **Disentangled representation**: A sparsity regularization term encourages the learned representation to have a block-diagonal structure, which promotes disentanglement.

### Experimental verification

The authors evaluate their method in a series of experiments, including learning actions on a two-dimensional torus and learning the more complex product group SO(2)×SO(2)×SO(2), a product of three planar rotation groups. The experimental results show that HAE can learn the group structure of the actions and improves the accuracy of predicting the effects of sequences of future actions.

### Summary

Overall, this paper addresses how agents can learn internal models through interaction with the environment, and in particular how to learn representations that contain both observational information and interventional knowledge. By combining an autoencoder with group representation theory, the authors provide a method for this problem and demonstrate it empirically on several tasks.
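The equivariance and homomorphism conditions described in the method overview can be made concrete with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the encoder `encode`, the representation `rho`, and the toy environment `step` are hypothetical, hand-written stand-ins for quantities that the HAE would learn from data. The environment is a two-dimensional torus whose actions shift two angles, and the latent space is four-dimensional with one 2x2 rotation block per angle.

```python
import numpy as np

def rot2(angle):
    """2x2 rotation matrix for a single SO(2) factor."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def rho(action):
    """Hypothetical block-diagonal representation of an action (a1, a2)."""
    a1, a2 = action
    m = np.zeros((4, 4))
    m[:2, :2] = rot2(a1)  # block acting on the first latent subspace
    m[2:, 2:] = rot2(a2)  # block acting on the second latent subspace
    return m

def encode(obs):
    """Toy stand-in for a trained encoder: torus angles -> latent vector."""
    t1, t2 = obs
    return np.array([np.cos(t1), np.sin(t1), np.cos(t2), np.sin(t2)])

def step(obs, action):
    """Toy environment (assumption): an action shifts the two angles."""
    return (obs[0] + action[0], obs[1] + action[1])

def equivariance_loss(obs, action, next_obs):
    """Squared error between rho(action) @ encode(obs) and encode(next_obs)."""
    diff = rho(action) @ encode(obs) - encode(next_obs)
    return float(np.sum(diff ** 2))

obs = (0.3, 1.1)
g1, g2 = (0.5, -0.2), (1.0, 0.7)

# Homomorphism property: composing actions matches multiplying their matrices.
composed = (g1[0] + g2[0], g1[1] + g2[1])
homomorphism_gap = float(np.max(np.abs(rho(g2) @ rho(g1) - rho(composed))))

# Single-step equivariance for an observed transition (obs, g1, next_obs).
one_step_loss = equivariance_loss(obs, g1, step(obs, g1))

# Two-step prediction carried out entirely in latent space.
rollout = rho(g2) @ rho(g1) @ encode(obs)
target = encode(step(step(obs, g1), g2))
rollout_error = float(np.max(np.abs(rollout - target)))
```

In this idealized setting all three quantities vanish up to floating-point error, because the hand-written encoder and representation are exactly equivariant. In a learned model, a loss of this form would instead be minimized over the parameters of the encoder and of the representation, and the block-diagonal shape of `rho` is the kind of structure a sparsity regularizer is meant to encourage rather than something fixed in advance.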

Homomorphism Autoencoder -- Learning Group Structured Representations from Observed Transitions
