Abstract: Deep learning models are increasingly employed for perception, prediction, and control in complex systems. Embedding physical knowledge into these models is crucial for achieving realistic and consistent outputs, a challenge often addressed by physics-informed machine learning. However, integrating physical knowledge with representation learning becomes difficult when dealing with high-dimensional observation data, such as images, particularly under conditions of incomplete or imprecise state information. To address this, we propose Physically Interpretable World Models, a novel architecture that aligns learned latent representations with real-world physical quantities. Our method combines a variational autoencoder with a dynamical model that incorporates unknown system parameters, enabling the discovery of physically meaningful representations. By employing weak supervision with interval-based constraints, our approach eliminates the reliance on ground-truth physical annotations. Experimental results demonstrate that our method improves the quality of learned representations while achieving accurate predictions of future states, advancing the field of representation learning in dynamic systems.

What problem does this paper attempt to address?

The core problem this paper addresses is how to learn physically meaningful representations from high-dimensional observational data (such as images) and how to use these representations to accurately predict future states. Specifically, the paper aims to overcome the following challenges:

1. **High-dimensional observational data**: In real-world applications, the internal physical state is often not fully accessible, and key properties of the system (such as position or safety) can only be inferred indirectly from high-dimensional observations (such as images and LiDAR scans). This makes learning based on physical knowledge difficult.
2. **Incomplete or inaccurate state information**: In many cases, the internal state of the system is only partially observable or completely hidden, so traditional methods cannot be applied directly.
3. **Weak supervision signals**: To let the model learn meaningful representations from few or imprecise physical labels, the paper uses interval constraints as weak supervision signals, avoiding the dependence on accurate physical labels.

To solve these problems, the authors propose "Physically Interpretable World Models" (PIWMs), a new architecture that combines a variational autoencoder (VAE) with a dynamical model. The method aligns the learned latent representations with actual physical quantities through weakly supervised learning, thereby improving both the physical interpretability of the representations and the prediction accuracy.

### Main contributions of the paper

1. **Novel learning architecture**: A new architecture that encodes physically meaningful representations from high-dimensional observational data.
2. **Effective training pipeline**: Weak supervision signals guide representation learning and allow adaptation to unknown dynamical systems.
3. **Experimental verification**: Two case studies (inverted pendulum and lunar lander) demonstrate the method's superior physical interpretability and prediction accuracy.

### Specific questions

- **How to effectively learn physically meaningful representations from weak supervision?**
- **To what extent can these learned representations predict future physical states?**

By answering these questions, the paper offers new ideas and methods for state prediction and decision-making in complex dynamical systems.
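The idea of interval constraints as weak supervision can be illustrated with a small sketch. The summary does not state the loss the paper actually uses, so the function below is an assumption: a hinge-style penalty that is zero when a latent value lies inside its labelled interval and grows quadratically outside it. The name `interval_loss` and its arguments are hypothetical.

```python
import numpy as np

def interval_loss(z, lower, upper):
    """Hypothetical interval penalty (an assumed form, not the paper's loss).

    z      -- latent values meant to represent physical quantities
    lower  -- lower bounds of the weak (interval) labels
    upper  -- upper bounds of the weak (interval) labels
    Returns the mean squared distance of z to the interval [lower, upper];
    values inside the interval contribute zero.
    """
    z = np.asarray(z, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    below = np.maximum(lower - z, 0.0)  # positive only when z < lower
    above = np.maximum(z - upper, 0.0)  # positive only when z > upper
    return float(np.mean(below ** 2 + above ** 2))

# A latent inside its interval incurs no penalty; one outside is pulled back.
inside = interval_loss([0.1], [0.0], [0.2])
mixed = interval_loss([0.1, 0.5], [0.0, 0.0], [0.2, 0.3])
```

In a training loop, a term of this kind would presumably be added to the usual VAE reconstruction and regularization terms, so that the encoder is rewarded for placing latents within the coarse physical ranges without ever seeing exact ground-truth values.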

Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction

How Deep Neural Networks Understand Motion? Toward Interpretable Motion Modeling by Leveraging the Relative Change in Position

Extracting Interpretable Physical Parameters from Spatiotemporal Systems using Unsupervised Learning

Learning Physical Dynamics for Object-centric Visual Prediction

Interpretable Representation Learning from Videos using Nonlinear Priors

Towards an Interpretable Latent Space in Structured Models for Video Prediction

Identifying Terrain Physical Parameters from Vision -- Towards Physical-Parameter-Aware Locomotion and Navigation

3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes

MUVO: A Multimodal World Model with Spatial Representations for Autonomous Driving

Kinematics-aware Trajectory Generation and Prediction with Latent Stochastic Differential Modeling

Towards Learning Controllable Representations of Physical Systems

Learning to Represent Mechanics via Long-term Extrapolation and Interpolation

Hybrid Physics and Deep Learning Model for Interpretable Vehicle State Prediction

Interpretable machine learning models: a physics-based view

Neural World Models for Computer Vision

Learning Interpretable Dynamics from Images of a Freely Rotating 3D Rigid Body

From latent dynamics to meaningful representations

Understanding Physical Dynamics with Counterfactual World Modeling

Learning Physical Constraints with Neural Projections

Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems

Extrapolation of Physics-Inspired Deep Networks in Learning Robot Inverse Dynamics