Abstract: We assume that a sufficiently large database is available, where a physical property of interest and a number of associated governing primitive variables or observables are stored. We introduce and test two machine learning approaches to discover possible groups or combinations of primitive variables: the first approach is based on regression models, whereas the second is based on classification models. The variable group (here referred to as the new effective good variable) can be considered successfully found when the physical property of interest is characterized by the following effective invariant behaviour: in the first method, invariance of the group implies invariance of the property up to a given accuracy; in the second method, upon partition of the physical property values into two or more classes, invariance of the group implies invariance of the class. For the sake of illustration, the two methods are successfully applied to two popular empirical correlations describing the convective heat transfer phenomenon and to Newton's law of universal gravitation.
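The regression-based idea in the abstract can be illustrated with a minimal sketch. It assumes the effective variable is a power-law group of the primitive variables, and it uses an ordinary least-squares fit in log space as a simple stand-in for a regression model; this is not the authors' algorithm. The synthetic data, the sampling ranges, and the helper names `solve_linear` and `fit_power_law` are hypothetical and chosen only for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_law(rows, targets):
    """Least-squares fit of log(y) = c0 + sum_i a_i * log(x_i)."""
    design = [[1.0] + [math.log(v) for v in row] for row in rows]
    logy = [math.log(t) for t in targets]
    k = len(design[0])
    # Normal equations: (A^T A) coef = A^T log(y)
    ata = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    aty = [sum(d[i] * y for d, y in zip(design, logy)) for i in range(k)]
    return solve_linear(ata, aty)


# Synthetic, noise-free "database" for Newton's law F = G * m1 * m2 / r**2.
G = 6.674e-11
rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = [(rng.uniform(1, 100), rng.uniform(1, 100), rng.uniform(1, 10))
        for _ in range(200)]
forces = [G * m1 * m2 / r**2 for m1, m2, r in rows]

coef = fit_power_law(rows, forces)
exponents = coef[1:]  # fitted exponents of m1, m2 and r in the group
print([round(e, 6) for e in exponents])
```

On this noise-free data the fitted exponents should be close to 1, 1 and -2, so the recovered group is m1 * m2 / r**2: any change of the primitive variables that leaves this group fixed also leaves the force fixed.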

What problem does this paper attempt to address?

This paper addresses how to learn effective variables from physical data. It assumes that a large database is available, containing a physical property of interest together with the primitive variables that govern it. Two machine learning approaches are proposed for discovering candidate combinations of those variables: one based on regression models and the other on classification models. A combination counts as a new effective variable when the physical property is effectively invariant with respect to it. In the first method, holding the combination fixed keeps the property value unchanged to within a given accuracy. In the second method, the property values are partitioned into two or more classes, and holding the combination fixed keeps the class unchanged. The paper demonstrates both methods on two popular empirical correlations for convective heat transfer and on Newton's law of universal gravitation.

The paper also discusses how to reduce a set of material descriptors through a multi-objective optimization process in order to improve classification performance. This approach was previously applied to superconductors, and it has been used successfully to identify reduced variable sets in photocatalytic microsystems, yielding high-performance combinations that serve as alternatives to expensive components. In terms of methodology, regression models are used to find variable groups, and classification models are used to find the variable mixtures that best separate the classes. The reported results indicate that these methods can identify simplified descriptions of complex systems and can be extended to more general functional forms.

Overall, the paper aims to provide an automated method that identifies, from data, the key variable combinations needed for a simplified description of a physical system, thereby easing theoretical modeling and numerical simulation.
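The classification-based criterion (invariance of the group implies invariance of the class) can be checked with a minimal sketch. It assumes a two-class partition at the median of the property and scores a candidate power-law group by counting how often the class label changes when samples are sorted by the group value. The scoring rule, the synthetic data, and the names `group` and `class_switches` are hypothetical illustrations, not the paper's procedure.

```python
import random
import statistics


def group(row, exponents):
    """Candidate effective variable: product of x_i ** e_i."""
    value = 1.0
    for x, e in zip(row, exponents):
        value *= x ** e
    return value


def class_switches(rows, labels, exponents):
    """Count label changes along the samples sorted by the candidate group."""
    order = sorted(range(len(rows)), key=lambda i: group(rows[i], exponents))
    ordered = [labels[i] for i in order]
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)


# Synthetic "database" for Newton's law, F = G * m1 * m2 / r**2.
G = 6.674e-11
rng = random.Random(1)  # fixed seed so the sketch is reproducible
rows = [(rng.uniform(1, 100), rng.uniform(1, 100), rng.uniform(1, 10))
        for _ in range(200)]
forces = [G * group(row, (1, 1, -2)) for row in rows]

# Partition the property into two classes at its median.
threshold = statistics.median(forces)
labels = [int(f > threshold) for f in forces]

# A good group orders the samples so that the class changes only once.
switches_true = class_switches(rows, labels, (1, 1, -2))
switches_wrong = class_switches(rows, labels, (1, 1, -1))
print(switches_true, switches_wrong)
```

With the true group the label changes exactly once along the sorted samples, because the class is fully determined by the group value. A wrong candidate such as m1 * m2 / r mixes the classes and produces more changes, which is what a search over candidate exponents would try to minimize.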

Learning effective good variables from physical data

Data-Driven Automated Discovery of Variational Laws Hidden in Physical Systems

Predicting the Effective Thermal Conductivities of Composite Materials and Porous Media by Machine Learning Methods

Machine learning strategies for systems with invariance properties

Data-driven path collective variables

Inherent structural descriptors via machine learning

A unified framework for machine learning collective variables for enhanced sampling simulations: mlcolvar

Efficient estimation of material property curves and surfaces via active learning

Statistical Learning with Group Invariance: Problem, Method and Consistency

Scientific intuition inspired by machine learning generated hypotheses

Learning abstract visual concepts via probabilistic program induction in a Language of Thought

Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

On Learning what to Learn: heterogeneous observations of dynamics and establishing (possibly causal) relations among them

Towards understanding and characterizing expert covariational reasoning in physics

Some of the variables, some of the parameters, some of the times, with some physics known: Identification with partial information

Characterizing the invariances of learning algorithms using category theory

Robust data-driven discovery of governing physical laws with error bars

Generalizable Physics-constrained Modeling using Learning and Inference assisted by Feature Space Engineering

Big Variates: Visualizing and identifying key variables in a multivariate world

Robust learning from noisy, incomplete, high-dimensional experimental data via physically constrained symbolic regression