Abstract: Sparse data-driven approaches enable the approximation of governing laws of physical processes with parsimonious equations. While great effort has been made in this field over the last decade, data-driven approaches generally rely on the paradigm of imposing a fixed basis of library functions. To promote sparsity, finding the optimal set of basis functions is necessary, yet this set is difficult to guess in advance. Here, we propose an alternative approach which consists of optimizing the library of functions itself while imposing sparsity. The robustness of our results is evaluated not only by the quality of the fit of the discovered model, but also by the statistical distribution of the residuals with respect to the original noise in the data. To avoid choosing one metric over the other, we rely on a multi-objective genetic algorithm (NSGA-II) to systematically generate a subset of optimal models sorted in a Pareto front. We illustrate how this method can be used as a tool to derive microkinetic equations from experimental data, and as a kernel approach for design of experiments.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the optimization of sparse dynamical system models. Specifically, the authors propose a Pareto-based method that optimizes the function library of a sparse dynamical system while maintaining the sparsity of the model. Traditional data-driven methods usually rely on a fixed library of basis functions, which makes finding the optimal set of basis functions difficult. The proposed method uses the multi-objective genetic algorithm NSGA-II to systematically generate a set of optimal models and rank them on a Pareto front. The models are judged both by the quality of their fit to the original data and by the statistical distribution of the residuals relative to the original noise, which serves as a check on robustness.

### Main contributions

1. **Optimizing the function library**: The function library itself is optimized instead of being fixed in advance. This lets the model better adapt to the nonlinear characteristics of complex dynamical systems.
2. **Multi-objective optimization**: The multi-objective genetic algorithm NSGA-II is used to generate a set of optimal models that perform well on multiple evaluation metrics.
3. **Robustness evaluation**: Both the fitting quality of the model and the statistical distribution of the residuals are evaluated.
4. **Application examples**: The method is demonstrated through two case studies:
   - **Goodwin model**: A dynamic network with a negative feedback loop is reconstructed, and the negative feedback intensity is accurately identified.
   - **Chemical reactor**: The model is identified from transient chemical reaction data, and the exponents in the reaction rate equation are successfully fitted.
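The Pareto-front selection named in the contributions above can be illustrated with a minimal sketch. It covers only the non-dominated filtering step, not the full NSGA-II algorithm, which also involves crowding-distance sorting and genetic operators. The function name `pareto_front` and the candidate scores are hypothetical, and both objectives are assumed to be minimized (for example a fit error and a residual-distribution mismatch).

```python
def dominates(a, b):
    """True if point a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the indices of the non-dominated points (all objectives minimized)."""
    front = []
    for i, p in enumerate(points):
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            front.append(i)
    return front


# Hypothetical candidate models scored as (fit error, residual mismatch).
candidates = [
    (0.10, 0.90),
    (0.20, 0.40),
    (0.50, 0.10),
    (0.30, 0.50),  # dominated by (0.20, 0.40)
    (0.60, 0.95),  # dominated by (0.10, 0.90)
]

front = pareto_front(candidates)
print(front)
```

Each model on the front represents a different trade-off between the two metrics, so no single metric has to be chosen in advance.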
### Mathematical background

The method is based on sparse regression theory and considers an autonomous dynamical system of the form:

\[ \dot{x}=v(x) \]

where \(x\in\mathbb{R}^n\) is the vector of state variables and \(v\in\mathbb{R}^n\) is a velocity field representing how \(x\) changes over time. From a data-driven perspective, the vector \(x\) collects the measured variables, and the vector-valued function \(v(x)\) is the governing law (functions and coefficients) that we hope to discover; it describes the vector field at each point of the phase space.

The core idea of sparse symbolic regression is that, for many dynamical systems, the governing law consists of only a few terms. The goal of the sparse regression algorithm is therefore to keep the model sparse while retaining the relevant terms. Specifically, the following system needs to be solved:

\[ \dot{x}=\Theta(x)\Xi \]

where \(\dot{x}\) is obtained by numerically differentiating the measured variable \(x\), \(\Theta\) is the matrix of candidate functions evaluated at all time points \(t\), and \(\Xi\) is the coefficient matrix. The sparse regression algorithm solves for \(\Xi\) so that only the relevant terms in \(\Theta\) are retained.

### Case studies

1. **Goodwin model**:
   - **Dynamical equations**:
     \[ \begin{aligned} \dot{x}&=\frac{k_0}{1 + z^\beta}-k_1x\\ \dot{y}&=k_2x - k_3y\\ \dot{z}&=k_4y - k_5z \end{aligned} \]
   - **Results**: Through optimization, the negative feedback intensity \(\beta\) is successfully identified and is highly consistent with the true value.
2. **Chemical reactor**:
   - **Dynamical equations**:
     \[ \begin{aligned} \dot{x}&=\frac{1}{u}(-\alpha k_{12}x^\alpha y^\beta)\\ \dot{y}&=\frac{1}{u}(-\beta k_{12}x^\alpha y^\beta)\\ \dot{z}&=\f
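The sparse regression system \(\dot{x}=\Theta(x)\Xi\) from the mathematical background can be sketched as follows. The summary does not say which sparse solver the authors use, so this example assumes sequentially thresholded least squares, one common choice, applied to a fixed polynomial library. It therefore shows the fixed-library baseline, not the library optimization the paper proposes. The function name `stlsq`, the threshold, and the test system \(\dot{x}=-2x\) are all illustrative assumptions, and the analytic derivative is used in place of numerical differentiation to keep the result deterministic.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares for dxdt ≈ theta @ xi (assumed solver)."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune coefficients below the threshold
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                # refit only the surviving candidate functions
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi


# Illustrative data from dx/dt = -2 x (not one of the paper's case studies).
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dxdt = (-2.0 * x).reshape(-1, 1)  # analytic derivative, used here for simplicity

# Fixed candidate library: [1, x, x^2, x^3]
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

xi = stlsq(theta, dxdt)
print(xi.ravel())
```

Only the coefficient of the `x` column should survive the thresholding, which is the sparsity pattern of the system that generated the data.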

Pareto-based Optimization of Sparse Dynamical Systems
