Pareto-based Optimization of Sparse Dynamical Systems

Gianmarco Ducci, Maryke Kouyate, Karsten Reuter, Christoph Scheurer
DOI: https://doi.org/10.26434/chemrxiv-2024-jgprs
2024-11-21
Abstract: Sparse data-driven approaches enable the approximation of governing laws of physical processes with parsimonious equations. While great effort has been made in this field over the last decade, data-driven approaches generally rely on the paradigm of imposing a fixed base of library functions. To promote sparsity, finding the optimal set of basis functions is a necessary condition, but one that is challenging to guess in advance. Here, we propose an alternative approach which consists of optimizing the very library of functions while imposing sparsity. The robustness of our results is evaluated not only by the quality of the fit of the discovered model, but also by the statistical distribution of the residuals with respect to the original noise in the data. To avoid choosing one metric over the other, we rely on a multi-objective genetic algorithm (NSGA-II) for systematically generating a subset of optimal models sorted in a Pareto front. We illustrate how this method can be used as a tool to derive microkinetic equations from experimental data, and as a kernel approach for design of experiments.
Chemistry
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the optimization of sparse dynamical systems. Specifically, the authors propose a Pareto-based method that optimizes the function library of a sparse dynamical system while keeping the model sparse. Traditional data-driven methods usually rely on a fixed library of basis functions, and finding the optimal set of basis functions in advance is difficult. The proposed method uses a multi-objective genetic algorithm (NSGA-II) to systematically generate a set of optimal models and sort them on a Pareto front. Models are judged not only by how well they fit the original data, but also by the statistical distribution of the residuals relative to the original noise, which serves as a measure of robustness.

### Main contributions

1. **Optimizing the function library**: The function library itself is optimized instead of being fixed in advance. This enables the model to better adapt to the nonlinear characteristics of complex dynamical systems.
2. **Multi-objective optimization**: The multi-objective genetic algorithm NSGA-II is used to generate a set of optimal models that perform well on multiple evaluation metrics.
3. **Robustness evaluation**: Both the fitting quality of the model and the statistical distribution of the residuals are evaluated to ensure the robustness of the model.
4. **Application examples**: The method is demonstrated on two case studies:
   - **Goodwin model**: A dynamic network with a negative feedback loop is reconstructed, and the negative-feedback strength is accurately identified.
   - **Chemical reactor**: The model is identified from transient chemical reaction data, and the exponents in the reaction rate equation are successfully fitted.
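To make the idea of sorting candidate models on a Pareto front concrete, the following is a minimal sketch of non-dominated filtering for two objectives that are both minimized. It is not NSGA-II itself and not the authors' implementation: it only shows the dominance test that NSGA-II builds on. The function names and the objective values (a fit error and a residual-distribution mismatch per candidate model) are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` dominates `b` under minimization.

    `a` dominates `b` when it is no worse in every objective
    and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of all non-dominated entries in `scores`."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (fit_error, residual_mismatch) pairs for five candidate models.
candidates = [
    (0.10, 0.90),
    (0.20, 0.40),
    (0.35, 0.35),
    (0.50, 0.10),
    (0.60, 0.50),  # worse than (0.20, 0.40) in both objectives
]

front = pareto_front(candidates)
print(front)
```

In this toy data set the last candidate is dominated and drops out, while the remaining four represent different trade-offs between the two metrics. Keeping the whole front, rather than collapsing the metrics into one weighted score, is what avoids having to choose one metric over the other.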
### Mathematical background

The method builds on sparse regression and considers an autonomous dynamical system of the form

\[ \dot{x}=v(x) \]

where \(x\in\mathbb{R}^n\) is the vector of state variables and \(v\in\mathbb{R}^n\) is a velocity field describing how \(x\) changes over time. From a data-driven perspective, \(x\) spans the space of measured variables, and the vector-valued function \(v(x)\) is the governing law (functions and coefficients) to be discovered; it describes the pointwise distribution of the vector field in phase space.

The core idea of sparse symbolic regression is that, for many dynamical systems, the governing law consists of only a few terms. The goal of the sparse regression algorithm is therefore to keep the model sparse while retaining the relevant terms. Specifically, the following system is solved:

\[ \dot{x}=\Theta(x)\Xi \]

where \(\dot{x}\) is obtained by numerically differentiating the measured variable \(x\), \(\Theta\) is the matrix of candidate functions evaluated at all time points \(t\), and \(\Xi\) is the coefficient matrix. The sparse regression algorithm solves for \(\Xi\) so that only the relevant terms in \(\Theta\) are retained.

### Case studies

1. **Goodwin model**:
   - **Dynamical equations**:
     \[ \begin{aligned} \dot{x}&=\frac{k_0}{1 + z^\beta}-k_1x\\ \dot{y}&=k_2x - k_3y\\ \dot{z}&=k_4y - k_5z \end{aligned} \]
   - **Results**: The optimization identifies the negative-feedback strength \(\beta\), in close agreement with the true value.
2. **Chemical reactor**:
   - **Dynamical equations**:
     \[ \begin{aligned} \dot{x}&=\frac{1}{u}(-\alpha k_{12}x^\alpha y^\beta)\\ \dot{y}&=\frac{1}{u}(-\beta k_{12}x^\alpha y^\beta)\\ \dot{z}&=\f