Error modeling for surrogates of dynamical systems using machine learning

Sumeet Trehan, Kevin Carlberg, Louis J. Durlofsky
DOI: https://doi.org/10.48550/arXiv.1701.03240
2017-06-01
Abstract: A machine-learning-based framework for modeling the error introduced by surrogate models of parameterized dynamical systems is proposed. The framework entails the use of high-dimensional regression techniques (e.g., random forests, LASSO) to map a large set of inexpensively computed "error indicators" (i.e., features) produced by the surrogate model at a given time instance to a prediction of the surrogate-model error in a quantity of interest (QoI). This eliminates the need for the user to hand-select a small number of informative features. The methodology requires a training set of parameter instances at which the time-dependent surrogate-model error is computed by simulating both the high-fidelity and surrogate models. Using these training data, the method first determines regression-model locality (via classification or clustering), and subsequently constructs a "local" regression model to predict the time-instantaneous error within each identified region of feature space. We consider two uses for the resulting error model: (1) as a correction to the surrogate-model QoI prediction at each time instance, and (2) as a way to statistically model arbitrary functions of the time-dependent surrogate-model error (e.g., time-integrated errors). We apply the proposed framework to model errors in reduced-order models of nonlinear oil-water subsurface flow simulations. The reduced-order models used in this work entail application of trajectory piecewise linearization with proper orthogonal decomposition. When the first use of the method is considered, numerical experiments demonstrate consistent improvement in accuracy in the time-instantaneous QoI prediction relative to the original surrogate model, across a large number of test cases. When the second use is considered, results show that the proposed method provides accurate statistical predictions of the time- and well-averaged errors.
Numerical Analysis
What problem does this paper attempt to address?
This paper addresses the problem of modeling the error introduced by surrogate models of parameterized dynamical systems. Specifically, the authors propose a machine-learning-based framework for predicting the surrogate-model error in a quantity of interest (QoI). The framework uses high-dimensional regression techniques (e.g., random forests, LASSO) to map a large set of inexpensively computed "error indicators" (i.e., features), produced by the surrogate model at a given time instance, to a prediction of the surrogate-model error at that time instance. This eliminates the need for the user to hand-select a small number of informative features. The method requires a training set of parameter instances at which the time-dependent surrogate-model error is computed by simulating both the high-fidelity model and the surrogate model. Using these training data, the method first determines regression-model locality (via classification or clustering) and then constructs a "local" regression model to predict the time-instantaneous error within each identified region of feature space. The main contribution is an automated way to quantify surrogate-model error without manual feature selection, which can improve the accuracy and reliability of surrogate-model predictions. The method is applicable to multiple types of physics-based surrogate models, and the resulting error model can be used in two ways: (1) as a correction to the surrogate-model QoI prediction at each time instance, and (2) to statistically model arbitrary functions of the time-dependent surrogate-model error (e.g., time-integrated errors).
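The two-stage idea (determine locality, then fit one regression model per region and use it to correct the surrogate QoI) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain k-means clustering for locality, ordinary least squares as a stand-in for the random-forest or LASSO regressors named in the paper, synthetic data in place of simulation output, and the sign convention error = high-fidelity QoI minus surrogate QoI. All function and variable names are hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Label each feature vector with the index of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the locality step)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def fit_local_error_models(X, err, k=2):
    """Fit one linear error model (with intercept) per cluster of feature space."""
    centroids = kmeans(X, k)
    labels = assign(X, centroids)
    A = np.hstack([X, np.ones((len(X), 1))])
    coefs = []
    for j in range(k):
        mask = labels == j
        if not mask.any():
            # Empty cluster: fall back to predicting zero error there.
            coefs.append(np.zeros(A.shape[1]))
            continue
        w, *_ = np.linalg.lstsq(A[mask], err[mask], rcond=None)
        coefs.append(w)
    return centroids, coefs

def predict_error(X, centroids, coefs):
    """Predict the error with the local model of each sample's cluster."""
    labels = assign(X, centroids)
    A = np.hstack([X, np.ones((len(X), 1))])
    W = np.stack(coefs)                      # shape (k, n_features + 1)
    return np.einsum("ij,ij->i", A, W[labels])

# Synthetic data: two regimes of feature space with different error behavior.
rng = np.random.default_rng(0)
n = 200
regime = rng.integers(0, 2, size=n)
X = 0.5 * rng.normal(size=(n, 2)) + 5.0 * regime[:, None]
true_w = np.array([[1.0, -2.0, 0.5], [-0.5, 0.3, 4.0]])
A_all = np.hstack([X, np.ones((n, 1))])
err = np.einsum("ij,ij->i", A_all, true_w[regime]) + 0.01 * rng.normal(size=n)
q_surr = rng.normal(size=n)                  # surrogate QoI (synthetic)
q_hf = q_surr + err                          # high-fidelity QoI (synthetic)

train, test = slice(0, 150), slice(150, None)
centroids, coefs = fit_local_error_models(X[train], err[train], k=2)

# Use (1): correct the surrogate QoI with the predicted error.
q_corr = q_surr[test] + predict_error(X[test], centroids, coefs)
mae_surr = np.mean(np.abs(q_hf[test] - q_surr[test]))
mae_corr = np.mean(np.abs(q_hf[test] - q_corr))
print(f"MAE surrogate: {mae_surr:.4f}, MAE corrected: {mae_corr:.4f}")
```

In this toy setting the local models recover the regime-dependent error closely, so the corrected QoI is much nearer the high-fidelity value than the raw surrogate. With real simulation data the feature set would be far larger, which is where the high-dimensional regressors mentioned in the paper become relevant.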
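The paper evaluates the method in numerical experiments on reduced-order models of nonlinear oil-water subsurface flow simulations, built using trajectory piecewise linearization with proper orthogonal decomposition. When the error model is used as a correction, the experiments show consistent improvement in the accuracy of the time-instantaneous QoI prediction relative to the original surrogate model across a large number of test cases. When it is used for statistical modeling, the method provides accurate statistical predictions of the time- and well-averaged errors.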
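For the second use, statistical modeling of a function of the time-dependent error, one possible sketch is a simple Monte Carlo estimate of the time-averaged error. The noise model here is an assumption made for illustration and is not taken from the paper: the instantaneous error is treated as the regression prediction plus independent zero-mean Gaussian noise whose standard deviation `sigma` would be estimated from regression residuals. The predicted error curve and `sigma` below are made-up values.

```python
import numpy as np

def sample_time_averaged_error(err_pred, sigma, n_samples=1000, seed=0):
    """Draw samples of the time-averaged error under an assumed noise model.

    err_pred : predicted instantaneous errors, one per time step
    sigma    : assumed standard deviation of the i.i.d. Gaussian residual
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n_samples, len(err_pred)))
    # Each row is one realization of the error trajectory; average over time.
    return (err_pred[None, :] + noise).mean(axis=1)

# Hypothetical predicted instantaneous errors on a uniform time grid.
t = np.linspace(0.0, 1.0, 50)
err_pred = 0.2 * np.sin(2.0 * np.pi * t) + 0.1
sigma = 0.05                                 # hypothetical residual std

samples = sample_time_averaged_error(err_pred, sigma)
lo, hi = np.percentile(samples, [2.5, 97.5])
print(f"time-averaged error: mean={samples.mean():.4f}, "
      f"95% interval=[{lo:.4f}, {hi:.4f}]")
```

The same sampling pattern extends to other functions of the error trajectory (for example, a maximum over time) by replacing the final `.mean(axis=1)`. Whether independent Gaussian noise is adequate depends on how the regression residuals actually behave.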