Quantifying Regional Error in Surrogates by Modeling Its Relationship with Sample Density

Souma Chowdhury,Ali Mehmani,Jie Zhang,Achille Messac
DOI: https://doi.org/10.2514/6.2013-1751
2013-01-01
Abstract:Approximation models (or surrogate models) provide an efficient substitute to expensive physical simulations and an efficient solution to the lack of physical models of system behavior. However, it is challenging to quantify the accuracy and reliability of such approximation models in a region of interest or the overall domain without additional system evaluations. Standard error measures, such as the mean squared error, the cross-validation error, and the Akaikes information criterion, provide limited (often inadequate) information regarding the accuracy of the final surrogate. This paper introduces a novel and model independent concept to quantify the level of errors in the function value estimated by the final surrogate in any given region of the design domain. This method is called the Regional Error Estimation of Surrogate (REES). Assuming the full set of available sample points to be fixed, intermediate surrogates are iteratively constructed over a sample set comprising all samples outside the region of interest and heuristic subsets of samples inside the region of interest (i.e., intermediate training points). The intermediate surrogate is tested over the remaining sample points inside the region of interest (i.e., intermediate test points). The fraction of sample points inside region of interest, which are used as intermediate training points, is fixed at each iteration, with the total number of iterations being pre-specified. The estimated median and maximum relative errors within the region of interest for the heuristic subsets at each iteration are used to fit a distribution of the median and maximum error, respectively. The estimated statistical mode of the median and the maximum error, and the absolute maximum error are then represented as functions of the density of intermediate training points, using regression models. The regression models are then used to predict the expected median and maximum regional errors when all the sample points are used as training points. Standard test functions and a wind farm power generation problem are used to illustrate the effectiveness and the utility of such a regional error quantification method.
What problem does this paper attempt to address?