Improving Generalisation of Genetic Programming for Symbolic Regression with Structural Risk Minimisation

Qi Chen, Bing Xue, Lin Shang, Mengjie Zhang
DOI: https://doi.org/10.1145/2908812.2908842
2016-01-01
Abstract: Generalisation is one of the most important performance measures for any learning algorithm, and Genetic Programming (GP) is no exception. A number of works have been devoted to improving the generalisation ability of GP for symbolic regression. Methods based on a reliable estimation of the generalisation error of models during the evolutionary process are a sensible choice for enhancing the generalisation of GP. Structural risk minimisation (SRM), which is based on the VC dimension from learning theory, provides a powerful framework for estimating the difference between the generalisation error and the empirical error. Despite its solid theoretical foundation and reliability, SRM has seldom been applied to GP. The most important reason is the difficulty of measuring the VC dimension of GP models/programs. This paper introduces SRM into GP, using an empirical method to measure the VC dimension of models, in order to improve the generalisation performance of GP for symbolic regression. The results of a set of experiments confirm that GP with SRM achieves a dramatic generalisation gain while evolving more compact/less complex models than standard GP. Further analysis also shows that, in most cases, GP with SRM has better generalisation performance than GP with bias-variance decomposition, which is one of the state-of-the-art methods for controlling overfitting.
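The abstract does not state the bound that SRM uses. As a minimal sketch, assuming the practical VC bound for regression popularised by Cherkassky and colleagues (not confirmed here to be the exact form used in the paper), an SRM-style fitness can divide a program's training error by a penalisation factor that grows with the ratio p = h/n of its VC dimension h to the number of training samples n. The names `srm_penalised_risk` and `select_by_srm` and the candidate values are hypothetical; in a real run, h would come from the empirical VC dimension measurement the paper describes.

```python
import math


def srm_penalised_risk(empirical_risk: float, vc_dim: float, n_samples: int) -> float:
    """Return an SRM-style upper bound on the generalisation risk of a model.

    Assumption (not taken from the paper): uses the practical VC bound for regression
        R <= R_emp / (1 - sqrt(p - p*ln(p) + ln(n) / (2n)))_+,  with p = h / n,
    where h is the (estimated) VC dimension and n the number of training samples.
    """
    if n_samples < 2:
        raise ValueError("need at least two training samples")
    if vc_dim < 0:
        raise ValueError("VC dimension must be non-negative")
    p = vc_dim / n_samples
    if p >= 1.0:
        # VC dimension at least as large as the sample size: the bound is vacuous.
        return math.inf
    # p * ln(p) tends to 0 as p -> 0, so p == 0 contributes nothing here.
    capacity_term = p - p * math.log(p) if p > 0 else 0.0
    term = capacity_term + math.log(n_samples) / (2 * n_samples)
    denom = 1.0 - math.sqrt(term)
    if denom <= 0.0:
        # The (.)_+ operator: a non-positive denominator means an infinite bound.
        return math.inf
    return empirical_risk / denom


def select_by_srm(candidates, n_samples):
    """Pick the candidate (name, training_mse, vc_dim) with the lowest SRM bound.

    Hypothetical helper for illustration; a GP run would apply the same
    comparison as its fitness function during selection.
    """
    return min(candidates, key=lambda c: srm_penalised_risk(c[1], c[2], n_samples))


# Hypothetical candidates: a small program with higher training error and a
# large program with lower training error but a much higher VC dimension.
candidates = [("small", 0.30, 3), ("large", 0.20, 40)]
best = select_by_srm(candidates, n_samples=100)
print(best[0])
```

With n = 100, the smaller program's bound stays near 0.50 while the larger program's exceeds 1.7, so this SRM-style fitness prefers the smaller program despite its higher training error. This mirrors, under the stated assumptions, the abstract's observation that GP with SRM tends to evolve more compact models.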