Abstract: Learning ensembles by bagging can substantially improve the generalization performance of low-bias, high-variance estimators, including those evolved by Genetic Programming (GP). To be efficient, modern GP algorithms for evolving (bagging) ensembles typically rely on several (often inter-connected) mechanisms and respective hyper-parameters, ultimately compromising ease of use. In this paper, we provide experimental evidence that such complexity might not be warranted. We show that minor changes to fitness evaluation and selection are sufficient to make a simple and otherwise-traditional GP algorithm evolve ensembles efficiently. The key to our proposal is to exploit the way bagging works to compute, for each individual in the population, multiple fitness values (instead of one) at a cost that is only marginally higher than the one of a normal fitness evaluation. Experimental comparisons on classification and regression tasks taken and reproduced from prior studies show that our algorithm fares very well against state-of-the-art ensemble and non-ensemble GP algorithms. We further provide insights into the proposed approach by (i) scaling the ensemble size, (ii) ablating the changes to selection, (iii) observing the evolvability induced by traditional subtree variation. Code: https://github.com/marcovirgolin/2SEGP.

What problem does this paper attempt to address?

The problem this paper addresses is how to evolve bagging ensembles efficiently in Genetic Programming (GP) without giving up the simplicity of the algorithm. Specifically, the author proposes Simple Simultaneous Ensemble Genetic Programming (2SEGP), a simple and otherwise-traditional GP algorithm. With minor modifications to fitness evaluation and selection, it evolves an ensemble in a single run instead of repeating the evolution once per ensemble member. This addresses the main drawback of existing Complex Simultaneous Ensemble Learning Algorithms (CSEL-Algs), which are efficient but too complex to use easily in practice.

### Background and Objectives of the Paper

1. **Bagging Ensemble Learning**: Bagging is an ensemble learning method that improves generalization performance by aggregating the predictions of multiple low-bias, high-variance estimators. Each estimator is trained on a different bootstrap sample of the training set.
2. **Genetic Programming (GP)**: GP is an evolutionary algorithm used to generate and optimize computer programs. In GP, each individual is usually a tree structure representing a possible solution.
3. **Limitations of Existing Methods**:
   - **Simple Independent Ensemble Learning Approaches (SIEL-Apps)**: A classic GP algorithm is run multiple times, each run produces a single estimator, and the estimators are then combined into an ensemble. This is simple but inefficient.
   - **Complex Simultaneous Ensemble Learning Algorithms (CSEL-Algs)**: Multiple new mechanisms and hyper-parameters are introduced so that an ensemble is evolved in one run. These methods are more efficient, but they are complex and difficult to use in practice.

### Main Contributions of the Paper

1. **2SEGP Algorithm**: With minor modifications to fitness evaluation and selection, 2SEGP efficiently evolves an ensemble in a single evolutionary run.
2. **Improvement in Fitness Evaluation**: Each individual is evaluated on several realizations of the training set (i.e., bootstrap samples), yielding multiple fitness values. Computing these fitness values costs only slightly more than computing a single traditional fitness value.
3. **Improvement in Selection Mechanism**: A simple variant of truncation selection ensures that the population progresses evenly across all bootstrap samples.
4. **Experimental Verification**: Experiments on classification and regression tasks show that 2SEGP performs comparably to, and in some cases better than, existing state-of-the-art ensemble and non-ensemble GP algorithms.

### Formula Presentation

1. **Linear Scaling Coefficients**:
   \[ a=\bar{y}-b\bar{o}, \quad b = \frac{\sum_{i = 1}^{n}(y_{i}-\bar{y})(o_{i}-\bar{o})}{\sum_{i = 1}^{n}(o_{i}-\bar{o})^{2}} \]
   where \(y_{i}\) and \(o_{i}\) are the label and the individual's output for the \(i\)-th training example, \(\bar{y}\) and \(\bar{o}\) are their arithmetic means over the training set, and the scaled prediction is \(a + b\,o_{i}\).
2. **Time Complexity of Fitness Evaluation**:
   \[ O(n(\ell+\beta)) \]
   where \(n\) is the number of training examples, \(\ell\) is the size (number of nodes) of an individual, and \(\beta\) is the number of bootstrap samples. The individual's outputs are computed once at cost \(O(n\ell)\), and each of the \(\beta\) fitness values is then obtained from those outputs at cost \(O(n)\).

### Conclusion

With 2SEGP, the paper demonstrates how to evolve bagging ensembles efficiently while keeping the algorithm simple. The experimental results show that 2SEGP is comparable in performance to existing, more complex ensemble learning algorithms and performs better on some tasks. This offers a simpler way to apply GP to ensemble learning.
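
The multiple-fitness idea summarized above can be illustrated with a minimal NumPy sketch. It is not taken from the 2SEGP repository: the function names `bootstrap_counts` and `multi_fitness`, the use of mean squared error, and the choice to recompute the linear scaling coefficients separately for each bootstrap sample are all assumptions made for illustration. The point it tries to show is that an individual's outputs are computed once, and each bootstrap fitness is then a cheap weighted pass over those outputs.

```python
import numpy as np

def bootstrap_counts(n, beta, rng):
    """Return a (beta, n) integer matrix: entry [j, i] is how many times
    training example i was drawn (with replacement) into bootstrap sample j."""
    return rng.multinomial(n, np.full(n, 1.0 / n), size=beta)

def multi_fitness(outputs, y, counts):
    """Linearly scaled MSE of one individual on every bootstrap sample.

    outputs: (n,) outputs of the individual, computed once on the training set.
    y:       (n,) training labels.
    counts:  (beta, n) bootstrap multiplicities (hypothetical representation).
    """
    w = counts / counts.sum(axis=1, keepdims=True)   # rows sum to 1
    y_bar = w @ y                                    # per-sample mean of labels
    o_bar = w @ outputs                              # per-sample mean of outputs
    dy = y[None, :] - y_bar[:, None]
    do = outputs[None, :] - o_bar[:, None]
    cov = (w * dy * do).sum(axis=1)
    var = (w * do ** 2).sum(axis=1)
    # b = cov / var, falling back to b = 0 when the outputs are constant
    b = np.where(var > 0, cov / np.where(var > 0, var, 1.0), 0.0)
    a = y_bar - b * o_bar
    resid = y[None, :] - (a[:, None] + b[:, None] * outputs[None, :])
    return (w * resid ** 2).sum(axis=1)              # one fitness per bootstrap

# Toy usage (assumed data): the target is linear in x.
rng = np.random.default_rng(0)
n, beta = 50, 5
x = rng.uniform(-1.0, 1.0, size=n)
y = 3.0 * x + 1.0
counts = bootstrap_counts(n, beta, rng)

fitness_good = multi_fitness(x, y, counts)       # correct up to linear scaling
fitness_bad = multi_fitness(x ** 2, y, counts)   # wrong functional form
```

In this toy setting, an individual whose output is `x` matches the target up to linear scaling, so all of its bootstrap fitness values are numerically zero, while an individual outputting `x ** 2` receives a strictly positive error on every sample.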

Genetic Programming is Naturally Suited to Evolve Bagging Ensembles

Genetic Ensemble of Extreme Learning Machine

Reusing Genetic Programming for Ensemble Selection in Classification of Unbalanced Data

Evolving Diverse Ensembles Using Genetic Programming for Classification with Unbalanced Data

An ensemble learning interpretation of geometric semantic genetic programming

Evolvability Degeneration in Multi-Objective Genetic Programming for Symbolic Regression

Improving Generalization Ability of Genetic Programming: Comparative Study

An Evolution Strategy Assisted by an Ensemble of Local Gaussian Process Models

Evolving Ensembles Using Multi-Objective Genetic Programming for Imbalanced Classification

A segregated genetic programming for bioprocess modelling with outliers

Evolutionary bagging for ensemble learning

Genetic Programming for Ensemble Learning in Face Recognition

Exploiting Tournament Selection for Efficient Parallel Genetic Programming

Multiclass Classification on High Dimension and Low Sample Size Data Using Genetic Programming

On Explaining Machine Learning Models by Evolving Crucial and Compact Features

A Novel Surrogate-assisted Evolutionary Algorithm Applied to Partition-based Ensemble Learning

Heterogeneous Ensemble-Based Infill Criterion for Evolutionary Multiobjective Optimization of Expensive Problems

Maximizing the Sharpe Ratio: A Genetic Programming Approach

Evolving Benchmark Functions to Compare Evolutionary Algorithms via Genetic Programming

An Empirical Study of Progressive Insular Cooperative GP

A Survey on Techniques of Improving Generalization Ability of Genetic Programming Solutions