pystacked: Stacking generalization and machine learning in Stata

Achim Ahrens, Christian B. Hansen, Mark E. Schaffer

DOI: https://doi.org/10.48550/arXiv.2208.10896

2023-03-07

Abstract: pystacked implements stacked generalization (Wolpert, 1992) for regression and binary classification via Python's scikit-learn. Stacking combines multiple supervised machine learners -- the "base" or "level-0" learners -- into a single learner. The currently supported base learners include regularized regression, random forest, gradient boosted trees, support vector machines, and feed-forward neural nets (multi-layer perceptron). pystacked can also be used as a 'regular' machine learning program to fit a single base learner and, thus, provides an easy-to-use API for scikit-learn's machine learning algorithms.

Econometrics, Machine Learning

What problem does this paper attempt to address?

This paper addresses the lack of an implementation of stacked generalization in Stata, focusing on regression and binary classification tasks. It introduces a new program, `pystacked`, which lets users fit multiple machine learning algorithms through Python's scikit-learn library and combine them into a final prediction formed as a weighted average of the individual predictions.

The paper points out that, when facing a new prediction or classification task, it is usually very difficult to determine in advance which machine learning algorithm is most suitable. Stacked generalization responds to this as a model-averaging method: by combining the predictions of several learners, it can perform better than any single learner. The stacking weights are chosen using cross-validated predictions, so a base learner that merely overfits the training data is not rewarded for doing so, and learners that capture different patterns in the data can complement one another.

`pystacked` can also be used as a regular machine learning program to fit a single base learner, which makes it an easy-to-use API for scikit-learn's machine learning algorithms. The supported base learners include regularized regression, random forest, gradient-boosted trees, support vector machines, and feed-forward neural networks (multilayer perceptrons). The paper emphasizes that `pystacked` is the first tool to bring stacked generalization to Stata.
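The weighted-average idea described in this summary can be illustrated with a small, self-contained Python sketch. It is not the `pystacked` or scikit-learn implementation: the two toy base learners (`fit_mean`, `fit_line`), the simulated data, the helper names, and the grid search over a single weight are all assumptions chosen to keep the example short and dependency-free. What it does show is the general recipe: obtain out-of-fold (cross-validated) predictions from each base learner, then pick non-negative weights that sum to one and minimize the error of the combined prediction.

```python
import random

def fit_mean(xs, ys):
    """Toy base learner 1 (hypothetical): always predicts the training mean."""
    mu = sum(ys) / len(ys)
    return lambda x: mu

def fit_line(xs, ys):
    """Toy base learner 2 (hypothetical): one-variable least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)

def cross_val_predict(fit, xs, ys, k=5):
    """Out-of-fold predictions: each point is predicted by a model
    that was fitted without that point."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    return preds

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def combine(w, p1, p2):
    """Weighted average with weights w and 1 - w."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]

def stack_weight(p1, p2, ys, steps=100):
    """Grid search over w in [0, 1] for the weight on the first learner."""
    best_w, best_err = 0.0, float("inf")
    for s in range(steps + 1):
        w = s / steps
        err = mse(combine(w, p1, p2), ys)
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# Simulated data (an assumption for illustration): linear signal plus noise.
random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(200)]
ys = [1.5 * x + random.gauss(0, 0.5) for x in xs]

# Step 1: cross-validated predictions from each base learner.
cv_mean = cross_val_predict(fit_mean, xs, ys)
cv_line = cross_val_predict(fit_line, xs, ys)

# Step 2: choose stacking weights on the out-of-fold predictions.
w_mean = stack_weight(cv_mean, cv_line, ys)
weights = {"mean": w_mean, "line": 1 - w_mean}
stacked_cv = combine(w_mean, cv_mean, cv_line)

# Step 3: refit the base learners on all data and combine with the weights.
final_mean, final_line = fit_mean(xs, ys), fit_line(xs, ys)

def stacked_predict(x):
    return weights["mean"] * final_mean(x) + weights["line"] * final_line(x)

print(weights, mse(stacked_cv, ys))
```

Because the grid includes the endpoints 0 and 1, the stacked predictor's cross-validated error in this sketch can never exceed that of the better base learner. With more than two learners, the grid search would typically be replaced by a constrained least-squares fit over all weights.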

