Ensemble with Divisive Bagging for Feature Selection in Big Data

Yousung Park, Tae Yeon Kwon
DOI: https://doi.org/10.1007/s10614-024-10741-y
IF: 1.741
2024-10-25
Computational Economics
Abstract: We introduce Ensemble with Divisive Bagging (EDB), a new feature selection method for linear models that addresses the excessive selection of features in big data caused by deflated p-values. Extensive simulations show that EDB derives parsimonious models without loss of predictive performance compared to lasso, ridge, elastic-net, LARS, and FS. We also show that EDB estimates feature importance in linear models more accurately than Random Forest, XGBoost, and CatBoost. Additionally, we apply EDB to feature selection in models for house prices and loan defaults. Our findings highlight the advantages of EDB: (1) effectively addressing deflated p-values and preventing the inclusion of extraneous features; (2) ensuring unbiased coefficient estimation; (3) adaptability to various models relying on p-value-based inferences; (4) construction of statistically explainable models with feature attribution and importance by preserving inferences based on a linear model and p-values; and (5) allowing application to linear economic models without altering the previous functional form of the model.
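The "deflated p-values" problem named in the abstract can be illustrated with a short sketch. This is not the EDB algorithm, which the abstract does not specify; it only shows why a practically negligible coefficient becomes "significant" as the sample size grows. The effect size of 0.01, the noise level, the sample sizes, and the function names are all illustrative assumptions, and the p-value uses a normal approximation for a simple regression slope with a unit-variance predictor, evaluated at an estimate equal to the true effect.

```python
import math


def two_sided_p_value(effect, noise_sd, n):
    """Approximate two-sided p-value for a regression slope.

    Assumes a single unit-variance predictor, so the standard error of the
    slope is roughly noise_sd / sqrt(n), and uses the normal approximation
    to the t-distribution (reasonable for large n).
    """
    z = effect * math.sqrt(n) / noise_sd
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def p_values_by_sample_size(effect=0.01, noise_sd=1.0,
                            sizes=(1_000, 100_000, 10_000_000)):
    """Return {n: p-value} for the same tiny effect at each sample size."""
    return {n: two_sided_p_value(effect, noise_sd, n) for n in sizes}


if __name__ == "__main__":
    for n, p in p_values_by_sample_size().items():
        print(f"n={n:>12,d}  p={p:.3g}")
```

Under these assumptions the same tiny effect is far from significant at the smallest sample size and overwhelmingly significant at the largest, which is the over-selection pressure that a p-value-based selection rule faces in big data.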
economics, mathematics, interdisciplinary applications, management