Post deployment recycling of machine learning models
Harsh Patel, Bram Adams, Ahmed E. Hassan
DOI: https://doi.org/10.1007/s10664-024-10492-2
IF: 3.762
2024-06-16
Empirical Software Engineering
Abstract: Once a Machine Learning (ML) model is deployed, the same model is typically retrained from scratch, either on a scheduled interval or as soon as model drift is detected, to make sure the model reflects current data distributions and performance expectations. As such, once a new model is available, the old model is typically discarded. This paper challenges the notion that older models are useless by showing that old models still have substantial value compared to newly trained models, and by proposing novel post-deployment model recycling techniques that help make informed decisions on which old models to reuse and when to reuse them. In an empirical study on eight long-lived Apache projects comprising a total of 84,343 commits, we analyze the performance of five model recycling strategies on three different types of Just-In-Time defect prediction models: Random Forest (RF), Logistic Regression (LR) and Neural Network (NN). Comparison against traditional model retraining from scratch (RFS) shows that our approach significantly outperforms RFS in terms of recall, g-mean, AUC and F1 by up to a median of , , and , respectively, with the best recycling strategy (Model Stacking) outperforming the baseline in over of the projects. Our recycling strategies provide this performance improvement at the cost of a median of 2x to 6-17x slower time-to-inference compared to RFS, depending on the selected strategy and variant.
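The comparison above is reported in terms of recall, g-mean, AUC and F1. The following minimal sketch shows how these four metrics are commonly computed for a binary Just-In-Time defect predictor. It relies on three assumptions that are not taken from the paper. Label 1 marks a defect-inducing commit, a 0.5 probability threshold turns scores into predictions, and g-mean is the geometric mean of recall and specificity. It uses the standard textbook definitions and does not reproduce the study's exact evaluation pipeline.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming label 1 = defect-inducing commit."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num: float, den: float) -> float:
    # Define x / 0 as 0.0 so degenerate test windows do not crash the evaluation.
    return num / den if den else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random defective commit outscores a clean one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute recall, g-mean, AUC and F1; the 0.5 threshold is an illustrative assumption."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = safe_div(tp, tp + fn)          # share of defective commits that are caught
    specificity = safe_div(tn, tn + fp)     # share of clean commits correctly left alone
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "g_mean": g_mean, "auc": auc(y_true, scores), "f1": f1}


if __name__ == "__main__":
    # Toy data (hypothetical): 3 defective and 5 clean commits with model probabilities.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    probs = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
    print(evaluate(labels, probs))
```

On the toy data, the 0.5 threshold yields 2 true positives, 1 false negative, 1 false positive and 4 true negatives. Recall is therefore 2/3, specificity is 0.8, F1 is 2/3, g-mean is about 0.730 and AUC is 12/15 = 0.8. Unlike F1, g-mean rewards a model only when it performs well on both the defective and the clean class. This property matters because defect-inducing commits are usually a minority.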
computer science, software engineering