Bin.INI: An Ensemble Approach for Dynamic Data Streams

Muhammad Usman, Huanhuan Chen
DOI: https://doi.org/10.1016/j.eswa.2024.124853
IF: 8.5
2024-01-01
Expert Systems with Applications
Abstract: Class imbalance and concept drift can degrade the performance of classifiers in data stream learning, and their co-occurrence creates a particularly complicated learning scenario. The situation becomes harder still when the data also contain missing values. To tackle this challenge, this paper proposes a novel ensemble method, BINary classification in Incomplete, Non-stationary, and Imbalanced data streams (Bin.INI). As part of Bin.INI, Adapt2Regress builds a pool of linear regression models from fully observed minority-class subspaces and monitors the pool with a novel sigmoid-decay function, providing imputation support in a drifting data environment. To balance the data with respect to imbalance complexity, we use Min++, a resampling component proposed in earlier work. Additionally, we propose the 2-STage Ensemble Pruning technique (2STEP), which first groups the classifiers based on multiple diversity measures and then selects the best subset using a novel custom metric, the Sigmoid-Weighted Imbalance Score. By combining Adapt2Regress, Min++, and 2STEP, the proposed method can efficiently handle incomplete, imbalanced, and drifting data streams. Experiments are conducted on 350 highly imbalanced data streams containing a variety of concept drifts; 300 of the 350 streams contain different ratios of missing data (5%–30%) in the minority space. The experiments reveal a noticeable decline in the performance of state-of-the-art approaches when 15% or more of the data is missing, while Bin.INI maintains its performance.
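To make the Adapt2Regress idea more concrete, here is a minimal sketch of a regression pool whose members are down-weighted by a sigmoid decay as they age. It is not the authors' implementation: the abstract does not give the exact sigmoid-decay function, so the logistic form `1 / (1 + exp(steepness * (age - midpoint)))`, the parameters `steepness`, `midpoint`, and `max_pool`, the choice of one ordinary-least-squares model per chunk, and the class name `RegressionPoolImputer` are all hypothetical assumptions. Each model predicts one target feature from the remaining features and is trained only on fully observed minority rows of a chunk.

```python
import numpy as np


def sigmoid_decay(age, steepness=1.0, midpoint=5.0):
    """Hypothetical decay: near 1 for young models, falls toward 0 after `midpoint` chunks."""
    return 1.0 / (1.0 + np.exp(steepness * (np.asarray(age, dtype=float) - midpoint)))


class RegressionPoolImputer:
    """Hypothetical sketch: a pool of least-squares models, one fitted per chunk,
    each predicting feature `target` from the remaining features."""

    def __init__(self, target, max_pool=10, steepness=1.0, midpoint=5.0):
        self.target = target
        self.max_pool = max_pool
        self.steepness = steepness
        self.midpoint = midpoint
        self.pool = []  # list of (coefficients, chunk index at which the model was fitted)
        self.t = 0      # index of the most recent chunk

    def add_chunk(self, X_minority):
        """Fit a linear model on the fully observed minority rows of a new chunk."""
        X_minority = np.asarray(X_minority, dtype=float)
        complete = X_minority[~np.isnan(X_minority).any(axis=1)]
        self.t += 1
        if len(complete) == 0:
            return  # nothing fully observed in this chunk; keep the existing pool
        others = np.delete(complete, self.target, axis=1)
        design = np.column_stack([np.ones(len(others)), others])  # intercept column first
        coef, *_ = np.linalg.lstsq(design, complete[:, self.target], rcond=None)
        self.pool.append((coef, self.t))
        self.pool = self.pool[-self.max_pool:]  # keep only the newest models

    def impute(self, x):
        """Fill x[target] with a decay-weighted average of the pool's predictions.
        Assumes only the target feature is missing in x."""
        x = np.asarray(x, dtype=float).copy()
        if not np.isnan(x[self.target]) or not self.pool:
            return x
        features = np.concatenate([[1.0], np.delete(x, self.target)])
        preds = np.array([features @ coef for coef, _ in self.pool])
        ages = np.array([self.t - born for _, born in self.pool])
        weights = sigmoid_decay(ages, self.steepness, self.midpoint)
        if weights.sum() == 0:  # every model decayed to zero weight: fall back to uniform
            weights = np.ones_like(weights)
        x[self.target] = np.dot(weights, preds) / weights.sum()
        return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    X[:, 2] = 2.0 * X[:, 0] - X[:, 1] + 0.5  # feature 2 is exactly linear in the others
    imputer = RegressionPoolImputer(target=2)
    imputer.add_chunk(X)
    print(imputer.impute([1.0, 1.0, np.nan]))  # last entry close to 1.5
```

The decay lets recently fitted models dominate after a drift without discarding older ones outright; a hard sliding window would be the simpler alternative, at the cost of an abrupt cut-off.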
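The two-stage structure of 2STEP (group by diversity, then select by an imbalance-aware score) can be illustrated with a small sketch. The abstract does not define the diversity measures or the Sigmoid-Weighted Imbalance Score, so this example swaps in stand-ins and labels them as such: pairwise disagreement and a coincident-failure rate as the two diversity views, greedy leader clustering as the grouping step, and the G-mean of per-class recall in place of the Sigmoid-Weighted Imbalance Score. The function names, `threshold`, and `k` are hypothetical.

```python
import numpy as np


def g_mean(pred, y):
    """Geometric mean of per-class recall for binary labels {0, 1}
    (a stand-in for the paper's Sigmoid-Weighted Imbalance Score)."""
    recalls = [np.mean(pred[y == c] == c) for c in (0, 1) if np.any(y == c)]
    return float(np.prod(recalls) ** (1.0 / len(recalls)))


def coincident_failure(a, b, y):
    """Among samples where at least one classifier errs, the fraction where both err."""
    wrong_a, wrong_b = a != y, b != y
    either = np.sum(wrong_a | wrong_b)
    return 1.0 if either == 0 else np.sum(wrong_a & wrong_b) / either


def dissimilarity(a, b, y):
    """Average of two diversity views; 0 means the classifiers behave identically."""
    return 0.5 * (np.mean(a != b) + (1.0 - coincident_failure(a, b, y)))


def two_stage_prune(preds, y, threshold=0.35, k=None):
    """preds: (n_classifiers, n_samples) array of 0/1 predictions on a validation chunk.
    Returns indices of the retained classifiers, best score first."""
    # Stage 1 (hypothetical): leader clustering on the combined dissimilarity.
    leaders, groups = [], []
    for i, p in enumerate(preds):
        for g, lead in enumerate(leaders):
            if dissimilarity(p, preds[lead], y) < threshold:
                groups[g].append(i)
                break
        else:
            leaders.append(i)
            groups.append([i])
    # Stage 2: best member of each group, then keep the top-k representatives.
    reps = [max(g, key=lambda i: g_mean(preds[i], y)) for g in groups]
    reps.sort(key=lambda i: g_mean(preds[i], y), reverse=True)
    return reps if k is None else reps[:k]


if __name__ == "__main__":
    y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
    preds = np.array([
        [0, 0, 0, 0, 0, 1, 1, 0],  # similar to the next classifier, weaker on the minority class
        [0, 0, 0, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0],  # majority-class predictor
        [1, 1, 0, 0, 0, 0, 1, 1],
    ])
    print(two_stage_prune(preds, y, threshold=0.35, k=2))  # [1, 3]
```

Grouping first means redundant classifiers compete only with each other, so the final subset stays diverse, while the imbalance-aware score in the second stage discards members such as the majority-class predictor, whose G-mean is zero.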