Feature Selection on Data Stream Via Multi-Cluster Structure Preservation

Rui Ma, Yijie Wang, Li Cheng
DOI: https://doi.org/10.1145/3340531.3411928
2020-01-01
Abstract: Modern data arrive continuously in rapid, time-varying streams, which tend to produce unstable associations in the data structure. However, most existing methods target static data and cannot fully incorporate these unstable associations into structure construction. To address this issue, we propose an online unsupervised Feature Selection method via Multi-Cluster structure Preservation (FSMCP for short). FSMCP weights all features by minimizing the differences between the multi-cluster structures in the original and the selected feature spaces. The structure integrates three levels of associations: individual-level, aggregation-level, and streaming-level. To provide informative features in a timely manner, FSMCP checks and updates the associations as soon as new instances arrive. Compared with the baseline methods, FSMCP is more efficient than offline methods while still providing feature subsets of comparable or even better quality. It outperforms existing online methods with an average normalized mutual information (NMI) improvement of 10.33%.
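For readers unfamiliar with the NMI metric cited in the abstract, the following is a minimal sketch of how NMI between ground-truth classes and predicted clusters is commonly computed. It is not taken from the paper: the function names `entropy` and `nmi` are illustrative, and normalizing by the arithmetic mean of the two entropies is an assumption, since the abstract does not say which NMI variant the authors use.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels):
    """Normalized mutual information between two labelings.

    Assumption: normalization by the arithmetic mean of the two
    entropies; other variants (geometric mean, min, max) exist.
    """
    n = len(true_labels)
    if n == 0 or n != len(pred_labels):
        raise ValueError("labelings must be non-empty and of equal length")

    joint = Counter(zip(true_labels, pred_labels))
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)

    # Mutual information: sum over co-occurring (class, cluster) pairs.
    mi = 0.0
    for (t, p), c in joint.items():
        mi += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))

    denom = (entropy(true_labels) + entropy(pred_labels)) / 2
    if denom == 0:
        # Both labelings put everything in one group: treat as identical.
        return 1.0
    return mi / denom
```

NMI is invariant to how cluster IDs are named, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` scores a perfect match, while a clustering that is independent of the true classes scores zero.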