Unknown Type Streaming Feature Selection Via Maximal Information Coefficient

Peng Zhou, Yunyun Zhang, Yuanting Yan, Shu Zhao
DOI: https://doi.org/10.1109/icdmw58026.2022.00089
2022-01-01
Abstract: Feature selection aims to select an optimal minimal feature subset from the original dataset and has become an indispensable preprocessing component before data mining and machine learning, especially in the era of big data. Most feature selection methods implicitly assume that the feature type (categorical, numerical, or mixed) is known before learning, and then design corresponding measures to calculate the correlation between features. However, in practical applications, features may be generated dynamically and arrive one by one over time; these are called streaming features. Most existing streaming feature selection methods assume either that all dynamically generated features are of the same type or that the type of each newly arriving feature can be known on the fly, which is unrealistic. Therefore, this paper first studies the practical problem of Unknown Type Streaming Feature Selection and proposes a new method to handle it, named UT-SFS. Extensive experimental results indicate the effectiveness of the new method. UT-SFS is nonparametric and does not need to know the feature type before learning, which aligns with practical application needs.