BSSReduce an $O(\left|U\right|)$ Incremental Feature Selection Approach for Large-Scale and High-Dimensional Data

Ke Gong, Yong Wang, Maozeng Xu, Zhi Xiao
DOI: https://doi.org/10.1109/tfuzz.2018.2825308
IF: 12.253
2018-01-01
IEEE Transactions on Fuzzy Systems
Abstract: With the advent of the era of big data, data has become bigger than ever. Recently, as a fundamental task of pattern recognition, prediction, and data mining, feature selection has attracted wide attention. However, existing feature selection methods have an $O(\left|C\right|^x\left|U\right|^y)$ time complexity, which is the bottleneck preventing people from exploring knowledge in large-scale or high-dimensional datasets. Based on bijective soft sets, we propose a new rationale for feature selection, which can help break that bottleneck. Subsequently, this paper proposes an $O(\left|U\right|)$ feature-selection method whose computational time increases linearly only with the number of instances. To validate the proposed method, we conduct extensive experiments on University of California Irvine (UCI) datasets, including large-scale and high-dimensional datasets containing four million instances and over three million features. The results reveal that the proposed method is efficient and effective, and that it outperforms traditional methods in runtime, which can save massive computing resources. Moreover, the proposed method can be applied to feature selection for large-scale and gigantic-dimensional datasets, which are difficult to process with traditional methods.