A New Online Field Feature Selection Algorithm Based on Streaming Data.

Zhenjiang Zhang, Fuxing Song, Peng Zhang, Han-Chieh Chao, Yingsi Zhao
DOI: https://doi.org/10.1007/s12652-018-0959-0
IF: 3.662
2018-01-01
Journal of Ambient Intelligence and Humanized Computing
Abstract: The rapid development of Internet technology has produced massive amounts of network text data, so classifying this text efficiently has both theoretical significance and practical value. To obtain accurate classification results, the process is divided into two parts: text representation and classifier design. For text representation, this paper proposes an online field feature selection algorithm (OFFS algorithm) based on streaming data, which addresses the low efficiency and high memory consumption of traditional feature selection algorithms. Building on an improved vector space model, the new algorithm selects features from the data in real time and quickly generates text vectors. For classifier design, an OFFS-BP neural network text classifier is built from a BP neural network and the OFFS algorithm. It is suited to distributed parallel computing, reduces training time, and balances computational efficiency against classification accuracy. Finally, the OFFS-BP neural network classifier is implemented on the Spark platform. The experimental results show that the OFFS-BP neural network classifier is better suited to big data environments, with less computation time and higher classification efficiency.
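The abstract does not give the details of the OFFS algorithm, so the following is only a minimal sketch of the general idea it describes: updating feature statistics one document at a time and generating vector space model vectors from the currently selected features. The class name `StreamingFeatureSelector`, the parameter `k`, and the choice of document frequency as the selection score with smoothed TF-IDF weights are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter
import math


class StreamingFeatureSelector:
    """Illustrative online selector; NOT the paper's OFFS algorithm."""

    def __init__(self, k):
        self.k = k                 # number of features to keep (assumed parameter)
        self.n_docs = 0            # documents seen so far
        self.doc_freq = Counter()  # term -> number of documents containing it

    def update(self, tokens):
        """Consume one tokenized document from the stream and update the counts."""
        self.n_docs += 1
        self.doc_freq.update(set(tokens))

    def features(self):
        """Current top-k terms by document frequency (ties broken alphabetically)."""
        ranked = sorted(self.doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
        return [term for term, _ in ranked[: self.k]]

    def vectorize(self, tokens):
        """Smoothed TF-IDF vector of one document over the selected features."""
        tf = Counter(tokens)
        return [
            tf[t] * (math.log((1 + self.n_docs) / (1 + self.doc_freq[t])) + 1.0)
            for t in self.features()
        ]


selector = StreamingFeatureSelector(k=2)
stream = [["spark", "text"], ["spark", "network"], ["spark", "text", "text"]]
for doc in stream:
    selector.update(doc)  # statistics are refreshed per document, no second pass

vector = selector.vectorize(["text", "text", "spark"])
```

In this sketch the term counts grow with the vocabulary, so a memory-bounded variant would need to prune rare terms periodically; how the paper handles memory is not stated in the abstract.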