Streaming Feature Selection Via Graph Diffusion

Wei Zheng, Shuo Chen, Zhenyong Fu, Jun Li, Jian Yang
DOI: https://doi.org/10.1016/j.ins.2022.10.087
IF: 8.1
2022-01-01
Information Sciences
Abstract: Streaming feature selection for unlabeled data aims to remove redundant and irrelevant features from continuously arriving features without label information. Most existing methods focus on selecting a small set of features that approximately reconstruct each sample in the raw data. However, real-world streaming data may contain irrelevant features that this reconstruction strategy cannot effectively exclude, and these irrelevant features significantly impair the reliability of the selected feature subset. To address this problem, we introduce a dynamic similarity graph that learns pairwise sample correlations in order to adaptively evaluate irrelevant features. Through similarity graph diffusion, the unreliable similarities caused by irrelevant features are gradually eliminated. The past and current diffused graphs then guide feature selection, thus removing redundant and irrelevant features, respectively. The proposed method consists of two stages: 1) minimum redundancy: accepting only features that contain new information, based on the past diffused graph; 2) maximum relevance: selecting the most relevant features based on the current diffused graph. Additionally, a compound threshold operator is derived to solve the graph-based learning objective. Extensive experiments on real-world data demonstrate that the proposed method outperforms state-of-the-art unsupervised feature selection methods.
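The abstract describes the pipeline only at a high level. The sketch below is a toy, hypothetical illustration of that general idea, not the authors' algorithm. It builds a sample similarity graph, diffuses it, rejects a newly arriving feature that adds no new structure to the past diffused graph, and then ranks the surviving candidates on the current diffused graph. Everything concrete here is an assumption: the Gaussian kernel, the diffusion update S <- alpha * P S P^T + (1 - alpha) * I, the Laplacian-score criterion, the threshold `tau`, the budget `k`, and the helper names `similarity_graph`, `diffuse`, `laplacian_score` and `stream_select`. The paper's compound threshold operator is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product; adequate for the tiny graphs in this toy example."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def similarity_graph(columns: Sequence[Sequence[float]], sigma: float = 1.0) -> Matrix:
    """Gaussian-kernel similarity between samples, computed from the given feature columns."""
    n = len(columns[0])
    return [
        [math.exp(-sum((c[i] - c[j]) ** 2 for c in columns) / (2 * sigma ** 2)) for j in range(n)]
        for i in range(n)
    ]


def diffuse(w: Matrix, alpha: float = 0.8, steps: int = 10) -> Matrix:
    """Assumed diffusion S <- alpha * P S P^T + (1 - alpha) * I, with P = row-normalized W.

    A generic stand-in for graph diffusion, not the update derived in the paper.
    """
    n = len(w)
    p = []
    for row in w:
        total = sum(row)
        p.append([v / total for v in row])
    p_t = [list(col) for col in zip(*p)]
    s = [row[:] for row in w]
    for _ in range(steps):
        psp = matmul(matmul(p, s), p_t)
        s = [[alpha * psp[i][j] + ((1 - alpha) if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return s


def laplacian_score(f: Sequence[float], s: Matrix) -> float:
    """Laplacian-score-style smoothness of feature f on graph s (lower = smoother)."""
    n = len(f)
    d = [sum(row) for row in s]
    mean = sum(di * fi for di, fi in zip(d, f)) / sum(d)  # degree-weighted mean
    num = 0.5 * sum(s[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))
    den = sum(di * (fi - mean) ** 2 for di, fi in zip(d, f))
    return num / den


def stream_select(stream: Sequence[Sequence[float]], k: int = 2,
                  tau: float = 0.5) -> Tuple[List[int], List[int]]:
    """Hypothetical two-stage streaming selector; returns (selected, discarded) indices."""
    selected: List[int] = []
    discarded: List[int] = []
    for idx, f in enumerate(stream):
        if max(f) == min(f):  # a constant feature carries no information
            discarded.append(idx)
            continue
        # Stage 1 (minimum redundancy, assumed criterion): a feature that is already
        # smooth on the past diffused graph adds no new structure, so reject it.
        if selected:
            past = diffuse(similarity_graph([stream[i] for i in selected]))
            if laplacian_score(f, past) < tau:
                discarded.append(idx)
                continue
        # Stage 2 (maximum relevance): rank all candidates on the current diffused
        # graph and keep the k smoothest ones.
        candidates = selected + [idx]
        current = diffuse(similarity_graph([stream[i] for i in candidates]))
        ranked = sorted(candidates, key=lambda i: laplacian_score(stream[i], current))
        selected = sorted(ranked[:k])
        discarded.extend(ranked[k:])
    return selected, discarded


if __name__ == "__main__":
    cluster = [0, 0, 0, 5, 5, 5]   # separates two groups of samples
    copy = [0, 0, 0, 5, 5, 5]      # exact duplicate of an accepted feature
    noise = [1, -1, 1, -1, 1, -1]  # varies inside each group
    print(stream_select([cluster, copy, noise], k=1))
```

In this toy run, the duplicated column should be rejected by the stage-1 redundancy check, and the within-group noise column should pass stage 1 but lose the stage-2 relevance ranking, leaving only the first feature. The graphs are recomputed from scratch on every arrival to keep the sketch short; a practical streaming implementation would update them incrementally.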