Online Multi-label Streaming Feature Selection with Label Correlation
Dianlong You, Yang Wang, Jiawei Xiao, Yaojin Lin, Maosheng Pan, Zhen Chen, Limin Shen, Xindong Wu
DOI: https://doi.org/10.1109/tkde.2021.3113514
IF: 9.235
2021-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Multi-label streaming feature selection has attracted extensive attention in diverse big data applications. However, most existing work focuses on scenarios where labels are independent and ignores real scenarios in which labels may be interdependent and correlated with each other. This paper aims to fill this gap by developing a novel online multi-label streaming feature selection scheme that takes label correlation into account, called OMSFSLC. In our design, we first calculate the correlation degree between labels to obtain the label weight. Then, we integrate the mutual information and the label weight to evaluate the correlation between features and labels. The scheme consists of three stages: 1) online significance analysis, which determines the significant features via the correlation degree between the newly arriving features and labels; 2) online relevance analysis, which obtains relevant features via the mutual information; and 3) online redundancy analysis, which filters out redundant features via pairwise comparison. We implement our solution and conduct extensive experiments on benchmark datasets for performance evaluation. The experimental results show that OMSFSLC significantly outperforms the state-of-the-art methods in terms of effectiveness and efficiency.
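The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the label-weight definition (each label's share of the total pairwise mutual information), the two thresholds `significant` and `relevant`, the redundancy rule, and all function names are assumptions made for illustration. Features and labels are assumed to be discrete.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def label_weights(labels):
    """ASSUMPTION: a label's weight is its share of the total pairwise MI."""
    scores = [
        sum(mutual_information(a, b) for j, b in enumerate(labels) if j != i)
        for i, a in enumerate(labels)
    ]
    total = sum(scores)
    if total == 0:  # no correlated labels: fall back to uniform weights
        return [1 / len(labels)] * len(labels)
    return [s / total for s in scores]


def weighted_relevance(feature, labels, weights):
    """Label-weighted sum of MI between one feature and every label."""
    return sum(w * mutual_information(feature, l) for w, l in zip(weights, labels))


def select_streaming(stream, labels, significant=0.5, relevant=0.05):
    """Process (name, column) pairs one at a time; thresholds are hypothetical."""
    weights = label_weights(labels)
    selected = {}  # name -> (column, relevance score)
    for name, column in stream:
        score = weighted_relevance(column, labels, weights)
        if score < relevant:
            continue  # relevance analysis: discard irrelevant features
        if score < significant:
            # Redundancy analysis (ASSUMED rule): drop the newcomer if a kept
            # feature is at least as relevant and shares at least `score` bits.
            if any(
                kept_score >= score and mutual_information(column, kept) >= score
                for kept, kept_score in selected.values()
            ):
                continue
        # Significance analysis: features scoring >= `significant` skip the
        # redundancy check and are kept directly.
        selected[name] = (column, score)
    return list(selected)
```

In this sketch a label that is uncorrelated with every other label receives zero weight, which is one simple way to let label correlation shape the feature score; the paper may define the weight differently.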
computer science, information systems, artificial intelligence, engineering, electrical & electronic