A Survey on Multi-Label Feature Selection from Perspectives of Label Fusion.
Wenbin Qian, Jintao Huang, Fankang Xu, Wenhao Shu, Weiping Ding
DOI: https://doi.org/10.1016/j.inffus.2023.101948
IF: 18.6
2023-01-01
Information Fusion
Abstract: With the rapid advancement of big data technology, high-dimensional datasets comprising multi-label data have become prevalent in various fields. However, these datasets often contain many irrelevant and redundant features, which can adversely affect the performance of machine learning algorithms. Multi-label feature selection (MLFS) has emerged as a crucial pre-processing step in multi-label learning to address this issue. This survey provides an overview of multi-label learning and its algorithms, including problem transformation and algorithm adaptation. We also introduce three traditional strategies for MLFS: filter, wrapper, and embedded-based methods. Furthermore, we categorize existing research on multi-label feature selection into six aspects based on label fusion: label transformation-based (Binary Relevance-based and Label Powerset-based), label correlation-based (second and high-order, high and hybrid order), label specific-based, semi-supervised learning-based, missing and noisy labels-based, and label enhancement-based approaches. We provide a detailed introduction to each method's common approaches and theories. Additionally, we conduct experimental comparisons on practical multi-label learning datasets to evaluate the advantages and disadvantages of different algorithms. We discuss the application of multi-label feature selection in various domains, such as data mining, computer vision, natural language processing, and bioinformatics. Finally, we outline potential future research directions in multi-label feature selection, including MLFS with online learning, active learning, label distribution learning, partial label learning, granular computing, and class-imbalanced learning.
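Illustrative sketch (not taken from the survey): the two label transformation strategies named in the abstract can be contrasted with a simple filter-style score. The code below is a minimal toy example that assumes discrete features and uses empirical mutual information as the relevance measure; the function names `binary_relevance_scores` and `label_powerset_scores` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def binary_relevance_scores(X, Y):
    """Binary Relevance view: sum each feature's MI with every label separately."""
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        feature = [row[j] for row in X]
        scores.append(sum(
            mutual_information(feature, [row[k] for row in Y])
            for k in range(n_labels)
        ))
    return scores


def label_powerset_scores(X, Y):
    """Label Powerset view: MI of each feature with the whole label set as one class."""
    classes = [tuple(row) for row in Y]
    return [
        mutual_information([row[j] for row in X], classes)
        for j in range(len(X[0]))
    ]


# Toy data (assumed): feature 0 copies label 0, feature 1 is the XOR of the
# two labels, and feature 2 is constant.
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
X = [[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 0]]

br = binary_relevance_scores(X, Y)
lp = label_powerset_scores(X, Y)
print("Binary Relevance scores:", br)
print("Label Powerset scores:  ", lp)
```

In this toy case the XOR feature gets a zero score under the Binary Relevance view, because it is independent of each label taken alone, but a positive score under the Label Powerset view, which treats the label combination as a single class. This is one simple way to see why label correlations matter for feature selection; real MLFS methods covered by the survey are considerably more elaborate.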