Abstract: Multi-label learning has attracted considerable research interest because of its wide range of real-world applications. Many multi-label learning methods have been proposed; however, few address the class imbalance problem in multi-label data. Of the studies that do take this issue into account, most either ignore label correlations or consider only random correlations among labels. In this study, we propose a novel partition-based imbalanced multi-label learning algorithm, named Multi-label Learning based on Hierarchical Clustering (MLHC), to tackle this problem. MLHC first performs hierarchical clustering on the original label space to divide it into several disjoint subspaces, each containing labels that are strongly correlated with one another. Then, for each label subspace, we use the problem transformation strategy to convert it into a multi-class problem by binary coding. Any multi-class imbalance learning algorithm can be applied to the transformed multi-class data. Finally, the classification results are decoded to recover the predictions for the corresponding label subspace, and the results of all label subspaces are combined to form the predicted label vector in the original label space. We conducted experiments on thirteen benchmark multi-label datasets and on XJTU-SY, a multi-label engineering application dataset. The results indicate that MLHC outperforms several state-of-the-art class-imbalance multi-label learning algorithms, demonstrating the effectiveness and necessity of discovering label correlations and of transforming the original imbalanced multi-label learning problem into multiple strongly correlated multi-class imbalanced learning problems.
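The pipeline described in the abstract (cluster the labels, binary-code each label subspace into a single class, decode, and recombine) can be illustrated with a minimal sketch. The abstract does not state which correlation measure, linkage rule, or stopping criterion MLHC uses, so the Jaccard distance, average linkage, and `threshold` parameter below are assumptions made only for illustration, as are all function names. The multi-class imbalance learner that would sit between encoding and decoding is omitted; the sketch decodes the true class ids to show that the transformation is lossless.

```python
from itertools import combinations

def label_distance(Y, a, b):
    """1 - Jaccard similarity of labels a and b (an assumed correlation measure)."""
    both = sum(1 for row in Y if row[a] and row[b])
    either = sum(1 for row in Y if row[a] or row[b])
    return 1.0 if either == 0 else 1.0 - both / either

def cluster_labels(Y, threshold):
    """Average-linkage agglomerative clustering of label indices.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical stopping rule).
    """
    clusters = [[j] for j in range(len(Y[0]))]

    def linkage(c1, c2):
        total = sum(label_distance(Y, a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def encode(row, subspace):
    """Binary-code the labels of one subspace into a single class id."""
    code = 0
    for j in subspace:
        code = code * 2 + row[j]
    return code

def decode(code, subspace, out):
    """Write the bits of a class id back into the subspace positions of `out`."""
    for j in reversed(subspace):
        out[j] = code % 2
        code //= 2

# Toy label matrix: rows are samples, columns are labels.
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

subspaces = cluster_labels(Y, threshold=0.5)

# One multi-class target per subspace for every sample.
codes = [[encode(row, s) for s in subspaces] for row in Y]

# Decode each subspace and combine into a full label vector.
restored = []
for row_codes in codes:
    out = [0] * len(Y[0])
    for code, s in zip(row_codes, subspaces):
        decode(code, s, out)
    restored.append(out)

print(subspaces)
print(codes)
```

In a full pipeline, each column of `codes` would be the target of a separate multi-class imbalance learner, and `decode` would be applied to that learner's predictions rather than to the true class ids.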

Building decision trees for the multi-class imbalance problem

Novel Design of Decision-Tree-Based Support Vector Machines Multi-Class Classifier

A Multi-Class Imbalance Learning Method Based on HDDT Ensemble

Hellinger Distance Trees for Imbalanced Streams

Multi-Class Imbalance Problem: A Multi-Objective Solution

Classification of multiclass imbalanced data using cost-sensitive decision tree C5.0

Learning Imbalanced Multi-class Data with Optimal Dichotomy Weights

Cost-sensitive hierarchical classification for imbalance classes

The return of AdaBoost.MH: multi-class Hamming trees

Unifying Decision Trees Split Criteria Using Tsallis Entropy

Unifying Attribute Splitting Criteria of Decision Trees by Tsallis Entropy

A partition-based problem transformation algorithm for classifying imbalanced multi-label data

A linear multivariate binary decision tree classifier based on K-means splitting

Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making

Smart Data Driven Decision Trees Ensemble Methodology for Imbalanced Big Data

The Max-Cut Decision Tree: Improving on the Accuracy and Running Time of Decision Trees

Dive into Decision Trees and Forests: A Theoretical Demonstration

Revisiting multi-dimensional classification from a dimension-wise perspective

A less-greedy two-term Tsallis Entropy Information Metric approach for decision tree classification

Improving Decision Trees by Tsallis Entropy Information Metric Method

Era Splitting -- Invariant Learning for Decision Trees