Biclustering-based multi-label classification

Luiz Rafael Schmitke, Emerson Cabrera Paraiso, Julio Cesar Nievola
DOI: https://doi.org/10.1007/s10115-024-02109-3
IF: 2.7
2024-04-24
Knowledge and Information Systems
Abstract: In multi-label classification, an instance can carry multiple labels simultaneously. The two main approaches are to transform the multi-label data or to adapt single-label algorithms to multi-label data. Although problem transformation is effective, some algorithms use fixed parameters to determine the number of subproblems, and they maintain label relationships without using correlation or co-occurrence measures. This work adopts the transformation that converts a multi-label problem into multiple binary subproblems, because its low execution time enables the use of complex single-label algorithms during classification. However, this transformation performs poorly on multi-label metrics. The BicbPT algorithm is therefore introduced; it combines biclustering with the multi-label to binary problem transformation to improve performance on multi-label metrics without increasing the transformation's running time. For the evaluation, BicbPT was compared with the algorithms BR, CC, ECC, RAkEL and LP. The single-label algorithms SVM, C4.5 and Naive Bayes were applied to classify the binary subproblems across 12 datasets. The experiments show that BicbPT performed better on the multi-label metrics than the other multi-label to binary algorithms, with only ECC achieving similar results. However, ECC's running time is up to 10 times higher, which makes BicbPT the better choice. BicbPT also keeps its running time similar to that of the other algorithms in the multi-label to binary category. Finally, the experiments indicated that exploiting the way labels influence each other, rather than only maintaining label relationships as other approaches do, can improve multi-label classification.
computer science, information systems, artificial intelligence
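The multi-label to binary transformation mentioned in the abstract can be illustrated with a minimal sketch of plain binary relevance (the BR baseline), where each label becomes its own 0/1 subproblem. This is not BicbPT: the abstract does not describe the biclustering step, so it is not reproduced here. The function names, the toy data, and the 1-nearest-neighbor base learner (standing in for SVM, C4.5 or Naive Bayes) are all assumptions made for illustration.

```python
from typing import List, Sequence


def to_binary_subproblems(Y: List[List[int]]) -> List[List[int]]:
    """Split an n_samples x n_labels indicator matrix into one 0/1 target per label."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]


class NearestNeighbor:
    """Toy single-label base learner (1-NN), used only to keep the sketch self-contained."""

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "NearestNeighbor":
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        # Squared Euclidean distance to every training instance.
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]


def fit_binary_relevance(X, Y, make_learner=NearestNeighbor):
    """Train one independent binary model per label."""
    return [make_learner().fit(X, y_j) for y_j in to_binary_subproblems(Y)]


def predict_binary_relevance(models, x) -> List[int]:
    """Concatenate the per-label binary predictions into one label vector."""
    return [m.predict_one(x) for m in models]


# Hypothetical toy data: 4 instances, 2 features, 3 labels.
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y_train = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]

models = fit_binary_relevance(X_train, Y_train)
prediction = predict_binary_relevance(models, [0.9, 0.8])
print(prediction)
```

Because every label is learned independently, this baseline trains quickly but ignores how labels influence each other, which is the gap the abstract says BicbPT targets.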