Distributed Information-Theoretic Semisupervised Learning for Multilabel Classification

Zhen Xu, Ying Liu, Chunguang Li
DOI: https://doi.org/10.1109/TCYB.2020.2986463
Abstract: Multilabel classification (MLC) has received much attention recently. Existing MLC algorithms usually learn multiple classifiers simultaneously by exploiting the correlations among different labels. In practice, however, collecting a large amount of multilabeled data is difficult and/or expensive, and the lack of labeled data significantly degrades classification performance. Moreover, existing algorithms are centralized: all the data, together with their labels, must be transmitted to a fusion center for processing. In many real applications, the data are instead collected and stored across the distributed nodes of a network. Because of concerns about communication cost, processing ability, and data privacy, it is often infeasible to transmit and/or process the data centrally at one node. Motivated by this, the problem of distributed MLC over networks is studied, and two distributed information-theoretic semisupervised multilabel learning (dITS2ML2) algorithms are proposed, one for linear and one for nonlinear MLC problems. The proposed algorithms use a cost-sensitive objective function that includes a new label correlation term defined on a set of anchor data. In addition, to decentralize the global objective function, a distributed matrix completion algorithm is developed that completes the label matrix of the anchor data in a distributed manner. Then, in both the linear and nonlinear cases, the nodes exchange and combine a few intermediate quantities rather than the original data, so that the model parameters can be estimated adaptively. The convergence of the proposed dITS2ML2 algorithms is analyzed, and their effectiveness in MLC is verified by simulations on various real datasets.