A Fast Distributed Classification Algorithm for Large-Scale Imbalanced Data

Huihui Wang, Yang Gao, Yinghuan Shi, Hao Wang
DOI: https://doi.org/10.1109/icdm.2016.0168
2016-01-01
Abstract: The Alternating Direction Method of Multipliers (ADMM) has recently been developed for distributed classification. Nevertheless, the widespread class imbalance problem has not been well investigated. Furthermore, previous imbalanced classification methods have made little effort to study the complex imbalance problem in a distributed environment. In this paper, we treat the imbalance problem as distributed data imbalance, which comprises three issues: (i) within-node class imbalance, (ii) between-node class imbalance, and (iii) between-node structure imbalance. To deal adequately with imbalanced data while improving time efficiency, we propose a novel distributed Cost-Sensitive classification algorithm via Group-based ADMM (CS-GADMM). Briefly, CS-GADMM decomposes the classification problem into a series of sub-problems with within-node class imbalance. To alleviate the time delay caused by between-node class imbalance, we propose an extension of the dual coordinate descent method for sub-problem optimization. Meanwhile, for between-node structure imbalance, we carefully study the relationship between local functions and combine the resulting local variables within each group to update the global variables for prediction. Experimental results on various imbalanced datasets validate that CS-GADMM can be an efficient algorithm for imbalanced classification.
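To make the idea of a cost-sensitive local sub-problem more concrete, the following is a minimal sketch of standard dual coordinate descent for a linear L1-loss SVM in which each class has its own misclassification cost. It is not the extension proposed in the paper and it omits the group-based ADMM consensus step entirely. The function names, the cost parameters `c_pos` and `c_neg`, and the convention of appending a constant feature as a bias term are all assumptions made for illustration.

```python
import random


def cost_sensitive_dcd(X, y, c_pos=10.0, c_neg=1.0, epochs=50, seed=0):
    """Dual coordinate descent for a linear L1-loss SVM with class costs.

    X: list of feature lists (append a constant 1.0 to model a bias).
    y: list of labels in {+1, -1}.
    c_pos, c_neg: hypothetical per-class costs; they become the upper
    bounds on the dual variables of positive and negative samples.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    alpha = [0.0] * n
    # Cost sensitivity: each sample's box constraint depends on its class.
    upper = [c_pos if yi > 0 else c_neg for yi in y]
    # Diagonal of the Gram matrix, Q_ii = x_i . x_i.
    q_diag = [sum(v * v for v in xi) for xi in X]
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            if q_diag[i] == 0.0:
                continue  # an all-zero sample cannot change w
            # Gradient of the dual objective with respect to alpha_i.
            margin = sum(wj * xj for wj, xj in zip(w, X[i]))
            grad = y[i] * margin - 1.0
            # Newton step on one coordinate, clipped to [0, upper_i].
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), upper[i])
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                # Keep w = sum_i alpha_i * y_i * x_i in sync.
                for j in range(d):
                    w[j] += delta * y[i] * X[i][j]
                alpha[i] = new_alpha
    return w, alpha


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy imbalanced data: 2 positives, 6 negatives; last feature is the bias.
X_toy = [[2, 2, 1], [3, 3, 1],
         [-1, -1, 1], [-2, -1, 1], [-1, -2, 1],
         [-2, -2, 1], [-3, -1, 1], [-1, -3, 1]]
y_toy = [1, 1, -1, -1, -1, -1, -1, -1]
w_toy, alpha_toy = cost_sensitive_dcd(X_toy, y_toy)
```

Giving the minority class a larger cost only changes the box constraint on its dual variables, so the per-coordinate update stays as cheap as in the unweighted case. In a distributed setting, each node would presumably solve a sub-problem of this kind on its local data before the local solutions are combined.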