Group-Based Alternating Direction Method of Multipliers for Distributed Linear Classification

Huihui Wang,Yang Gao,Yinghuan Shi,Ruili Wang
DOI: https://doi.org/10.1109/tcyb.2016.2570808
IF: 11.8
2016-01-01
IEEE Transactions on Cybernetics
Abstract:The alternating direction method of multipliers (ADMM) algorithm has been widely employed for distributed machine learning tasks. However, it suffers from several limitations, e.g., a relative low convergence speed, and an expensive time cost. To this end, in this paper, a novel method, namely the group-based ADMM (GADMM), is proposed for distributed linear classification. In particular, to accelerate the convergence speed and improve global consensus, a group layer is first utilized in GADMM to divide all the slave nodes into several groups. Then, all the local variables (from the slave nodes) are gathered in the group layer to generate different group variables. Finally, by using a weighted average method, the group variables are coordinated to update the global variable (from the master node) until the solution of the global problem is reached. According to the theoretical analysis, we found that: 1) GADMM can mathematically converge at the rate $O({1}/{k})$ , where ${k}$ is the number of outer iterations and 2) by using the grouping methods, GADMM can improve the convergence speed compared with the distributed ADMM framework without grouping methods. Moreover, we systematically evaluate GADMM on four publicly available LIBSVM datasets. Compared with disADMM and stochastic dual coordinate ascent with alternating direction method of multipliers-ADMM, for distributed classification, GADMM is able to reduce the number of outer iterations, which leads to faster convergence speed and better global consensus. In particular, the statistical significance test has been experimentally conducted and the results validate that GADMM can significantly save up to 30% of the total time cost (with less than 0.6% accuracy loss) compared with disADMM on large-scale datasets, e.g., webspam and epsilon.
What problem does this paper attempt to address?