Communication‐efficient distributed large‐scale sparse multinomial logistic regression

Dajiang Lei, Jie Huang, Hao Chen, Jie Li, Yu Wu
DOI: https://doi.org/10.1002/cpe.6148
2020-12-13
Concurrency and Computation: Practice and Experience
Abstract: Sparse multinomial logistic regression (SMLR) is widely used in image and text classification because it performs feature selection and produces probabilistic outputs. However, the traditional SMLR algorithm cannot meet the memory and running‐time demands of big data, so a new distributed algorithm is needed. The existing distributed SMLR algorithm has shortcomings in its network strategy and cannot make full use of the computing resources of current high‐performance clusters. We therefore propose communication‐efficient sparse multinomial logistic regression (CESMLR), which adopts an efficient network strategy in which each node solves an SMLR subproblem, supports a large number of data partitions, and takes full advantage of the cluster's computing resources to solve SMLR efficiently. Experimental results on big data show that our algorithm outperforms state‐of‐the‐art algorithms. CESMLR is suitable for tasks with high‐dimensional features and requires less running time while maintaining high classification accuracy.
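For readers unfamiliar with SMLR, the single-machine problem that each node would solve on its own data partition can be sketched as softmax regression with an L1 penalty on the weights. The sketch below is not the CESMLR algorithm and does not reproduce the paper's network strategy; it assumes a plain proximal-gradient (soft-thresholding) solver, and the function names (`fit_smlr`, `soft_threshold`, `predict_proba`) and hyperparameter values are hypothetical choices for illustration only.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink each entry toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_smlr(X, y, n_classes, lam=0.01, step=0.1, n_iter=500):
    """Minimize mean cross-entropy + lam * ||W||_1 by proximal gradient descent.

    X: (n, d) feature matrix; y: (n,) integer labels in [0, n_classes).
    Returns W with shape (d, n_classes). Illustrative only (assumed solver).
    """
    n, d = X.shape
    Y = np.eye(n_classes)[y]              # one-hot labels, shape (n, n_classes)
    W = np.zeros((d, n_classes))
    for _ in range(n_iter):
        P = softmax(X @ W)                # predicted class probabilities
        grad = X.T @ (P - Y) / n          # gradient of the mean cross-entropy
        W = soft_threshold(W - step * grad, step * lam)  # L1 proximal step
    return W


def predict_proba(X, W):
    """Class probabilities for each row of X under weights W."""
    return softmax(X @ W)
```

The L1 proximal step is what yields the feature selection mentioned in the abstract: weights whose gradient signal stays below the penalty are held at exactly zero, while the softmax provides the probabilistic output.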