Class-Distributed Learning for Multinomial Logistic Regression with High Dimensional Features and a Large Number of Classes

Shuyuan Wu, Jing Zhou, Ke Xu, Hansheng Wang
DOI: https://doi.org/10.1080/10618600.2024.2362230
2024-07-23
Journal of Computational and Graphical Statistics
Abstract: Estimating a high-dimensional multinomial logistic regression model with a large number of categories is of fundamental importance, but it presents two challenges. Computationally, it incurs a heavy computation cost. Statistically, it suffers from unsatisfactory statistical efficiency. Therefore, how to solve this problem in a computationally and statistically efficient way is of great interest. To tackle these challenges, we have developed a new class-distributed learning algorithm with a rank-reducible coefficient structure. The key innovation is piecing together two important techniques for distributed computing and improved statistical efficiency: dimension reduction and a circular-structured working model. Dimension reduction effectively alleviates the curse of dimensionality due to high-dimensional features. The circular-structured working model allows the use of a class-distributed algorithm for distributed computing. To support our new methodology, we develop rigorous asymptotic theory and present extensive numerical experiments. Supplementary materials for this article are available online.
statistics & probability