An asynchronous distributed training algorithm based on Gossip communication and Stochastic Gradient Descent

Jun Tu, Jia Zhou, Donglin Ren
DOI: https://doi.org/10.1016/j.comcom.2022.09.010
IF: 5.047
2022-11-01
Computer Communications
Abstract: Cyber–Physical Systems (CPS) applications play an increasingly important role in our lives, and centralized distributed machine learning (ML) is beginning to see widespread use in CPS to secure these applications. However, existing centralized distributed ML algorithms have significant shortcomings in CPS scenarios: their synchronization algorithms have high latency and are sensitive to drop-off, which affects the security of CPS. To address this, the paper combines the Gossip protocol with Stochastic Gradient Descent (SGD) and proposes Gossip Ring SGD (GR-SGD), a communication framework for machine learning. GR-SGD is decentralized and asynchronous, and it solves the problem of long communication waiting times. The paper uses the ImageNet data set and the ResNet model to verify the feasibility of the algorithm and compares it with Ring AllReduce and D-PSGD. It also indicates that some data redundancy can reduce communication overhead and increase system fault tolerance, so that the approach can be better applied to CPS and to all kinds of machine learning models.
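To make the idea of combining gossip communication with SGD concrete, the following is a minimal toy sketch and not the paper's GR-SGD algorithm. It assumes a hypothetical setup in which each worker holds one scalar parameter and a local quadratic loss; the function name `gossip_ring_sgd`, its parameters, and the pairwise-averaging rule are illustrative assumptions. At each tick a single randomly chosen worker takes a noisy local gradient step and then averages its parameter with its ring successor, so no global synchronization barrier is needed.

```python
import random


def gossip_ring_sgd(targets, steps=20000, lr=0.01, noise=0.1, seed=0):
    """Toy asynchronous gossip SGD on a ring (illustrative assumption only).

    Worker i minimizes f_i(x) = 0.5 * (x - targets[i]) ** 2, so the
    minimizer of the average loss is the mean of `targets`.
    """
    rng = random.Random(seed)
    n = len(targets)
    params = [0.0] * n  # one scalar "model" per worker

    for _ in range(steps):
        # A single worker wakes up; the others are not blocked.
        i = rng.randrange(n)

        # Noisy gradient of the local quadratic loss, then a local SGD step.
        grad = (params[i] - targets[i]) + rng.gauss(0.0, noise)
        params[i] -= lr * grad

        # Pairwise gossip: average with the ring successor only.
        j = (i + 1) % n
        avg = 0.5 * (params[i] + params[j])
        params[i] = avg
        params[j] = avg

    return params


if __name__ == "__main__":
    final = gossip_ring_sgd([1.0, 2.0, 3.0, 4.0])
    print("worker parameters:", final)
    print("mean of targets:  ", 2.5)
```

In this toy setting the workers' parameters drift toward a common value near the mean of the targets, even though each worker only ever communicates with one neighbor. A real system would exchange model tensors over the network and handle worker failures, which this sketch does not attempt.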
computer science, information systems; telecommunications; engineering, electrical & electronic
What problem does this paper attempt to address?