Abstract: To afford the huge computational cost, large-scale deep neural networks (DNNs) are usually trained on distributed systems, especially the widely used parameter server architecture, which consists of a parameter server and multiple local workers equipped with powerful GPU cards. During training, local workers frequently pull the global model from the parameter server and push their computed gradients to it. Because bandwidth is limited, such frequent communication becomes a severe bottleneck for training acceleration. As recent attempts to address this problem, quantization methods have been proposed to compress the gradients for efficient communication. However, such methods overlook the effect of compression on model performance, so they suffer from either a low compression ratio or an accuracy drop. In this paper, to better address this problem, we treat distributed deep learning as a multi-agent system (MAS) problem. Specifically, 1) the local workers and the parameter server are separate agents in the system; 2) the objective of these agents is to maximize the efficacy of the learned model through their cooperative interactions; 3) the strategy of an agent describes how it takes actions, i.e., communicates its computed gradients or the global model; 4) rational agents always select the best-response strategy with the optimal utility. Inspired by this, we design a MAS approach for distributed training of DNNs. In our method, the agents first estimate the utility (i.e., the benefit to improving the model) of each action (i.e., transferring a subset of the gradients or the global model), and then take the best-response strategy based on their estimated utilities, mixed with ε-random exploration. We call our new method Slim-DP because, unlike standard data parallelism, it communicates only a subset of the gradients or the global model. Our experimental results demonstrate that Slim-DP reduces communication cost more and achieves better speedup than standard data parallelism and its quantization version, without loss of accuracy.
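The selection step described in the abstract (exploit the highest-utility entries, then mix in ε-random exploration) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `select_subset`, the parameters `keep_ratio` and `epsilon`, and the use of gradient magnitude as a stand-in for utility are all assumptions made for illustration, since the abstract does not specify how utility is estimated.

```python
import numpy as np

def select_subset(utility, keep_ratio, epsilon, rng):
    """Return sorted indices of entries to communicate.

    Hypothetical helper: a fraction (1 - epsilon) of the budget goes to the
    highest-utility entries, the remaining fraction epsilon is drawn
    uniformly at random from the entries not already chosen.
    """
    n = utility.size
    k = max(1, int(round(keep_ratio * n)))   # total communication budget
    k_explore = int(round(epsilon * k))      # budget spent on exploration
    k_exploit = k - k_explore                # budget spent on best response

    order = np.argsort(-utility)             # indices by descending utility
    exploit = order[:k_exploit]
    if k_explore == 0:
        return np.sort(exploit)

    rest = order[k_exploit:]                 # candidates for exploration
    explore = rng.choice(rest, size=k_explore, replace=False)
    return np.sort(np.concatenate([exploit, explore]))

rng = np.random.default_rng(0)
grad = rng.normal(size=1000)                 # a worker's flattened gradient
utility = np.abs(grad)                       # ASSUMPTION: magnitude as utility proxy
idx = select_subset(utility, keep_ratio=0.1, epsilon=0.2, rng=rng)
sparse_values = grad[idx]                    # only (idx, values) would be sent
print(idx.size, "of", grad.size, "entries selected")
```

In this sketch a worker would push only `idx` and `sparse_values` instead of the full gradient; the same selection could in principle be applied to the subset of the global model that a worker pulls.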

Joint Model Pruning and Topology Construction for Accelerating Decentralized Machine Learning

MLCNN: Cross-Layer Cooperative Optimization and Accelerator Architecture for Speeding Up Deep Learning Applications

MCMC: Multi-Constrained Model Compression Via One-Stage Envelope Reinforcement Learning

MOC: Multi-Objective Mobile CPU-GPU Co-Optimization for Power-Efficient DNN Inference

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

ROG: A High Performance and Robust Distributed Training System for Robotic IoT

Decentralized Proactive Model Offloading and Resource Allocation for Split and Federated Learning

Enhancing Decentralized Federated Learning with Model Pruning and Adaptive Communication

Model Pruning-enabled Federated Split Learning for Resource-constrained Devices in Artificial Intelligence Empowered Edge Computing Environment

Joint Model Pruning and Device Selection for Communication-Efficient Federated Edge Learning

Improving Device-Edge Cooperative Inference of Deep Learning via 2-Step Pruning

Unity is Power: Semi-Asynchronous Collaborative Training of Large-Scale Models with Structured Pruning in Resource-Limited Clients

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms

Joint Model Pruning and Resource Allocation for Wireless Time-triggered Federated Learning

SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization

DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning

Slim-DP: A Multi-Agent System for Communication-Efficient Distributed Deep Learning

A Dynamic Pruning Method on Multiple Sparse Structures in Deep Neural Networks

Real-time topology optimization based on deep learning for moving morphable components

Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models

Accelerating Distributed MoE Training and Inference with Lina