Distributed Deep Neural Network Training with Important Gradient Filtering, Delayed Update and Static Filtering.

Kairu Li, Yongyu Wu, Jia Tian, Wentao Tian, Zuochang Ye
DOI: https://doi.org/10.1145/3377170.3377245
2019-01-01
Abstract: With the increasing number of computing nodes in current computer clusters, the performance of large-scale deep neural network training is essentially limited by the communication cost, especially that of transferring gradients among nodes during each iteration. In this paper, three methods are proposed to reduce the communication cost: important gradient filtering, delayed update, and static filtering. The important gradient filtering algorithm selects the most important gradients, reducing the size of the gradients to be transferred and helping convergence. The delayed update algorithm significantly reduces the gradient broadcasting time. Static filtering filters out gradients with very small variance. Results show that a combination of the proposed methods achieves a 2.91× to 5.58× reduction in communication cost in a cluster with inexpensive commodity Gigabit Ethernet interfaces.
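To make the two filtering ideas concrete, here is a minimal sketch in plain Python. The abstract does not state how "importance" is measured or how the variance is estimated, so the sketch assumes the common choices from the gradient sparsification literature: importance is the magnitude of the locally accumulated gradient, unsent entries are kept in a local residual, and variance is computed over a short history of past gradients. The names `filter_important`, `static_mask`, `keep_ratio`, and `var_threshold` are hypothetical and do not come from the paper. Delayed update is not sketched, since the abstract gives no detail on its mechanism.

```python
from statistics import pvariance


def filter_important(grad, residual, keep_ratio):
    """Select the largest-magnitude entries to transmit (assumed criterion).

    Returns (sparse, new_residual), where sparse is a list of
    (index, value) pairs to send and new_residual holds the unsent part.
    """
    # Add the residual left over from earlier iterations.
    acc = [g + r for g, r in zip(grad, residual)]
    k = max(1, int(len(acc) * keep_ratio))
    # Indices of the k entries with the largest absolute value.
    top = sorted(range(len(acc)), key=lambda i: abs(acc[i]), reverse=True)[:k]
    top_set = set(top)
    sparse = [(i, acc[i]) for i in sorted(top)]
    # Entries that were sent are cleared; the rest wait for a later round.
    new_residual = [0.0 if i in top_set else acc[i] for i in range(len(acc))]
    return sparse, new_residual


def static_mask(history, var_threshold):
    """Mark which coordinates are still worth transmitting.

    history is a list of past gradient vectors. A coordinate whose
    variance over that history is below var_threshold is masked out
    (False), on the assumption that it changes too little to matter.
    """
    n = len(history[0])
    return [pvariance([h[i] for h in history]) >= var_threshold
            for i in range(n)]


if __name__ == "__main__":
    sparse, residual = filter_important(
        [0.5, -0.01, 0.02, -2.0], [0.0, 0.0, 0.0, 0.0], keep_ratio=0.5
    )
    print(sparse)    # the two largest-magnitude entries with their indices
    print(residual)  # the small entries, kept locally for later
    print(static_mask([[1.0, 0.0], [3.0, 0.0]], var_threshold=0.5))
```

With `keep_ratio=0.5`, only half of the gradient entries are sent per iteration, plus their indices. The actual saving depends on how indices are encoded, which this sketch does not model.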