Guest Editorial Introduction to the Special Section on Communication-Efficient Distributed Machine Learning

Xiaowen Chu, Fausto Giunchiglia, Giovanni Neglia, David Gregg, Jiangchuang Liu
DOI: https://doi.org/10.1109/TNSE.2022.3181503
IF: 6.6
2022-01-01
IEEE Transactions on Network Science and Engineering
Abstract: The papers in this special section focus on communication-efficient distributed machine learning. Machine learning, especially deep learning, has been successfully applied in a wealth of practical AI applications in fields such as computer vision, natural language processing, healthcare, finance, and robotics. As machine learning models and training data sets grow, training deep learning models requires a significant amount of computation and may take days to months on a single GPU or TPU. It has therefore become common practice to use distributed machine learning to accelerate training with multiple processors. Distributed machine learning typically requires the processors to exchange information repeatedly throughout the training process. With the fast-growing computing power of AI processors, data communication among processors has gradually become the performance bottleneck and, by Amdahl's law, severely limits system scalability. The design of communication-efficient distributed machine learning systems has attracted great attention from both academia and industry.