Abstract: With the widespread use of distributed machine learning (DML), many IT companies have established networks dedicated to DML. Different DML communication architectures have different traffic patterns and place different requirements on network performance, which is closely related to network topology. However, traditional network topologies usually pursue general goals and are agnostic to the communication patterns of specific applications. A mismatch between the network topology and the application directly affects training performance. Although some studies have analyzed the effect of topology on training performance, the topologies and communication architectures they cover are not comprehensive, and it remains unknown which topology suits which communication architecture. This survey investigates typical topologies and analyzes whether they meet the requirements of three commonly used DML communication architectures: Parameter Server (PS), Tree, and Ring. Specifically, it first studies the topology requirements of each communication architecture, together with two requirements common to all of them (high scalability and fault tolerance). Next, it analyzes whether the surveyed topologies meet these requirements. It then discusses potential technologies and approaches for constructing an appropriate scheme for each topology requirement, and presents DMLNet, a novel network topology that suits all three communication architectures. Finally, several potential directions for future research are outlined.
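
To make the claim about differing traffic patterns concrete, the following minimal Python sketch estimates how many bytes arrive at the busiest node during one synchronization round under each of the three architectures. It is an idealized textbook model, not taken from the survey: it assumes a single parameter server, a complete k-ary tree that reduces upward and then broadcasts downward, and a standard ring all-reduce (reduce-scatter followed by all-gather). The function names and the `fanout` parameter are hypothetical and chosen only for illustration.

```python
def ps_hotspot_in(n_workers: int, model_bytes: float) -> float:
    """Single parameter server (assumed): every worker pushes a full
    gradient, so the server receives n_workers * model_bytes per round."""
    return n_workers * model_bytes


def tree_hotspot_in(n_nodes: int, model_bytes: float, fanout: int = 2) -> float:
    """Reduce-then-broadcast over a k-ary tree with heap-style indexing
    (children of node i are fanout*i + 1 ... fanout*i + fanout).
    A node receives one full model from each child during the reduce
    phase and one from its parent during the broadcast phase."""
    worst = 0.0
    for node in range(n_nodes):
        first_child = fanout * node + 1
        n_children = max(0, min(fanout, n_nodes - first_child))
        from_parent = model_bytes if node > 0 else 0.0
        worst = max(worst, n_children * model_bytes + from_parent)
    return worst


def ring_hotspot_in(n_nodes: int, model_bytes: float) -> float:
    """Ring all-reduce: 2 * (n - 1) steps, each moving a chunk of size
    model_bytes / n, so every node receives the same amount."""
    return 2 * (n_nodes - 1) * model_bytes / n_nodes


if __name__ == "__main__":
    n, size = 8, 100.0  # illustrative values only
    print("PS  :", ps_hotspot_in(n, size))
    print("Tree:", tree_hotspot_in(n, size))
    print("Ring:", ring_hotspot_in(n, size))
```

Under these assumptions the PS hotspot grows linearly with the number of workers, the tree hotspot depends on the fanout rather than the node count, and the ring load stays below twice the model size per node. This is one way to see why the three architectures could place different demands on the underlying topology.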

Topology Description for Data Distributions Using a Topology Graph with Divide-and-combine Learning Strategy

Learning to Learn Graph Topologies

TopoImb: Toward Topology-level Imbalance in Learning from Graphs

Topology2Vec: Topology Representation Learning for Data Center Networking

Graph Out-of-Distribution Detection Goes Neighborhood Shaping

Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions

Sample Topology Exploration for Label Distribution Learning

Topological Hierarchical Decompositions

Topology-aware Robust Optimization for Out-of-distribution Generalization

Topological data analysis and clustering

Learning a Probabilistic Topology Discovering Model for Scene Categorization

Characterizing the Influence of Topology on Graph Learning Tasks

Topology Uncertainty Modeling For Imbalanced Node Classification on Graphs

Topological Learning in Multi-Class Data Sets

Topology Learning for Heterogeneous Decentralized Federated Learning Over Unreliable D2D Networks

Topology Construction of Backbone Network Based on Machine Learning

Computational Topology for Data Analysis

Topology-Imbalance Learning for Semi-Supervised Node Classification

A Dual-Graph Attention-Based Approach for Identifying Distribution Network Topology

Capturing Dynamics of Time-Varying Data via Topology