Reconfigurable Aggregation Tree for Distributed Machine Learning in Optical WAN

Ling Liu, Hongfang Yu, Gang Sun
DOI: https://doi.org/10.1109/ICAML54311.2021.00051
2021-01-01
Abstract: Scarce WAN bandwidth can hinder the training of geo-distributed machine learning (Geo-distributed ML). In this paper, we study the aggregation tree, which is commonly used to reduce communication overhead during model synchronization, in the setting of a reconfigurable optical WAN, and propose Otree, which reconfigures the wavelengths on each edge of the tree by borrowing wavelengths from adjacent edges. Simulation results show that Otree achieves 22.5% to 54.77% lower global model synchronization time than a traditional aggregation tree that does not consider optical-layer control, thereby reducing training time.
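To make the idea of borrowing wavelengths between adjacent tree edges concrete, the following is a minimal toy sketch, not the Otree algorithm from the paper. Everything in it is an assumption made for illustration: the cost model (each edge carries one aggregated update of fixed size, and a node forwards only after all of its children have arrived), the per-wavelength rate, the greedy one-wavelength-at-a-time search, and the names `sync_time`, `adjacent`, and `borrow_greedy`.

```python
from itertools import permutations

def sync_time(parent, waves, size_gbit=8.0, gbps_per_wave=1.0):
    """Toy model: time for all updates to reach the root of the tree.

    parent maps child -> parent node; the edge child->parent is keyed by child.
    waves maps that same key to the number of wavelengths on the edge.
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    root = next(p for p in parent.values() if p not in parent)

    def finish(node):
        # A node can forward its aggregate only after every child has arrived.
        return max(
            (finish(c) + size_gbit / (waves[c] * gbps_per_wave)
             for c in children.get(node, [])),
            default=0.0,
        )

    return finish(root)

def adjacent(parent, a, b):
    """Two edges are adjacent if they share an endpoint (assumed definition)."""
    return bool({a, parent[a]} & {b, parent[b]})

def borrow_greedy(parent, waves):
    """Hypothetical greedy: repeatedly move one wavelength between adjacent
    edges, choosing the move that most reduces the toy synchronization time."""
    waves = dict(waves)
    while True:
        best, best_move = sync_time(parent, waves), None
        for donor, taker in permutations(parent, 2):
            # Every edge keeps at least one wavelength so the tree stays connected.
            if waves[donor] > 1 and adjacent(parent, donor, taker):
                trial = dict(waves)
                trial[donor] -= 1
                trial[taker] += 1
                t = sync_time(parent, trial)
                if t < best - 1e-12:
                    best, best_move = t, (donor, taker)
        if best_move is None:
            return waves
        donor, taker = best_move
        waves[donor] -= 1
        waves[taker] += 1

# Example tree: C -> A -> R and B -> R, with R as the root (made-up numbers).
parent = {"A": "R", "B": "R", "C": "A"}
initial = {"A": 1, "B": 4, "C": 2}
tuned = borrow_greedy(parent, initial)
before, after = sync_time(parent, initial), sync_time(parent, tuned)
```

In this made-up example the edge A->R is the bottleneck, so the greedy search shifts wavelengths from the lightly loaded edge B->R onto it while the total number of wavelengths stays fixed. The paper's actual formulation and results are not reproduced here.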