A New Traffic Offloading Method with Slow Switching Optical Device in Exascale Computer.

En Shao, Guangming Tan, Zhan Wang, Guojun Yuan, Ninghui Sun
DOI: https://doi.org/10.1109/iccd46524.2019.00025
2019-01-01
ICCD
Abstract: The expected exascale computer will comprise tens of thousands of computing nodes and nearly 5000 interconnected nodes in the years to come. Such a large-scale system will represent a milestone in the progress of High-Performance Computing (HPC). More efficient network hardware, such as optoelectronic interconnection and configurable switches, is reshaping the traditional architecture of supercomputers. However, architectures built on this new hardware adapt poorly to dynamic running conditions, and the newly developed hardware alone is normally unable to improve overall performance effectively. Here, we propose a new acceleration system for the exascale computer called Software Defined Network Accelerator (sDNA). Inspired by the edge forwarding index (EFI), the main contribution of our work is an extended EFI-based optical interconnection method that uses slow-switching optical devices. Optical links are established by evaluating the traffic-offloading revenue of each candidate optical link. As the supporting method for optical interconnection, sDNA selects the most suitable routing configuration according to job-schedule information and prior knowledge of HPC applications. We tested sDNA in a network simulator and in a prototype system for the exascale computer, using both DOE application benchmarks and a real-world communication benchmark. The traffic-offloading verification results show that our optical interconnection method, based on the extended EFI evaluation, not only offloads traffic from electrical links to optical links but also avoids the congestion inherent to electrical links. Furthermore, our experimental results show that sDNA maintains throughput above 80% of bandwidth and reduces communication delay by 10% in both our real prototype system and the simulator. Together, these results suggest that sDNA is an ideal candidate for accelerating the communication performance of the exascale computer.