Achieving High-Performance Fault-Tolerant Routing in HyperX Interconnection Networks

Cristóbal Camarero, Alejandro Cano, Carmen Martínez, Ramón Beivide
2024-04-05
Abstract: Interconnection networks are key components that condition the performance of current large datacenter and supercomputer systems. Both topology and routing are critical aspects that must be carefully considered for a competitive system network design. Moreover, when failures are expected daily, this tandem should exhibit resilience and robustness. Low-diameter networks, including HyperX, are cheaper than typical Fat Trees. However, to be truly competitive, they must employ evolved routing algorithms that both balance traffic and tolerate failures.
Distributed, Parallel, and Cluster Computing