A3: An Automatic Topology-Aware Malfunction Detection and Fixation System in Data Center Networks

Che Zhang, Shiwei Zhang, Bo Jin, Weichao Li, Zhen Wang, Qing Li, Yi Wang
DOI: https://doi.org/10.48550/arXiv.2001.02163
2020-01-08
Abstract: Link failures and cable miswirings are not uncommon when building data center networks, and they prevent existing automatic address configuration methods from functioning correctly. However, accurately detecting such malfunctions is not easy, because they may cause no observable change in node degree. Fixing such malfunctions is even harder, as almost no existing work provides accurate fixation suggestions. To solve these problems, we design and implement A3, an automatic topology-aware malfunction detection and fixation system. A3 formulates the problem of finding a minimal fixation as the problem of computing the minimum graph difference (NP-hard) and, for FatTree, solves it in O(k^6) time for fewer than k/2 undirected link malfunctions and in O(k^3) time for fewer than k/4. Our evaluation demonstrates that for fewer than k/2 undirected link malfunctions, A3 is 100% accurate in malfunction detection and provides the minimum fixation result. For k/2 or more undirected link malfunctions, A3 still achieves about 100% accuracy and provides a near-optimal fixation result.
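To make the "no observable node degree changes" point concrete, the following is a minimal sketch and not A3's algorithm. It assumes switch labels are already known, which sidesteps the NP-hard matching step the abstract refers to. The helper names (`fattree_links`, `degrees`, `link_difference`) and the switch naming scheme are hypothetical and chosen only for illustration. The sketch builds the switch-level links of a k-ary FatTree, swaps the endpoints of two cables, and shows that every node degree is preserved while the link sets differ.

```python
from collections import Counter


def fattree_links(k):
    """Switch-level links of a k-ary FatTree (hosts omitted).

    Illustrative naming: ("edge", pod, i), ("agg", pod, i), ("core", i).
    """
    half = k // 2
    links = set()
    for pod in range(k):
        for a in range(half):
            agg = ("agg", pod, a)
            # Each aggregation switch connects to every edge switch in its pod.
            for e in range(half):
                links.add(frozenset((("edge", pod, e), agg)))
            # Aggregation switch a connects to core group a.
            for c in range(half):
                links.add(frozenset((("core", a * half + c), agg)))
    return links


def degrees(links):
    """Degree of every node appearing in the link set."""
    deg = Counter()
    for link in links:
        for node in link:
            deg[node] += 1
    return deg


def link_difference(blueprint, observed):
    """Links to add and links to remove, assuming node labels are trusted."""
    missing = blueprint - observed      # cables that should exist but do not
    unexpected = observed - blueprint   # cables that exist but should not
    return missing, unexpected


k = 4
blueprint = fattree_links(k)

# Simulate a miswiring: two edge-to-aggregation cables swap their far ends.
l1 = frozenset((("edge", 0, 0), ("agg", 0, 0)))
l2 = frozenset((("edge", 1, 0), ("agg", 1, 0)))
observed = (blueprint - {l1, l2}) | {
    frozenset((("edge", 0, 0), ("agg", 1, 0))),
    frozenset((("edge", 1, 0), ("agg", 0, 0))),
}

missing, unexpected = link_difference(blueprint, observed)
same_degrees = degrees(blueprint) == degrees(observed)
```

Here `same_degrees` is true even though two links are missing and two are unexpected, so a degree check alone cannot flag the miswiring. Without trusted labels, the comparison would first need a node correspondence between the blueprint and the observed graph, which is presumably where the minimum graph difference formulation comes in.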
Networking and Internet Architecture