FLAIR: A Fast and Low-Redundancy Failure Recovery Framework for Inter Data Center Network
Yuchao Zhang, Haoqiang Huang, Ahmed M. Abdelmoniem, Gaoxiong Zeng, Chenyue Zheng, Xirong Que, Wendong Wang, Ke Xu
DOI: https://doi.org/10.1109/tcc.2024.3393735
IF: 5.697
2024-06-08
IEEE Transactions on Cloud Computing
Abstract: Driven by the rapid development of 5G and IoT technologies, Inter-Datacenter (Inter-DC) networks face unprecedented pressure to replicate large volumes of geographically distributed user data in real time. Meanwhile, as Inter-DC networks grow in scale, link and node failures become increasingly frequent, degrading data transmission efficiency. Link failure recovery methods are therefore of utmost importance. Many works have investigated fast failure recovery, yet none of them considers the deployment overhead of such recovery schemes. In this article, we find that the side effects of deploying recovery strategies and the future availability of the recovered transmissions are also crucial for fast recovery. We therefore propose FLAIR, a fast and low-redundancy failure recovery framework that consists of a fast recovery strategy, FRAVaR, and a redundancy removal algorithm, ROSE. FRAVaR takes deployment overhead fully into account by minimizing shuffle traffic. Building on FRAVaR, ROSE regularly eliminates the accumulated rerouting redundancy by removing unnecessary routing updates. Experimental results on four realistic network topologies show that FLAIR reduces deployment overhead by up to 48.2% compared with state-of-the-art solutions, and thereby improves recovery speed by up to 70.2% and network utilization by up to 36%.
computer science, information systems, theory & methods