PUFF: A Passive and Universal Learning-based Framework for Intra-domain Failure Detection

Lianjin Ye, Qing Li, Xudong Zuo, Jingyu Xiao, Yong Jiang, Zhuyun Qi, Chunsheng Zhu
DOI: https://doi.org/10.1109/ipccc51483.2021.9679436
2021-01-01
Abstract: The growing number of network devices improves network quality but also makes networks more prone to failures. Frequent link and node failures in real-world networks cause packet loss and delay, and call for faster and more accurate detection methods. Existing network failure detection systems rely on probes and end-to-end metrics, but are limited by bandwidth or storage overhead. Their reliance on monitoring systems deployed on specific devices such as hosts also limits feasibility and compatibility in general network topologies, and ignores the potential of moving monitoring tasks from hosts to switches. In this paper, we propose PUFF, a passive, data-driven network failure detection system based on in-network feature collection in programmable switches and machine learning algorithms. First, PUFF explores the use of continuous traffic changes, instead of end-to-end metrics, to detect node and link failures. Second, PUFF offers a software-based prototype and compares its performance with the latest passive failure detection methods. Evaluation based on simulation on real-world topology shows that PUFF can detect nearly 90% of node failures and 80% of link failures with less overhead and in less time.