iSwift: Fast and Accurate Impact Identification for Large-scale CDNs

Jiyan Sun, Tao Lin, Yinlong Liu, Xin Wang, Bo Jiang, Liru Geng, Pengkun Jing, Liang Dai
DOI: https://doi.org/10.1109/iwqos54832.2022.9812890
2022-01-01
Abstract: One key challenge in maintaining a large-scale Content Delivery Network (CDN) is minimizing service downtime when severe system problems occur (e.g., hardware failures). In such cases, a critical step is to quickly and accurately identify the range of users experiencing performance degradation, a task termed impact identification. Successful impact identification not only helps identify impacted users but also provides meaningful information for troubleshooting. However, in current practice, network engineers usually need several hours to identify impacted users manually, which may lead to huge business losses. The main challenges for automatic impact identification in large CDNs include the inaccuracy of the underlying anomaly detection, the huge search space of impact identification, and the severe long-tail distribution of user traffic. In this paper, we propose iSwift, a system specifically designed for impact identification in large-scale CDNs that addresses the aforementioned challenges. We evaluate iSwift on semi-synthetic datasets, and the results show that it achieves an F1-score greater than 0.85 within ten seconds, significantly outperforming state-of-the-art solutions. Furthermore, iSwift has been deployed in a production CDN for around one year as a pilot project, and its online performance has been confirmed by the network operators.