Understanding Network Startup for Secure Containers in Multi-Tenant Clouds: Performance, Bottleneck and Optimization
Yunzhuo Liu, Junchen Guo, Bo Jiang, Pengyu Zhang, Xiaoqing Sun, Yang Song, Wei Ren, Zhiyuan Hou, Biao Lyu, Rong Wen, Shunmin Zhu, Xinbing Wang
DOI: https://doi.org/10.1145/3646547.3688436
2024-01-01
Abstract: In this paper, we use empirical measurements to show that container network startup is a key contributor to the slow startup of secure containers in multi-tenant clouds, especially in serverless computing, where high-volume concurrent container invocations exacerbate the issue. We conduct an extensive and detailed analysis of existing Container Network Interface (CNI) plugins and show that even the fastest one doubles the startup time relative to the no-network scenario. We show that the major cause of this blowup in total startup time is that enabling networking significantly increases contention among different startup stages, particularly for global Linux kernel locks, including the Routing Table NetLink (RTNL) mutex lock and various spin locks. We reveal that contention for these locks hinders startup performance in three ways: directly increasing stage time, causing poor pipeline overlap, and wasting CPU resources. To mitigate such kernel lock contention, we propose a multi-stage concurrency control mechanism based on Bayesian optimization to limit the concurrency of each contended stage. Our results show that this lightweight mechanism can effectively reduce the end-to-end container startup time by 18.8% with negligible extra overhead.
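
Illustrative sketch (not from the paper): the idea of limiting the concurrency of each contended stage can be pictured as one counting semaphore per startup stage. The Python below is a minimal sketch under stated assumptions. The stage names (`create_netns`, `setup_veth`), the limit values, and the `StageLimiter` class are hypothetical, and the limits are fixed by hand here, whereas the abstract describes choosing them with Bayesian optimization, which this sketch does not implement.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class StageLimiter:
    """Caps how many containers may execute each startup stage at once."""

    def __init__(self, limits):
        # limits: stage name -> max concurrent executions (hypothetical values)
        self._sems = {name: threading.Semaphore(n) for name, n in limits.items()}
        self._lock = threading.Lock()
        self._active = {name: 0 for name in limits}
        self.peak = {name: 0 for name in limits}  # highest observed concurrency

    def run(self, stage, fn, *args):
        # Block until a slot for this stage is free, then run the stage body.
        with self._sems[stage]:
            with self._lock:
                self._active[stage] += 1
                self.peak[stage] = max(self.peak[stage], self._active[stage])
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self._active[stage] -= 1


def start_container(limiter, cid, stages):
    # Each container passes through the stages in order; every stage is
    # throttled independently, so an uncontended stage is not held back
    # by the limit of a contended one.
    for name, fn in stages:
        limiter.run(name, fn, cid)
    return cid


def demo(num_containers=16):
    # Hypothetical stages; sleep() stands in for work that would contend
    # on a kernel lock in a real startup path.
    limits = {"create_netns": 4, "setup_veth": 2}
    stages = [
        ("create_netns", lambda cid: time.sleep(0.001)),
        ("setup_veth", lambda cid: time.sleep(0.001)),
    ]
    limiter = StageLimiter(limits)
    with ThreadPoolExecutor(max_workers=num_containers) as pool:
        done = list(pool.map(lambda c: start_container(limiter, c, stages),
                             range(num_containers)))
    return done, limiter.peak, limits
```

Calling `demo()` starts 16 simulated containers concurrently and returns the completed IDs together with the peak concurrency observed per stage, which never exceeds the configured limit. A tuner, Bayesian or otherwise, would treat the `limits` dictionary as the search space and the measured end-to-end startup time as the objective.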