Exploiting Kubernetes' Image Pull Implementation to Deny Node Availability

Luis Augusto Dias Knob, Matteo Franzil, Domenico Siracusa
2024-01-19
Abstract: Kubernetes (K8s) has grown in popularity over the past few years to become the de facto standard for container orchestration in cloud-native environments. While research is not new to topics such as containerization and access control security, the Application Programming Interface (API) interactions between K8s and its runtime interfaces have not been studied thoroughly. In particular, the CRI-API is responsible for abstracting the container runtime, managing the creation and lifecycle of containers along with the downloads of the respective images. However, this decoupling of concerns and the abstraction of the container runtime renders K8s unaware of the status of the downloading process of the container images, obstructing the monitoring of the resources allocated to such process. In this paper, we discuss how this lack of status information can be exploited as a Denial of Service attack in a K8s cluster. We show that such attacks can generate up to 95% average CPU usage, prevent downloading new container images, and increase I/O and network usage for a potentially unlimited amount of time. Finally, we propose two possible mitigation strategies: one, implemented as a stopgap solution, and another, requiring more radical architectural changes in the relationship between K8s and the CRI-API.
Cryptography and Security
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper reveals and addresses a resource-exhaustion Denial of Service (DoS) attack against Kubernetes (K8s) clusters. Specifically, it explores design flaws in the Kubernetes Container Runtime Interface (CRI-API) that allow attackers to launch DoS attacks through the following methods:

1. **Exploiting the Asynchronous Communication Mechanism**: Because communication between the Kubernetes API and the container runtime is asynchronous, K8s cannot obtain the status of container image downloads in time. Attackers can exploit this by repeatedly creating and deleting Pods to trigger a large number of image download requests, consuming the node's CPU, disk, and network resources.
2. **Resource Wastage**: When a Pod is deleted before its image has been fully downloaded, K8s does not notify the container runtime to stop the download, so the download runs to completion for nothing. Attackers can repeat this process to keep the node's resources under high load for extended periods, preventing the normal deployment of other applications.
3. **Cache Manipulation**: Since downloaded images are not immediately deleted, attackers can manipulate the node's image cache, increasing the time required for new application deployments. Additionally, the Garbage Collector (GC) may delete images of critical applications while cleaning up unused images, delaying the execution of these applications.

### Impact of the Attack

- **Resource Exhaustion**: The attack consumes a significant share of the node's CPU, disk, and network resources, degrading the performance of other workloads and potentially violating Service Level Agreements (SLAs).
- **Deployment Delays**: Because resources are occupied, new containers cannot be deployed in a timely manner, which hurts short-lived workloads (such as CI/CD pipelines) in particular and results in a loss of availability.
- **Cache Pollution**: Attackers can pollute the node's cache by downloading a large number of useless images, increasing the deployment time for new applications.

### Solutions

The paper proposes two possible mitigation strategies:

1. **Temporary Solution**: Use extended Berkeley Packet Filter (eBPF) technology to intercept and mitigate malicious image download requests.
2. **Long-term Solution**: Make more fundamental architectural changes to the relationship between K8s and the CRI-API, so that the API can obtain the status of image downloads promptly and cancel unfinished downloads when necessary.

### Summary

By analyzing the design flaws of the CRI-API in Kubernetes clusters, this paper reveals a new resource-exhaustion DoS attack and proposes corresponding mitigation strategies. These findings help improve the security and stability of Kubernetes, especially in cloud-native environments.
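
The create-and-delete loop behind the first two attack methods can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `ToyNode`, `run_attack`, and the `cancel_on_delete` switch are hypothetical names invented for illustration, not part of the kubelet, the CRI-API, or the paper's code, and the model counts abstract "seconds of pull work" rather than real CPU, disk, or network usage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of a node's image pulls (NOT the real kubelet or CRI-API)."""

    cancel_on_delete: bool = False  # hypothetical switch for the long-term fix
    active_pulls: dict = field(default_factory=dict)  # pod name -> seconds left
    deleted: set = field(default_factory=set)  # pods that no longer exist
    wasted_seconds: int = 0  # pull work spent on already-deleted pods

    def create_pod(self, name: str, pull_seconds: int) -> None:
        # Creating a pod with an uncached image starts a pull on the node.
        self.active_pulls[name] = pull_seconds

    def delete_pod(self, name: str) -> None:
        self.deleted.add(name)
        if self.cancel_on_delete:
            # Assumed mitigated behaviour: the pending pull is cancelled.
            self.active_pulls.pop(name, None)
        # Otherwise the pull keeps running although nobody needs the image.

    def tick(self) -> None:
        # Advance one second: each active pull does one second of work.
        for name in list(self.active_pulls):
            self.active_pulls[name] -= 1
            if name in self.deleted:
                self.wasted_seconds += 1
            if self.active_pulls[name] == 0:
                del self.active_pulls[name]


def run_attack(node: ToyNode, rounds: int, pull_seconds: int) -> int:
    """Create and immediately delete one pod per second, then drain the pulls.

    Returns the peak number of concurrent pulls seen during the attack.
    """
    peak = 0
    for i in range(rounds):
        name = f"attacker-pod-{i}"
        node.create_pod(name, pull_seconds)
        node.delete_pod(name)
        peak = max(peak, len(node.active_pulls))
        node.tick()
    while node.active_pulls:
        node.tick()
    return peak


if __name__ == "__main__":
    for cancel in (False, True):
        node = ToyNode(cancel_on_delete=cancel)
        peak = run_attack(node, rounds=10, pull_seconds=30)
        print(f"cancel_on_delete={cancel}: peak={peak}, "
              f"wasted={node.wasted_seconds}s")
```

In this toy model, ten rounds with 30-second pulls leave ten concurrent pulls and 300 seconds of wasted pull work when deletions do not cancel downloads, and none when they do. The numbers only show the shape of the problem; they are not a reproduction of the paper's measurements.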