LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory

Tyler Sorensen, Heidy Khlaaf
2024-01-30
Abstract: This paper describes LeftoverLocals: a vulnerability that allows data recovery from GPU memory created by another process on Apple, Qualcomm, and AMD GPUs. LeftoverLocals impacts the security posture of GPU applications, with particular significance to LLMs and ML models that run on impacted GPUs. By recovering local memory, an optimized GPU memory region, we built a PoC where an attacker can listen into another user's interactive LLM session (e.g., llama.cpp) across process or container boundaries.
Cryptography and Security; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The problem this paper addresses is a security vulnerability caused by GPU local memory leakage, named LeftoverLocals. It lets an attacker recover data that another process left behind in the GPU's local memory, undermining the security of GPU applications, especially large language models (LLMs) and other machine learning (ML) models.

### Specific Problem Description

1. **Security Vulnerability**:
   - LeftoverLocals allows an attacker to obtain another process's data by reading uninitialized GPU local memory. This particularly affects the security of LLMs and other ML models running on affected GPUs.
   - An attacker can listen in on and reconstruct another user's interactive LLM session, even when that session runs in a different process or container.
2. **Scope of Impact**:
   - The vulnerability affects GPUs from multiple hardware vendors, including Apple, Qualcomm, and AMD. NVIDIA GPUs are not currently affected, possibly because similar problems were discovered and fixed as a result of earlier research.
   - The vulnerability matters most in privacy-sensitive application areas such as machine learning, because these applications usually handle large amounts of sensitive data.
3. **Potential Risks**:
   - By reading uninitialized local memory, an attacker can obtain sensitive information such as model inputs, outputs, and weights, which poses a serious threat to the security of ML systems.
   - For example, with a 7B-parameter LLM, each query may leak about 181 MB of data, which is sufficient to reconstruct the LLM's response with high precision.

### Solutions

To address this vulnerability, the paper proposes the following solutions:

- **Code Modification**: In every GPU kernel that uses local memory, clear that memory (for example, by storing 0) before the kernel ends. Users also need to ensure that the compiler does not optimize away these clearing instructions (for example, by declaring the local memory as `volatile`).
- **Hardware and Software Updates**: Work with hardware vendors to release firmware and driver updates that fix the vulnerability. For example, AMD, Qualcomm, and Imagination have already begun taking measures to address the problem.

### Summary

This paper reveals a serious GPU local memory leakage vulnerability and shows how it can be exploited to steal sensitive data. It emphasizes that in machine learning and other computationally intensive applications, the security of the entire development stack must be rigorously reviewed, especially at the GPU level.
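
The leak and the code-modification mitigation described above can be illustrated with a small simulation. This is a minimal sketch in plain Python, not GPU code and not the paper's PoC: `SimulatedComputeUnit`, `victim_kernel`, `listener_kernel`, and `run_scenario` are hypothetical names introduced here, and the "local memory" is just a list that is assumed to persist between kernel launches. Compiler behavior (the reason the summary mentions `volatile`) is not modeled.

```python
class SimulatedComputeUnit:
    """Hypothetical stand-in for a GPU compute unit with local memory."""

    def __init__(self, num_words):
        # Assumption: local memory contents persist across kernel launches
        # and are never cleared implicitly (the vulnerable behavior).
        self.local_memory = [0] * num_words

    def run_kernel(self, kernel):
        # Launch a "kernel": a function that receives the local memory.
        return kernel(self.local_memory)


def victim_kernel(secret, clear_before_exit):
    """Build a kernel that stages `secret` in local memory and uses it."""

    def kernel(local_memory):
        # Stage sensitive values (e.g. model activations) in local memory.
        for i, value in enumerate(secret):
            local_memory[i] = value
        result = sum(local_memory[: len(secret)])
        if clear_before_exit:
            # Mitigation: overwrite local memory with 0 before the kernel ends.
            for i in range(len(local_memory)):
                local_memory[i] = 0
        return result

    return kernel


def listener_kernel(local_memory):
    # The attacker reads local memory without writing to it first,
    # so it sees whatever the previous kernel left behind.
    return list(local_memory)


def run_scenario(clear_before_exit):
    """Run the victim, then the listener, on the same compute unit."""
    compute_unit = SimulatedComputeUnit(num_words=8)
    secret = [11, 22, 33, 44]
    compute_unit.run_kernel(victim_kernel(secret, clear_before_exit))
    return compute_unit.run_kernel(listener_kernel)


if __name__ == "__main__":
    print("without clearing:", run_scenario(clear_before_exit=False))
    print("with clearing:   ", run_scenario(clear_before_exit=True))
```

Without clearing, the listener's dump begins with the victim's staged values; with clearing, it sees only zeros. On a real GPU the clearing stores would live in the kernel source itself, and the paper's caveat about the compiler removing them would still apply.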