What problem does this paper attempt to address?

This paper addresses a security vulnerability caused by leakage of GPU local memory, named LeftoverLocals. The vulnerability lets an attacker recover data that another process left in the GPU's local memory, which undermines the security of GPU applications, especially large language models (LLMs) and other machine-learning (ML) models.

### Specific Problem Description

1. **Security Vulnerability**:
   - LeftoverLocals allows an attacker to obtain another process's data by reading uninitialized GPU local memory. This particularly affects LLMs and other ML models running on affected GPUs.
   - An attacker can listen in on and reconstruct another user's interactive LLM session, even across process or container boundaries.
2. **Scope of Impact**:
   - The vulnerability affects GPUs from multiple hardware manufacturers, including Apple, Qualcomm, and AMD, among others. NVIDIA's GPUs are not currently affected, possibly because similar problems were discovered and fixed following previous research.
   - The vulnerability matters most in privacy-sensitive application areas such as machine learning, because these applications usually handle large amounts of sensitive data.
3. **Potential Risks**:
   - By reading uninitialized local memory, an attacker can obtain sensitive information such as model inputs, outputs, and weights, which poses a serious threat to the security of ML systems.
   - For example, with a 7B-parameter LLM, each query may leak about 181 MB of data, which is sufficient to reconstruct the LLM's response with high precision.

### Solutions

To address this vulnerability, the paper proposes the following solutions:

- **Code Modification**: In every GPU kernel that uses local memory, clear that memory (for example, by storing 0) before the kernel ends. Users also need to ensure that the compiler does not optimize out these clearing instructions (for example, by declaring the local memory as `volatile`).
- **Hardware and Software Updates**: Cooperate with hardware manufacturers to release firmware and driver updates that fix the vulnerability. For example, AMD, Qualcomm, and Imagination have already begun to take measures to solve this problem.

### Summary

This paper reveals a serious GPU local memory leakage vulnerability and shows how it can be exploited to steal sensitive data. It emphasizes that in machine learning and other computationally intensive applications, the security of the entire development stack must be strictly reviewed, especially at the GPU level.
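
The leak and the code-level mitigation described above can be illustrated with a toy model. The sketch below is plain Python, not GPU code, and it is not taken from the paper: `ToyComputeUnit`, `victim_kernel`, `listener_kernel`, and `leaked_bytes` are hypothetical names, and a `bytearray` merely stands in for a compute unit's local memory under the assumption that nothing resets it between kernel launches. On a real GPU the clearing would be written in the kernel source itself, where the `volatile` concern applies; Python has no equivalent compiler optimization to guard against.

```python
# Toy model only: a bytearray stands in for one compute unit's local memory.
LOCAL_MEM_BYTES = 64


class ToyComputeUnit:
    """Hypothetical stand-in for a GPU compute unit with a local-memory scratchpad."""

    def __init__(self) -> None:
        self.local_mem = bytearray(LOCAL_MEM_BYTES)

    def run(self, kernel, *args):
        # Assumption of this model: local memory is NOT reset between launches.
        return kernel(self.local_mem, *args)


def victim_kernel(local_mem: bytearray, secret: bytes, clear_on_exit: bool) -> int:
    """Stage data in local memory, compute on it, and optionally zero it on exit."""
    n = min(len(secret), len(local_mem))
    local_mem[:n] = secret[:n]
    result = sum(local_mem[:n])  # placeholder for the real computation
    if clear_on_exit:
        # Mitigation: store 0 to every local-memory location before returning.
        for i in range(len(local_mem)):
            local_mem[i] = 0
    return result


def listener_kernel(local_mem: bytearray) -> bytes:
    """Attacker kernel: reads local memory without ever writing to it."""
    return bytes(local_mem)


def leaked_bytes(clear_on_exit: bool) -> bytes:
    """Run the victim, then the listener, on the same toy compute unit."""
    cu = ToyComputeUnit()
    cu.run(victim_kernel, b"user prompt: hello", clear_on_exit)
    return cu.run(listener_kernel).rstrip(b"\x00")


if __name__ == "__main__":
    print(leaked_bytes(clear_on_exit=False))  # b'user prompt: hello'
    print(leaked_bytes(clear_on_exit=True))   # b''
```

Without the clearing loop, the listener recovers the victim's staged bytes unchanged; with it, the listener sees only zeros. The cost is one extra pass over local memory per kernel.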

LeftoverLocals: Listening to LLM Responses Through Leaked GPU Local Memory

Vulnerable GPU Memory Management: Towards Recovering Raw Data from GPU

Whispering Pixels: Exploiting Uninitialized Register Accesses in Modern GPUs

Different is Good

The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems

Leaky DNN: Stealing Deep-Learning Model Secret with GPU Context-Switching Side-Channel

A First Look At Efficient And Secure On-Device LLM Inference Against KV Leakage

RTiL: Real-Time Inference of Large Language Models on Memory-Constrained GPU Devices

CacheOut: Leaking Data on Intel CPUs via Cache Evictions

Fallout: Reading Kernel Writes From User Space

Eternal Sunshine of the Spotless Machine: Protecting Privacy with Ephemeral Channels

Uncovering and Exploiting AMD Speculative Memory Access Predictors for Fun and Profit

Store-to-Leak Forwarding: Leaking Data on Meltdown-resistant CPUs (Updated and Extended Version)

ZombieLoad: Cross-Privilege-Boundary Data Sampling

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching

IOTLB-SC: An Accelerator-Independent Leakage Source in Modern Cloud Systems

Oreo: Protecting ASLR Against Microarchitectural Attacks (Extended Version)

ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching

GPU Side-Channel Attacks are Everywhere: A Survey

Memory Backdoor Attacks on Neural Networks

PMU-Leaker: Performance Monitor Unit-based Realization of Cache Side-Channel Attacks