Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study

Jianwei Zhu, Hang Yin, Peng Deng, Aline Almeida, Shunfan Zhou
2024-10-26
Abstract: This report evaluates the performance impact of enabling Trusted Execution Environments (TEE) on NVIDIA Hopper GPUs for large language model (LLM) inference tasks. We benchmark the overhead introduced by TEE mode across various LLMs and token lengths, with a particular focus on the bottleneck caused by CPU-GPU data transfers via PCIe. Our results indicate that while there is minimal computational overhead within the GPU, the overall performance penalty is primarily attributable to data transfer. For the majority of typical LLM queries, the overhead remains below 7%, with larger models and longer sequences experiencing nearly zero overhead.
Distributed, Parallel, and Cluster Computing; Artificial Intelligence; Performance
What problem does this paper attempt to address?
The paper addresses the performance impact of enabling the Trusted Execution Environment (TEE) on NVIDIA Hopper GPUs during large language model (LLM) inference. Specifically, it investigates the following points through benchmarking:
1. **Performance Overhead**: Quantifying the overhead introduced by TEE mode across different models and token lengths.
2. **Bottleneck Analysis**: Identifying the main sources of overhead, in particular whether CPU-GPU data transfer over PCIe is the bottleneck.
3. **Optimization Conditions**: Exploring the conditions under which the overhead is minimized.
Through these studies, the paper aims to provide a reference for adopting TEE technology in AI applications with high security requirements, clarifying the performance trade-offs that arise in practice.
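As a rough illustration of how an overhead figure of this kind can be computed, the sketch below compares throughput with TEE mode off and on. It assumes overhead is defined as the relative throughput loss against the non-TEE baseline, which may differ from the report's exact definition. The function name `tee_overhead_percent` and all numbers are hypothetical placeholders, not measurements from the paper.

```python
def tee_overhead_percent(baseline_tps: float, tee_tps: float) -> float:
    """Relative throughput loss, in percent, when TEE mode is enabled.

    Assumed definition: (baseline - tee) / baseline * 100.
    """
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_tps - tee_tps) / baseline_tps * 100.0


# Placeholder throughputs in tokens/second (TEE off, TEE on).
# These values are illustrative only and are NOT taken from the report.
runs = {
    "short-sequence": (100.0, 94.0),
    "long-sequence": (50.0, 49.5),
}

for name, (baseline, tee) in runs.items():
    print(f"{name}: {tee_overhead_percent(baseline, tee):.1f}% overhead")
```

Reporting the overhead relative to the baseline, not as an absolute difference, keeps results comparable across models whose raw throughput differs widely.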