GPU Accelerated AES Algorithm

Canhui Wang, Xiaowen Chu
DOI: https://doi.org/10.48550/arXiv.1902.05234
2019-02-14
Abstract: It is widely accepted that Graphics Processing Units (GPUs) are a promising platform for encryption acceleration; in particular, their support for complex mathematical calculations, such as integer and logical operations, makes implementation easier. However, complexities such as parallel granularity and memory allocation still impose a burden on real-world implementations. In this paper, we propose a new approach for accelerating the Advanced Encryption Standard, covering both encryption and decryption. Specifically, we adopt the Electronic Code Book mode for the cryptographic transformation, a look-up table scheme for fast lookup, and a granularity of one state per thread for thread scheduling. Our experimental results offer researchers a good understanding of GPU architectures and software acceleration. In addition, both our source code and experimental results are freely available.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The problem this paper addresses is how to use a Graphics Processing Unit (GPU) to accelerate encryption and decryption with the Advanced Encryption Standard (AES) algorithm. Specifically, the paper focuses on improving the execution efficiency of AES on the GPU by tuning parallel granularity, memory allocation, and fast look-up tables (T-boxes).

### Main problems

1. **Encryption acceleration**:
   - A traditional CPU has limited throughput when encrypting large volumes of data, whereas the GPU's highly parallel architecture suits this kind of task.
   - The paper proposes a GPU-based AES encryption and decryption benchmarking method to evaluate and optimize GPU performance on encryption tasks.
2. **Parallel granularity optimization**:
   - The choice of parallel granularity has an important impact on GPU performance. The paper selects a granularity of one state per thread: each thread independently processes one AES state (a 16-byte block), which avoids inter-thread dependencies and increases parallelism.
3. **Memory allocation and T-box optimization**:
   - T-boxes are precomputed look-up tables that speed up the AES round transformations by folding SubBytes and MixColumns into table look-ups. The paper loads the T-boxes into CUDA's constant memory to reduce memory access latency and improve look-up speed.
   - The paper also explores how different memory allocation strategies (constant memory, shared memory, and registers) affect performance.
4. **Experimental verification**:
   - The paper experimentally compares the AES encryption and decryption performance of the CPU and the GPU across different file sizes. The results show that once the file size exceeds 4 KB, the GPU significantly outperforms the CPU, and the advantage grows with file size.
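
The one-state-per-thread granularity described above can be illustrated with a small CPU-side sketch. It is not the paper's code: the names `launch_config`, `ecb_map`, and `toy_transform`, and the default of 256 threads per block, are assumptions made for illustration, and `toy_transform` is a stand-in for the AES round function, not a cipher. The sketch only shows how a global thread index maps to a 16-byte state and why ECB mode lets every state be processed independently.

```python
STATE_BYTES = 16  # one AES state is a 128-bit (16-byte) block


def launch_config(n_bytes, threads_per_block=256):
    """Return (num_states, num_thread_blocks) for a one-state-per-thread launch.

    threads_per_block=256 is an illustrative assumption, not a value from the paper.
    """
    if n_bytes % STATE_BYTES:
        raise ValueError("input must be padded to a multiple of 16 bytes")
    num_states = n_bytes // STATE_BYTES
    # Ceiling division, as in the usual CUDA grid-size computation.
    num_thread_blocks = (num_states + threads_per_block - 1) // threads_per_block
    return num_states, num_thread_blocks


def ecb_map(data, transform_state, threads_per_block=256):
    """Simulate the launch on the CPU: each (block, thread) pair owns one state."""
    num_states, num_thread_blocks = launch_config(len(data), threads_per_block)
    out = bytearray(len(data))
    for block_idx in range(num_thread_blocks):
        for thread_idx in range(threads_per_block):
            # Mirrors blockIdx.x * blockDim.x + threadIdx.x in a CUDA kernel.
            gid = block_idx * threads_per_block + thread_idx
            if gid >= num_states:
                continue  # bounds guard for the partially filled last block
            start = gid * STATE_BYTES
            # In ECB mode no state depends on any other, so order is irrelevant.
            out[start:start + STATE_BYTES] = transform_state(
                data[start:start + STATE_BYTES]
            )
    return bytes(out)


def toy_transform(state):
    """Hypothetical stand-in for the per-state AES rounds (NOT encryption)."""
    return bytes(b ^ 0xFF for b in state)
```

Under these assumptions, a 4 KB input holds 256 states, so `launch_config(4096)` yields a single thread block of 256 threads; larger files simply add more thread blocks.
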
### Summary

The main purpose of the paper is to fully utilize the highly parallel architecture of the GPU to accelerate the encryption and decryption processes of the AES algorithm by optimizing parallel granularity, memory allocation, and T-box look-up tables. The experimental results show that for large-file encryption and decryption tasks, the GPU has a significant performance advantage over the CPU.
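
To make the T-box look-up idea concrete, the sketch below builds the four standard AES encryption tables from the S-box and uses them to compute one output column of a round. It follows the textbook T-table construction, which may differ in layout from the paper's implementation. The function names are illustrative assumptions, the round-key XOR is omitted, and Python cannot model the constant-memory placement, so this shows only the table arithmetic.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """Generate the AES S-box: multiplicative inverse followed by the affine map."""
    sbox = [0] * 256
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = gf_mul(inv, x)
        y = inv
        for shift in range(1, 5):  # XOR with the four left rotations of inv
            y ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = y ^ 0x63
    return sbox


def rotr32(w, n):
    """Rotate a 32-bit word right by n bits."""
    return ((w >> n) | (w << (32 - n))) & 0xFFFFFFFF


def build_t_boxes(sbox):
    """Return [T0, T1, T2, T3]; T0[x] packs (2*S[x], S[x], S[x], 3*S[x])."""
    t0 = []
    for s in sbox:
        t0.append((gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3))
    # T1..T3 are byte rotations of T0, one per row of the MixColumns matrix.
    return [t0] + [[rotr32(w, 8 * k) for w in t0] for k in (1, 2, 3)]


SBOX = build_sbox()
TE = build_t_boxes(SBOX)


def t_box_column(col):
    """SubBytes + MixColumns for one (already row-shifted) column via 4 look-ups."""
    w = TE[0][col[0]] ^ TE[1][col[1]] ^ TE[2][col[2]] ^ TE[3][col[3]]
    return [(w >> 24) & 0xFF, (w >> 16) & 0xFF, (w >> 8) & 0xFF, w & 0xFF]
```

Each output column costs four table reads and three XORs instead of separate substitution and field-multiplication steps, which is why the location of the tables in GPU memory matters for throughput.
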