Implementation and Evaluation of Different Parallel Designs of AES Using CUDA

Jianwei Ma, Xiaojun Chen, Rui Xu, Jinqiao Shi
DOI: https://doi.org/10.1109/dsc.2017.19
2017-01-01
Abstract: The Advanced Encryption Standard (AES) is now widely used in security applications. However, there is still considerable room to improve its execution efficiency. Since the graphics processing unit (GPU), with its strong parallel computing ability, has been applied to general-purpose computation, researchers have tried to use it to speed up various cryptographic algorithms. This paper discusses how the performance of GPU-based CBC-AES decryption is influenced by four key parameters: the size of the input data, the number of threads per block, the memory allocation style, and the parallel granularity. Furthermore, we compare the performance of AES on the GPU with that of standard AES and AES-NI, and find that the parameter settings that achieve the best performance vary with the size of the input data. We therefore provide several recommendations on how to implement CBC-AES on a GPU for different input data sizes. In particular, with our optimization method, the best performance in our experiments on a GPU (NVIDIA Tesla K40m) is about 112 times faster than the implementation of AES on a CPU (Intel Xeon E5-2650).
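As background for why CBC decryption suits a GPU: in CBC mode each plaintext block is P_i = D_K(C_i) XOR C_{i-1} (with C_0 taken as the IV), so every block depends only on ciphertext that is already available and can be decrypted independently. The sketch below illustrates that property only and is not the paper's implementation. It assumes a toy invertible block function in place of AES (`toy_encrypt_block` and `toy_decrypt_block` are hypothetical stand-ins and are not secure), uses a CPU thread pool in place of GPU threads, and assigns one 16-byte block per task as one possible choice of parallel granularity.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # AES block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Hypothetical placeholder for AES block encryption (NOT AES, NOT secure)."""
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]  # rotate left by one byte


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt_block: rotate right by one byte, then XOR the key."""
    unrotated = block[-1:] + block[:-1]
    return xor(unrotated, key)


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    """CBC encryption is sequential: each block needs the previous ciphertext block."""
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt_block(xor(plaintext[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt_block(i: int, ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    """Decrypt block i using only ciphertext blocks i and i-1 (or the IV for i == 0)."""
    cur = ciphertext[i * BLOCK:(i + 1) * BLOCK]
    prev = iv if i == 0 else ciphertext[(i - 1) * BLOCK:i * BLOCK]
    return xor(toy_decrypt_block(cur, key), prev)


def cbc_decrypt_parallel(ciphertext: bytes, key: bytes, iv: bytes, workers: int = 4) -> bytes:
    """Decrypt all blocks as independent tasks; on a GPU each task would map to a thread."""
    n_blocks = len(ciphertext) // BLOCK
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda i: cbc_decrypt_block(i, ciphertext, key, iv), range(n_blocks))
        return b"".join(blocks)  # map preserves block order
```

Encrypting a multiple-of-16-byte message with `cbc_encrypt` and passing the result to `cbc_decrypt_parallel` with the same key and IV should return the original message regardless of the `workers` value, which is the independence that the threads-per-block and granularity parameters exploit.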