AICAS Grand Challenge 2024: Software and Hardware Co-optimization for General Large Language Model Inference on CPU

Junfeng Tan, Guosheng Yu, Jianing Li, Xiaohan Ma, Fang Bao, Evens Pan, David Bian, Yongfu Li, Yuan Du, Li Du, Bo Li, Wei Mao
DOI: https://doi.org/10.1109/aicas59952.2024.10595886
2024-01-01
Abstract: Large Language Models (LLMs) have attained remarkable achievements in multi-domain tasks. However, because LLMs contain billions of parameters, their performance is limited by hardware conditions. Efficient deployment therefore requires software and hardware co-optimization, using methods such as quantization, pruning, and operator fusion. Meanwhile, there is an emerging trend of running LLMs on edge devices such as Arm-based CPUs. We therefore organized the 2024 AICAS Grand Challenge on software and hardware co-optimization for general LLMs. In the preliminary round, participating teams deployed LLMs on either GPUs or CPUs to reduce model memory consumption and increase throughput. In the final round, the qualified teams applied different optimization methods on the Arm-based multi-core Yitian 710 CPU to maximize the performance of their models. The top six teams presented their work at AICAS 2024.