Research on Optimization of Heterogeneous Stencil Computing Based on CPU and GPU
LI Bo, HUANG Dongqiang, JIA Jinfang, WU Li, WANG Xiaoying, HUANG Jianqiang
DOI: https://doi.org/10.19678/j.issn.1000-3428.0064282
2023-01-01
Abstract: Stencil computation is a class of algorithms that update each grid point using a fixed-pattern template. It is widely used in image processing, computational fluid dynamics simulation, and other fields. However, existing stencil computing approaches suffer from weak computational parallelism, low cache hit rates, and insufficient utilization of computing resources. Building on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models, this paper proposes two hybrid computing models: MPI+OpenMP and Compute Unified Device Architecture (CUDA)+OpenMP. Unlike the conventional pure-MPI model, the MPI+OpenMP model uses MPI for coarse-grained communication between nodes and OpenMP for fine-grained parallel computation within each process. It further combines Single Instruction Multiple Data (SIMD), Non-Uniform Memory Access (NUMA), data prefetching, data partitioning, and other techniques to raise the cache hit rate and the degree of parallelism, thereby accelerating stencil computation. When stencil computation is performed with CUDA alone, the CPU's computing resources are underused and largely wasted. In contrast, CUDA+OpenMP lets the CPU take part in the computation by splitting the workload between CPU and GPU, reducing communication costs and making full use of the CPU's multi-core parallelism. Experimental results show an average speedup of 3.67 for MPI+OpenMP over pure MPI and 1.26 for CUDA+OpenMP over pure CUDA, demonstrating that both hybrid models significantly improve performance.
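The C code below is a minimal sketch of the two ideas the abstract combines: fine-grained OpenMP threading with SIMD inside a stencil sweep, and a static split of the workload between GPU and CPU. It rests on several assumptions. It uses a 2D 5-point Jacobi stencil on an n x n row-major grid, although the paper's exact stencil shapes are not stated here. The helper `gpu_row_count` is hypothetical and divides interior rows in proportion to assumed measured throughputs. The CUDA kernel is not shown, and `stencil_rows` stands in for whichever device owns a given row range. This is not the authors' implementation.

```c
/* Hypothetical sketch: one Jacobi sweep of a 2D 5-point stencil over
   interior rows [row_begin, row_end) of an n x n row-major grid.
   Boundary rows and columns are left untouched. */
static void stencil_rows(const double *in, double *out, int n,
                         int row_begin, int row_end)
{
    /* Outer loop: rows are divided among OpenMP threads (fine-grained
       parallelism inside one process). */
    #pragma omp parallel for schedule(static)
    for (int i = row_begin; i < row_end; i++) {
        /* Inner loop: contiguous memory access, a candidate for SIMD. */
        #pragma omp simd
        for (int j = 1; j < n - 1; j++) {
            out[i * n + j] = 0.2 * (in[i * n + j]
                                  + in[(i - 1) * n + j] + in[(i + 1) * n + j]
                                  + in[i * n + j - 1] + in[i * n + j + 1]);
        }
    }
}

/* Hypothetical static load split: the number of interior rows assigned to
   the GPU, proportional to its assumed throughput relative to the CPU.
   The remaining rows go to the CPU's OpenMP threads. */
static int gpu_row_count(int interior_rows, double gpu_rate, double cpu_rate)
{
    double share = gpu_rate / (gpu_rate + cpu_rate);
    return (int)(share * interior_rows + 0.5); /* round to nearest row */
}
```

For example, with 6 interior rows and a GPU assumed to be three times faster than the CPU, `gpu_row_count(6, 3.0, 1.0)` assigns 5 rows to the GPU and 1 to the CPU. When compiled with OpenMP enabled (for example `gcc -fopenmp`), the outer loop runs on multiple threads. Without OpenMP, the pragmas are ignored and the code runs serially with the same results. In a Jacobi sweep, each output row reads only the input grid, so the GPU and CPU row ranges can be computed independently. Only the halo rows at the split boundary would need to be exchanged between the devices before the next sweep.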