OpenMP Performance Analysis of CFD Application on Intel Multicore and Manycore Architectures

Che Yonggang, Zhang Lilun, Wang Yongxian, Xu Chuanfu, Cheng Xinghua
DOI: https://doi.org/10.3778/j.issn.1673-9418.1412057
2015-01-01
Abstract: Multicore and manycore processors are becoming the mainstream architectures in high performance computing, and OpenMP programming is one of the primary methods of exploiting their parallel computing capabilities. Using a systematic approach that combines hardware performance counter based measurement with model based analysis, this paper evaluates the OpenMP performance of a real-world, high-order, structured-grid CFD (computational fluid dynamics) application on two platforms: Xeon E5 Sandy Bridge, an Intel multicore processor, and Knights Corner, an Intel Many Integrated Core coprocessor. The paper analyzes how the OpenMP library cost, the load balance among OpenMP threads, and the memory bandwidth affect the application's performance. The results show that the redundant computation introduced by OpenMP parallel programming is not significant, that the serial portion and load imbalance significantly affect parallel efficiency, and that memory access bandwidth significantly affects the achieved floating-point performance. The paper also compares the performance of the two architectures and discusses directions for further performance tuning.