Performance Optimization of a CFD Application on Intel Multicore and Manycore Architectures

Yonggang Che, Lilun Zhang, Yongxian Wang, Chuanfu Xu, Wei Liu, Xinghua Cheng
DOI: https://doi.org/10.1007/978-3-662-44491-7_7
2014-01-01
Abstract: This paper reports our experience optimizing the performance of a high-order, high-accuracy Computational Fluid Dynamics (CFD) application (HOSTA) on a state-of-the-art multicore processor and the emerging Intel Many Integrated Core (MIC) coprocessor. We focus on effective loop vectorization and memory access optimization. We explore a series of techniques, including data structure transformations, procedure inlining, compiler SIMDization, OpenMP loop collapsing, and the use of Huge Pages. We measure detailed execution times and event counts from the Performance Monitoring Units. The results show that our optimizations improve the performance of HOSTA by 1.61× on a compute node with two Intel Sandy Bridge processors and by 1.97× on an Intel Knights Corner coprocessor, the publicly available MIC product. The microarchitecture-level effects of these optimizations are also discussed.