Can MIC Find Its Place in the Field of PDES? An Early Performance Evaluation of PDES Simulator on Intel Many Integrated Cores Coprocessor
Huilong Chen,Yiping Yao,Wenjie Tang,Dong Meng,Feng Zhu,Yuewen Fu
DOI: https://doi.org/10.1109/ds-rt.2015.23
2015-01-01
Abstract: The widespread use of many-core processors offers a good opportunity for Parallel Discrete Event Simulation (PDES) to achieve better execution performance. As one of the newly introduced many-core processors, the Intel Xeon Phi coprocessor, based on the Many Integrated Core (MIC) architecture, integrates about 60 optimized x86 cores on a single board and reaches a peak performance of 1.0 TFLOPS. Furthermore, because it uses x86 cores, the MIC coprocessor is compatible with almost all programs designed for general-purpose CPUs, which makes it easy to run simulations on MIC. Many studies have evaluated and optimized PDES simulators on Graphics Processing Units (GPUs), Tilera, or other many-core processors, yet almost no related work on the Phi has been published so far. In this article, we conduct an early performance evaluation of the well-known PDES simulator ROSS and its POSIX thread version, ROSS-MT, on a computing node composed of two Intel Xeon multi-core CPUs and one Phi coprocessor, using the classical PDES benchmark PHOLD and an extended version of it with different event granularities. Experimental results show that the pure MPI-based ROSS performs poorly on the MIC coprocessor, indicating that it would not be feasible for common PDES applications. Although ROSS-MT performs much better on MIC, the computational potential of MIC is still far from fully exploited. Furthermore, as the event granularity grows, the performance of this benchmark "falls off a cliff", and the benchmark becomes a computation-dominated application. However, the overall performance on the MIC coprocessor is still worse than that on the host. After analyzing these problems, we vectorized the event handler code to make better use of the Vector Processing Unit (VPU) of the MIC coprocessor, which yields a peak speedup of 9.7X and shows that the MIC coprocessor is able to find its place in the PDES field.
Finally, based on our evaluation, we offer some recommendations for further exploiting the power of the MIC coprocessor in PDES applications.
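
To make the benchmark setup concrete, the following is a minimal sequential sketch of a PHOLD-style workload with an adjustable event granularity. It is not ROSS or ROSS-MT code, and it does not reproduce the paper's configuration: the names `run_phold` and `busy_work`, the exponential delay distribution, the 0.1 minimum delay, and all parameter values are assumptions chosen for illustration only. The idea it shows is that each processed event schedules exactly one new event at a random logical process, while a synthetic loop stands in for the per-event computation whose size is the granularity.

```python
import heapq
import random


def busy_work(granularity):
    """Synthetic per-event computation; cost grows linearly with granularity."""
    acc = 0.0
    for i in range(granularity):
        acc += (i % 7) * 0.5
    return acc


def run_phold(num_lps=8, events_per_lp=2, end_time=50.0, granularity=0, seed=1):
    """Run a sequential PHOLD-like loop (hypothetical sketch, not ROSS).

    Returns (processed, pending): the number of events handled up to
    end_time and the number of events still queued afterwards.
    """
    rng = random.Random(seed)
    queue = []  # min-heap of (timestamp, sequence number, destination LP)
    seq = 0     # unique tie-breaker so equal timestamps never compare LPs

    # Seed every logical process (LP) with its initial events.
    for lp in range(num_lps):
        for _ in range(events_per_lp):
            heapq.heappush(queue, (rng.expovariate(1.0), seq, lp))
            seq += 1

    processed = 0
    # Handle events in timestamp order until the next one exceeds end_time.
    while queue and queue[0][0] <= end_time:
        ts, _, lp = heapq.heappop(queue)
        busy_work(granularity)  # the "event granularity" knob
        processed += 1
        # Each handled event schedules one new event at a random LP,
        # so the event population stays constant.
        dest = rng.randrange(num_lps)
        new_ts = ts + 0.1 + rng.expovariate(1.0)
        heapq.heappush(queue, (new_ts, seq, dest))
        seq += 1

    return processed, len(queue)


if __name__ == "__main__":
    for g in (0, 100, 10_000):
        processed, pending = run_phold(granularity=g)
        print(f"granularity={g}: processed={processed}, pending={pending}")
```

Because the granularity loop does not affect scheduling, the number of processed events is the same for every granularity; only the time spent per event changes. That is the sense in which a large granularity makes the workload computation-dominated, and a loop like the one in `busy_work` is the kind of code a vectorization effort would target.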