Prophet: A Parallel Instruction-Oriented Many-Core Simulator
Weihua Zhang, Xiaofeng Ji, Yunping Lu, Haojun Wang, Haibo Chen, Pen-Chung Yew
DOI: https://doi.org/10.1109/tpds.2017.2700307
IF: 5.3
2017-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Most existing computer architecture simulators are cycle oriented, i.e., they are driven cycle by cycle. However, frequent switches among simulation contexts, excessive buffer accesses, and tight coupling among components often make such simulators slow, difficult to parallelize, and hard to scale to large many-core systems. In this paper, we propose Prophet, a parallel instruction-oriented simulation framework for many-cores. Prophet adopts a general instruction-oriented model to simulate processor cores: the simulator is built from the perspective of each simulated instruction affecting a small number of relevant processor components, rather than that of a large number of processor components executing many instructions in each cycle, as in the cycle-oriented approach. Prophet determines the execution cycle of a simulated instruction from the states of the components that the instruction affects, and updates those component states after the instruction executes. Prophet also adopts a speculative model to decouple private resources from shared resources (e.g., a shared cache), which avoids unnecessary interactions between them and incurs a penalty only on a rare mis-speculation. We have designed and implemented a prototype of Prophet that supports both user-level and full-system simulation. Experimental results show that Prophet can scale to thousands of simulated cores (4,096 cores in the current implementation) with good performance and small accuracy loss. It achieves average simulation speeds of about 98 and 235 MIPS (millions of simulated instructions per second) for full-system and user-level simulation, respectively, with only a 3 percent IPC error rate and negligible deviation in cache simulation results. When run on a many-core platform (i.e., Intel Xeon Phi), it achieves an average simulation speed of about 413 MIPS.
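The instruction-oriented idea described in the abstract can be illustrated with a minimal Python sketch. It is not Prophet's implementation: the `Instr` and `CoreState` classes, the unit names, the latencies, and the in-order single-issue assumption are all hypothetical, and the speculative handling of shared resources is not modeled. The sketch only shows the general pattern of deriving an instruction's execution cycle from the states of the few components it touches and then updating those states.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Instr:
    unit: str                    # functional unit used (hypothetical names)
    latency: int                 # cycles from issue to result (assumed values)
    srcs: Tuple[str, ...] = ()   # source registers
    dst: Optional[str] = None    # destination register, if any


@dataclass
class CoreState:
    reg_ready: dict = field(default_factory=dict)  # register -> cycle value is ready
    unit_free: dict = field(default_factory=dict)  # unit -> cycle unit is free again
    next_issue: int = 0                            # earliest cycle for the next issue


def simulate(core: CoreState, instr: Instr) -> Tuple[int, int]:
    """Return (issue_cycle, done_cycle) for one instruction and update the core."""
    # The issue cycle depends only on the components this instruction touches.
    start = max(
        [core.next_issue, core.unit_free.get(instr.unit, 0)]
        + [core.reg_ready.get(r, 0) for r in instr.srcs]
    )
    done = start + instr.latency
    # Update the affected component states after the instruction executes.
    core.unit_free[instr.unit] = done      # assumption: units are not pipelined
    if instr.dst is not None:
        core.reg_ready[instr.dst] = done
    core.next_issue = start + 1            # assumption: in-order, single issue
    return start, done


program = [
    Instr("mem", 3, dst="r1"),
    Instr("alu", 1, srcs=("r1",), dst="r2"),
    Instr("mem", 3, dst="r3"),
    Instr("alu", 1, srcs=("r2", "r3"), dst="r4"),
]
core = CoreState()
trace = [simulate(core, ins) for ins in program]
print(trace)
```

No global per-cycle loop is needed here: simulated time advances per instruction, which is the property the abstract contrasts with cycle-oriented simulators.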