Performance analysis and optimization of molecular dynamics simulation on Godson-T many-core processor

Liu Peng,Aiichiro Nakano,Guangming Tan,Priya Vashishta,Dongrui Fan,Hao Zhang,Rajiv K. Kalia,Fenglong Song
DOI: https://doi.org/10.1145/2016604.2016643
2011-01-01
Abstract:ABSTRACTMolecular dynamics (MD) simulation has broad applications, but its irregular memory-access pattern makes performance optimization a challenge. This paper presents a joint application/architecture study to enhance on-chip parallelism of MD on Godson-T -like many-core architecture. First, a preprocessing leveraging an adaptive divide-and-conquer framework is designed to exploit locality through memory hierarchy with software controlled memory. Then we propose three incremental optimization strategies: (1) a novel data-layout to re-organize linked-list cell data structures to improve data locality; (2) an on-chip locality-aware parallel algorithm to enhance data reuse; and (3) a pipelining algorithm to hide latency to shared memory. Experiments on Godson-T simulator exhibit strong-scaling parallel efficiency 0.99 on 64 cores, which is confirmed by an FPGA emulator. Detailed analysis shows that optimizations utilizing architectural features to maximize data locality and to enhance data reuse benefit scalability most. Furthermore, a simple performance model suggests that the optimization scheme is likely to scale well toward exascale. Certain architectural features are found essential for these optimizations, which could guide future hardware developments.
What problem does this paper attempt to address?