LAMMPS' PPPM Long-Range Solver for the Second Generation Xeon Phi

William McDoniel, Markus Höhnerbach, Rodrigo Canales, Ahmed E. Ismail, Paolo Bientinesi
DOI: https://doi.org/10.48550/arXiv.1702.04250
2017-02-14
Abstract: Molecular Dynamics is an important tool for computational biologists, chemists, and materials scientists, consuming a sizable amount of supercomputing resources. Many of the investigated systems contain charged particles, which can only be simulated accurately using a long-range solver, such as PPPM. We extend the popular LAMMPS molecular dynamics code with an implementation of PPPM particularly suitable for the second generation Intel Xeon Phi. Our main target is the optimization of computational kernels by means of vectorization, and we observe speedups in these kernels of up to 12x. These improvements carry over to LAMMPS users, with overall speedups ranging between 2-3x, without requiring users to retune input parameters. Furthermore, our optimizations make it easier for users to determine optimal input parameters for attaining top performance.
Computational Engineering, Finance, and Science; Distributed, Parallel, and Cluster Computing; Performance
What problem does this paper attempt to address?
This paper addresses the optimization of the PPPM (Particle-Particle Particle-Mesh) long-range solver in the LAMMPS molecular dynamics code for the second-generation Intel Xeon Phi architecture. The researchers optimize the solver's computational kernels for this hardware, primarily through vectorization. The main objectives of the paper are:
1. **Optimize computational kernels**: Vectorize the kernels, especially those parts that are not covered by highly optimized mathematical libraries. These parts typically account for 20%-80% of the total PPPM running time, so optimizing them is crucial for overall performance.
2. **Improve overall performance**: The optimizations yield speedups of up to 12x in these kernels, which translate into overall speedups of 2-3x for LAMMPS users without requiring them to retune input parameters.
3. **Simplify parameter tuning**: The optimized version makes it easier for users to determine the input parameters that give the best performance. The researchers focus on three adjustable parameters: real-space cutoff distance, interpolation order, and differentiation mode. With the optimized code, some options for these parameters are superior to others in almost all cases, which narrows the search.
4. **Improve energy efficiency**: Beyond raw performance, the optimized code also improves the energy efficiency of the computation, which is particularly important for large-scale parallel computing.
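To see how a kernel-level speedup relates to a solver-level one, the two figures quoted above (a 20%-80% share of PPPM running time and a kernel speedup of up to 12x) can be combined with Amdahl's law. The sketch below is illustrative arithmetic only, not a result from the paper: it assumes the whole share is accelerated by the full 12x, which gives an upper bound on the PPPM speedup, and the function name `amdahl_speedup` is a hypothetical helper.

```python
def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """Speedup of a whole routine when `fraction` of its runtime
    is accelerated by `kernel_speedup` (Amdahl's law)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Unaccelerated part keeps its time; accelerated part shrinks.
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


if __name__ == "__main__":
    # Assumed inputs: the 20%-80% range and the 12x peak quoted in the summary.
    for share in (0.2, 0.5, 0.8):
        bound = amdahl_speedup(share, 12.0)
        print(f"kernel share {share:.0%}: PPPM speedup bound {bound:.2f}x")
```

Under these assumptions the bound grows quickly with the kernel share, which is consistent with the summary's point that the non-library kernels are where optimization pays off. The overall 2-3x figure reported for LAMMPS users also depends on the rest of the simulation, which this sketch does not model.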
In summary, this paper aims to significantly improve the computational efficiency and performance of the PPPM long-range solver in LAMMPS through optimizations for the second-generation Intel Xeon Phi architecture, while simplifying the user's parameter tuning process to make it more user-friendly and efficient.
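The three tunable parameters named above (real-space cutoff, interpolation order, and differentiation mode, the last of which LAMMPS exposes through its `diff` keyword) map onto a few lines of a LAMMPS input script. The following is a minimal sketch of how a user might enumerate candidate settings for a tuning sweep. The `pair_style`, `kspace_style`, and `kspace_modify` commands are standard LAMMPS commands, but the class and function names are hypothetical, and the numeric values are placeholders rather than settings taken from the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PppmSettings:
    cutoff: float             # real-space cutoff distance
    order: int                # interpolation order (stencil size)
    diff: str                 # differentiation mode: "ik" or "ad"
    accuracy: float = 1.0e-4  # placeholder relative accuracy for PPPM

    def __post_init__(self):
        if self.cutoff <= 0.0:
            raise ValueError("cutoff must be positive")
        if self.order < 1:
            raise ValueError("order must be a positive integer")
        if self.diff not in ("ik", "ad"):
            raise ValueError("diff must be 'ik' or 'ad'")


def lammps_lines(s: PppmSettings) -> list[str]:
    """Render the input-script lines that carry the three parameters."""
    return [
        f"pair_style lj/cut/coul/long {s.cutoff:g}",
        f"kspace_style pppm {s.accuracy:g}",
        f"kspace_modify order {s.order} diff {s.diff}",
    ]


def candidate_grid(cutoffs, orders, diffs) -> list[PppmSettings]:
    """All combinations of the three parameters, for a tuning sweep."""
    return [PppmSettings(c, o, d) for c, o, d in product(cutoffs, orders, diffs)]


if __name__ == "__main__":
    for settings in candidate_grid([8.0, 10.0], [5, 7], ["ik", "ad"]):
        print("\n".join(lammps_lines(settings)), end="\n\n")
```

Each rendered block can be pasted into an input script and timed. If, as the summary states, some options win in almost all cases, most of this grid can be pruned before any benchmarking.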