Optimizing the performance of Lattice Gauge Theory simulations with Streaming SIMD extensions

Shyam Srinivasan
DOI: https://doi.org/10.48550/arXiv.1309.0551
2013-09-02
Computational Engineering, Finance, and Science
Abstract: Two factors that affect simulation quality are the amount of computing power and the implementation. The Streaming SIMD (single instruction, multiple data) Extensions (SSE) present a technique for influencing both by exploiting the processor's parallel functionality. In this paper, we show how SSE improves the performance of lattice gauge theory simulations. We identified two significant trends through an analysis of data from various runs. First, the speed-ups were higher for single-precision than for double-precision floating-point numbers. Second, although the use of SSE significantly improved simulation time, it did not deliver the theoretical maximum. There are a number of reasons for this: architectural constraints imposed by the front-side bus (FSB) speed, the spatial and temporal patterns of data retrieval, the ratio of computational to non-computational instructions, and the need to interleave miscellaneous instructions with computational instructions. We present a model for analyzing SSE performance, which could help factor in the bottlenecks or weaknesses in the implementation, the computing architecture, and the mapping of software to the computing substrate when evaluating the improvement in efficiency. The model or framework would be useful in evaluating other computational frameworks and in predicting the benefits that can be derived from future hardware or architectural improvements.
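As a rough illustration of why the theoretical maximum differs by precision and why measured speed-ups fall below it, the sketch below computes the number of values that fit in a 128-bit SSE register and applies a generic Amdahl-style bound. This is not the model presented in the paper; the function names and the 80% vectorizable fraction are assumptions chosen only for illustration.

```python
SSE_REGISTER_BITS = 128  # width of one SSE (XMM) register


def sse_lanes(element_bits: int) -> int:
    """Number of floating-point values packed into one SSE register."""
    return SSE_REGISTER_BITS // element_bits


def amdahl_speedup(vector_fraction: float, lanes: int) -> float:
    """Amdahl-style bound: only `vector_fraction` of the run time is
    accelerated by a factor of `lanes`; the rest runs at scalar speed."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must lie in [0, 1]")
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / lanes)


if __name__ == "__main__":
    # 0.8 is an assumed, illustrative fraction, not a measurement from the paper.
    assumed_fraction = 0.8
    for name, bits in (("single", 32), ("double", 64)):
        lanes = sse_lanes(bits)
        estimate = amdahl_speedup(assumed_fraction, lanes)
        print(f"{name} precision: peak {lanes}x, estimate {estimate:.2f}x")
```

Under these assumptions, the peak is 4x for single precision and 2x for double precision, and any non-vectorized work (loads, shuffles, loop overhead) pulls the achievable figure below that peak.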