Improving Performance of Floating Point Division on GPU and MIC

Kun Huang, Yifeng Chen
DOI: https://doi.org/10.1007/978-3-319-27122-4_48
2015-01-01
Abstract: Floating-point computing ability is an important concern in high-performance scientific applications and engineering computing. Although it is a fundamental operation, floating-point division (or reciprocal) has long been much less efficient than addition and multiplication. Architectures such as GPU and MIC do not even provide a division instruction at the instruction-set level. This paper proposes a fast approximation algorithm for dividing floating-point numbers in IEEE 754 format. The algorithm is built from existing instructions and in most cases is accurate enough for practical computing. Like most iterative numerical algorithms, it consists of a predicting step and an iterating step. The predicting step exploits a property of the IEEE 754 format to compute an initial estimate with a single integer subtraction instruction. The iterating step then improves the accuracy through fast iterations of about ten instructions. The new algorithm is extremely easy to implement and performs well in practical experiments.
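The two steps described in the abstract can be illustrated with a minimal C sketch. It is not the authors' code and may differ from their method. It assumes a common magic-constant bit trick for the predicting step and the standard Newton-Raphson reciprocal iteration for the iterating step. The constant 0x7EF311C3 and the helper names recip_predict, recip_refine and approx_div are hypothetical and chosen for illustration only. The sketch handles only positive normal inputs in a limited range and omits sign, zero, infinity, NaN and subnormal handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed magic constant (illustrative only, not taken from the paper). */
#define RECIP_MAGIC 0x7EF311C3u

/* Predicting step (hypothetical helper): treat the IEEE 754 bit pattern of a
   as an unsigned integer and subtract it from a magic constant. Because the
   biased exponent occupies the high-order bits, this roughly negates the
   exponent and yields a coarse estimate of 1/a with one subtraction. */
static float recip_predict(float a) {
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);   /* bit cast without strict-aliasing UB */
    bits = RECIP_MAGIC - bits;
    float est;
    memcpy(&est, &bits, sizeof est);
    return est;
}

/* Iterating step (hypothetical helper): Newton-Raphson on f(x) = 1/x - a,
   giving x_{n+1} = x_n * (2 - a * x_n). Each step squares the relative error. */
static float recip_refine(float a, float x, int iters) {
    for (int i = 0; i < iters; ++i) {
        x = x * (2.0f - a * x);
    }
    return x;
}

/* Hypothetical helper: approximate b / a as b * (1/a).
   Sketch-only precondition: a is positive and normal with
   2^-126 <= a <= 2^125, so the predicted estimate stays a normal float
   (NaN also fails this check). */
float approx_div(float b, float a, int iters) {
    assert(a >= 0x1p-126f && a <= 0x1p125f);
    return b * recip_refine(a, recip_predict(a), iters);
}
```

For example, approx_div(1.0f, 3.0f, 3) should land within a few ulps of 1.0f / 3.0f. Each Newton step maps a relative error r to -r^2. By a quick check of our own, the initial estimate for this constant is off by at most roughly 5%, so about three iterations reach single-precision accuracy. Because the quotient is formed as b * (1/a), the result is not guaranteed to be correctly rounded the way IEEE 754 division is.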