Procrastination Is All You Need: Exponent Indexed Accumulators for Floating Point, Posits and Logarithmic Numbers

Vincenzo Liguori
2024-06-10
Abstract: This paper discusses a simple and effective method for the summation of long sequences of floating point numbers. The method comprises two phases: an accumulation phase where the mantissas of the floating point numbers are added to accumulators indexed by the exponents and a reconstruction phase where the actual summation result is finalised. Various architectural details are given for both FPGAs and ASICs including fusing the operation with a multiplier, creating efficient MACs. Some results are presented for FPGAs, including a tensor core capable of multiplying and accumulating two 4x4 matrices of bfloat16 values every clock cycle using ~6,400 LUTs + 64 DSP48 in AMD FPGAs at 700+ MHz. The method is then extended to posits and logarithmic numbers.
Computer Vision and Pattern Recognition, Artificial Intelligence, Hardware Architecture
What problem does this paper attempt to address?
The paper addresses the summation of long sequences of floating-point numbers, an operation that is central to computational science, convolutional neural networks, and large language models, and whose efficiency strongly affects overall performance. It proposes a simple method with two stages: an accumulation stage, in which mantissas are added to accumulators indexed by their exponents, so that all mantissas sharing an exponent form one partial sum; and a reconstruction stage, in which the final total is derived from these partial sums.
The paper gives hardware implementation details for both FPGAs and ASICs, discusses fusing the accumulator with a multiplier to form efficient multiply-accumulate (MAC) units, and presents a Tensor Core design as an application example. It then extends the method to posits (a variable-precision floating-point representation) and to logarithmic numbers. Finally, it examines the advantages of logarithmic number representations in low-bit applications such as neural networks, along with their compression effects.
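The two stages described above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the paper's hardware design: it uses Python doubles and unbounded integers in place of fixed-width accumulators, it assumes finite inputs, and the names `accumulate`, `reconstruct`, and `MANTISSA_BITS` are hypothetical, chosen only for this example.

```python
import math
from collections import defaultdict
from fractions import Fraction

MANTISSA_BITS = 53  # significand width of an IEEE 754 double (assumed input format)


def accumulate(values):
    """Accumulation phase: add each signed integer mantissa to the
    accumulator selected by its exponent. No alignment shift or rounding
    happens here; that work is deferred to reconstruction."""
    acc = defaultdict(int)  # exponent -> running sum of signed mantissas
    for x in values:
        if not math.isfinite(x):
            raise ValueError("this sketch assumes finite inputs")
        frac, exp = math.frexp(x)  # x == frac * 2**exp, with 0.5 <= |frac| < 1
        mantissa = int(math.ldexp(frac, MANTISSA_BITS))  # exact signed integer
        acc[exp - MANTISSA_BITS] += mantissa  # x == mantissa * 2**(exp - 53)
    return acc


def reconstruct(acc):
    """Reconstruction phase: weight each partial sum by 2**exponent,
    add them exactly, and round once to the nearest double."""
    total = sum(Fraction(m) * Fraction(2) ** e for e, m in acc.items())
    return float(total)
```

For example, `reconstruct(accumulate([1e16, 1.0, -1e16]))` returns `1.0`, whereas the left-to-right `sum([1e16, 1.0, -1e16])` returns `0.0` because the `1.0` is lost when it is added to `1e16`. The sketch rounds only once, at the end, which is the sense in which the work is "procrastinated"; a hardware version would instead use a bounded number of fixed-width accumulators, one per exponent value of the chosen format.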