FPRev: Revealing the Order of Floating-Point Summation by Numerical Testing

Peichen Xie, Yanjie Gao, Jilong Xue
2024-11-01
Abstract: The order of floating-point summation is a key factor in numerical reproducibility. However, this critical information is generally unspecified and unknown for most summation-based functions in numerical libraries, making it challenging to migrate them to new environments reproducibly. This paper presents novel, non-intrusive, testing-based algorithms that can reveal the order of floating-point summation by treating functions as callable black boxes. By constructing well-designed input that can cause the swamping phenomenon of floating-point addition, we can infer the order of summation from the output. We introduce FPRev, a tool that implements these algorithms, and validate its efficiency through extensive experiments with popular numerical libraries on various CPUs and GPUs (including those with Tensor Cores). FPRev reveals the varying summation orders across different libraries and devices, and outperforms other methods in terms of time complexity. The source code of FPRev is at https://github.com/microsoft/RepDL/tree/main/tools/FPRev.
Numerical Analysis, Software Engineering
### What problem does this paper attempt to solve?

This paper addresses numerical non-reproducibility caused by the order of floating-point summation. Specifically, it focuses on how to reveal the order in which a summation-based function adds its operands. This order is crucial for reproducing numerical results, yet it is unspecified and unknown for most functions in numerical libraries, which makes it difficult to keep results consistent when migrating these functions to new environments.

#### Background and Problem Description

With the rapid development of heterogeneous hardware and diverse software stacks, the non-reproducibility of numerical calculations has become a recognized problem. The same numerical function may produce different results on different hardware or after a numerical library is updated. This poses a major challenge to scientific research, software engineering, deep learning, and applications that rely on numerical models for decision-making, because it undermines the credibility and reliability of results.

The order of floating-point summation is one of the main causes of numerical non-reproducibility. Because floating-point addition is not associative, the result of a sum depends on the order in which it is computed. For example, in IEEE 754 binary64 (float64), (0.1 + 0.2) + 0.3 ≠ 0.1 + (0.2 + 0.3). However, most numerical libraries do not specify the summation order, so the order may vary between environments and produce inconsistent outputs.

#### Solution Overview

To solve this problem, the paper proposes FPRev, a tool that reveals the order of floating-point summation through a non-intrusive, testing-based algorithm.
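The non-associativity noted in the background, and the swamping phenomenon mentioned in the abstract, can both be observed directly in float64. The following is a minimal Python sketch, not code from the paper; the variable names and the choice of `M = 2.0 ** 53` are illustrative assumptions.

```python
# Python floats are IEEE 754 binary64 (float64) on all common platforms.

# Non-associativity: the two groupings round differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
not_associative = left != right

# Swamping: adding a small number to a much larger one leaves the
# larger number unchanged, so the small summand is lost.
# M is an illustrative choice; 2**53 is the point where float64
# can no longer represent every consecutive integer.
M = 2.0 ** 53
swamped = (M + 1.0) == M

# The same three numbers summed in two orders give different results.
lost = (M + 1.0) - M   # the 1.0 is swamped before M is cancelled
kept = 1.0 + (M - M)   # M is cancelled first, so the 1.0 survives

print(not_associative, swamped, lost, kept)
```

The last two lines of the computation show why swamping leaks information about order: whether the small summand survives depends on when it meets the large one.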
FPRev treats the summation function as a callable black box, generates specially designed inputs, and infers the summation order from the outputs. The main steps are:

1. **Constructing masked all-one arrays**: Large-magnitude numbers are placed in an array of ones as masks. Through the swamping phenomenon of floating-point addition, the masks hide some of the summands.
2. **Inferring order-related information**: The output shows which summands were hidden by the masks, which in turn reveals information about the summation order.
3. **Generating a summation tree**: From the inferred information, FPRev constructs the summation tree, a binary tree whose leaves are the summands and whose inner nodes are the additions.

#### Main Contributions

1. A new testing-based algorithm with polynomial time complexity reveals the order of floating-point summation, a significant improvement over the naive method, which has exponential time complexity.
2. The FPRev tool reveals the summation order automatically, helps debug non-reproducible programs, and provides useful information for migration.
3. Extensive experiments verify the efficiency of FPRev on different libraries and devices.
4. The summation orders of common numerical libraries (such as cuBLAS) are revealed for the first time.

In summary, the paper tackles the reproducibility problem caused by the unknown order of floating-point summation and proposes an efficient, testing-based solution of practical value.
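The masking idea can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not FPRev's actual implementation: the mask pair `M` and `-M`, the helper `lca_leaf_count`, and the two stand-in black boxes `sequential_sum` and `pairwise_sum` are all hypothetical names introduced here. The sketch assumes that every one added to a mask before the two masks meet is swamped, so the output counts the ones that were added after the masks cancelled.

```python
# Assumption: M is large enough that M + k == M for any count k <= n.
M = 2.0 ** 100


def sequential_sum(xs):
    """Stand-in black box: left-to-right accumulation."""
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise_sum(xs):
    """Stand-in black box: recursive halving (pairwise summation)."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


def lca_leaf_count(sum_fn, n, i, j):
    """Probe sum_fn with an all-one array masked at positions i and j.

    Every 1.0 that joins a partial sum holding only one mask is swamped.
    Once M and -M meet they cancel to 0.0, and the remaining ones are
    counted exactly. So n minus the output is the number of summands
    combined by the time positions i and j are first added together.
    """
    a = [1.0] * n
    a[i], a[j] = M, -M
    return n - int(sum_fn(a))


if __name__ == "__main__":
    n = 4
    for name, fn in [("sequential", sequential_sum), ("pairwise", pairwise_sum)]:
        table = {(i, j): lca_leaf_count(fn, n, i, j)
                 for i in range(n) for j in range(i + 1, n)}
        print(name, table)
```

For n = 4 the two black boxes give different answers for positions 2 and 3: pairwise summation adds them to each other first (count 2), whereas sequential accumulation only brings them together in the final addition (count 4). Collecting these counts for all pairs gives the order-related information from which a summation tree could be rebuilt.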