What problem does this paper attempt to address?

The paper addresses the following problem: when TensorFlow is used to train a ResNet model for image classification, how much does GPU non-determinism affect the evaluation of model performance? Specifically, the author studies the effect of GPU non-determinism on the standard deviation of test-set accuracy and loss, and examines whether this non-determinism is the main cause of run-to-run variation in model results.

### Research Background and Problem Description

1. **Importance of Repeated Experiments**
   - In deep-learning research, researchers usually need to run a model multiple times to understand how its performance varies.
   - Repeated experiments usually use different random seeds to initialize the weights and generate the minibatches.
2. **Sources of GPU Non-determinism**
   - GPU non-determinism stems from floating-point operations being executed in different orders, which can produce different results even on the same system, in the same software environment, and in the same operation mode.
   - For example, floating-point addition is not associative, so when different compilers and architectures add the same numbers in different orders, the rounded results can differ.
3. **Research Objectives**
   - The author aims to isolate and quantify the impact of GPU non-determinism on model performance by fixing the other sources of randomness (such as the initial weights and the order of minibatches).
   - Specifically, the author hopes to answer the following questions:
     - Is GPU non-determinism the main factor causing variation in model results?
     - How large is the impact of this non-determinism on model performance evaluation?

### Experiment Design and Results

1. **Experimental Setup**
   - Use the ResNet-50 model to conduct experiments on the CIFAR-10 dataset.
   - Train for 200 epochs with a batch size of 32.
   - Fix the random seeds so that every source of randomness other than GPU non-determinism is held constant.
2. **Experimental Results**
   - With the same random seed, the standard deviation of test-set accuracy is \( \sigma(\text{accuracy}) = 1.995\times 10^{-2} \), and the standard deviation of loss is \( \sigma(\text{loss}) = 3.020\times 10^{-3} \).
   - With different random seeds, the standard deviation of test-set accuracy is \( \sigma(\text{accuracy}) = 2.699\times 10^{-2} \), and the standard deviation of loss is \( \sigma(\text{loss}) = 3.464\times 10^{-3} \).
   - Comparing the two settings, the fixed-seed standard deviation is 74% of the different-seed standard deviation for accuracy and 87% for loss. This is the share of the total variation attributed to GPU non-determinism.

### Conclusions and Recommendations

1. **Conclusions**
   - Roughly three quarters (74%) of the standard deviation of ResNet test-set accuracy, and 87% of the standard deviation of the loss, is attributable to GPU non-determinism. This is far more than the contribution of the other random sources (such as the initial weights and the order of minibatches).
   - This indicates that comparing a single accuracy value may be insufficient when evaluating deep-learning models, because most of the variation comes from non-deterministic factors.
2. **Recommendations**
   - The author recommends that, when evaluating a new model, the distributions of results for the new model and the benchmark model be compared, rather than a single accuracy value for each.
   - This recommendation is in line with the "Machine Learning Reproducibility Checklist" used at conferences such as NeurIPS, which requires researchers to report error ranges and measures of variation.

Through this study, the author hopes to draw attention to the impact of GPU non-determinism and to promote the development of more robust model evaluation methods.
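
The percentages quoted in the results can be checked directly from the four standard deviations. The sketch below assumes that each percentage is the ratio of the fixed-seed standard deviation to the different-seed standard deviation (a ratio of standard deviations, not of variances); that reading is an assumption, although it does reproduce the reported 74% and 87%. The names `SIGMA_FIXED_SEED`, `SIGMA_VARIED_SEED`, and `gpu_share` are hypothetical and chosen only for this example.

```python
# Standard deviations as quoted in the summary above.
SIGMA_FIXED_SEED = {"accuracy": 1.995e-2, "loss": 3.020e-3}   # same seed
SIGMA_VARIED_SEED = {"accuracy": 2.699e-2, "loss": 3.464e-3}  # different seeds


def gpu_share(metric: str) -> float:
    """Fraction of the different-seed standard deviation that is already
    present when the seeds are fixed (assumed definition of the percentage)."""
    return SIGMA_FIXED_SEED[metric] / SIGMA_VARIED_SEED[metric]


# Express each share as a whole-number percentage.
shares = {metric: round(100 * gpu_share(metric)) for metric in SIGMA_FIXED_SEED}

for metric, percent in shares.items():
    print(f"{metric}: {percent}% of the standard deviation remains with fixed seeds")
```

Under this assumption the two shares differ noticeably between accuracy and loss, so it is safer to quote them per metric than as one combined figure.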

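The ordering effect described under "Sources of GPU Non-determinism" can be reproduced on a CPU with ordinary Python floats. The sketch below only illustrates that floating-point addition is not associative; it is not the paper's experiment and involves no GPU. The helper `sum_in_order` and the chosen values are assumptions made for this example.

```python
def sum_in_order(values, order):
    """Add `values` one at a time, visiting them in the index order given."""
    total = 0.0
    for index in order:
        total += values[index]
    return total


# A large value, a small value, and the negated large value.
# Mathematically the sum is 1.0 regardless of the order of addition.
values = [1e16, 1.0, -1e16]

# Order A: the small value is added to the large one first and is lost to rounding.
forward = sum_in_order(values, [0, 1, 2])

# Order B: the large values cancel first, so the small value survives.
reordered = sum_in_order(values, [0, 2, 1])

print("forward   :", forward)
print("reordered :", reordered)
print("identical :", forward == reordered)
```
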
Non-Determinism in TensorFlow ResNets

Stochastic Neural Networks with Infinite Width are Deterministic

Rolling the Dice for Better Deep Learning Performance: A Study of Randomness Techniques in Deep Neural Networks

On the Variance of Neural Network Training with respect to Test Sets and Distributions

Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications

Understanding Stochastic Optimization Behavior at the Layer Update Level (Student Abstract)

Investigating the Impact of Randomness on Reproducibility in Computer Vision: A Study on Applications in Civil Engineering and Medicine

Quantifying Inherent Randomness in Machine Learning Algorithms

Stochasticity in Neural ODEs: An Empirical Study

Probing the Structure and Functional Properties of the Dropout-Induced Correlated Variability in Convolutional Neural Networks

Inconsistency, Instability, and Generalization Gap of Deep Neural Network Training

Uniform Learning in a Deep Neural Network via "Oddball" Stochastic Gradient Descent

Measuring model variability using robust non-parametric testing

Training of deep residual networks with stochastic MG/OPT

Stochastic Gradient Descent and Anomaly of Variance-flatness Relation in Artificial Neural Networks

CUDA: Convolution-based Unlearnable Datasets

Deterministic equivalent and error universality of deep random features learning

Neural Redshift: Random Networks are not Random Functions

Randomness Regularization with Simple Consistency Training for Neural Networks

Random ReLU Neural Networks as Non-Gaussian Processes