MindFI: A Fault Injection Tool for Reliability Assessment of MindSpore Applications

Yang Zheng, Zhenye Feng, Zheng Hu, Ke Pei
DOI: https://doi.org/10.1109/issrew53611.2021.00068
2021-01-01
Abstract: With the emergence of big data and the remarkable improvement in computational power, intelligent systems based on deep neural networks (DNNs), with their superb performance in computer vision, natural language processing, optimization, and other areas, are rapidly replacing traditional software in many domains. However, because DNN modules are learned from data and therefore carry inherent uncertainty, such systems are more likely to exhibit incorrect behavior. Software and hardware faults are also inevitable in practice, and hidden defects can easily cause model failure. In safety- and reliability-critical scenarios such as autonomous driving, these failures can lead to severe accidents and losses. Techniques that test the difference between actual and desired behavior and that evaluate the reliability of DNN applications under faulty conditions are therefore important for building trustworthy DNN systems. Fault injection is a popular method for this purpose, and various fault injection tools have been developed for ML frameworks such as TensorFlow and PyTorch. In this paper, we present MindFI, a tool that aims to cover a variety of faults in ML programs written in MindSpore. Data, software, and hardware faults can be easily injected into general MindSpore programs. We also use MindFI to evaluate the resilience of several commonly used ML programs against a set of assessment metrics.
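As a rough illustration of what injecting a hardware fault can mean, the sketch below flips single bits in the float32 encoding of model weights, a common way to emulate a transient memory error. It is not MindFI's API and does not use MindSpore: the function names `flip_bit` and `inject_bit_flips`, the float32 storage format, and the uniform random choice of weight and bit position are all assumptions made for this example.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of a float32 value."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles the chosen bit, then reinterpret the result as float32.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def inject_bit_flips(weights, n_faults, rng):
    """Return a copy of `weights` with `n_faults` random single-bit flips.

    Hypothetical helper: the weight index and bit position are each drawn
    uniformly at random, which is only one of many possible fault models.
    """
    faulty = list(weights)
    for _ in range(n_faults):
        idx = rng.randrange(len(faulty))
        faulty[idx] = flip_bit(faulty[idx], rng.randrange(32))
    return faulty


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a fault campaign is repeatable
    clean = [1.0, 2.0, 3.0]
    print(inject_bit_flips(clean, n_faults=1, rng=rng))
```

A resilience study would then run the model with the faulty weights and compare its outputs against the fault-free run. Flips in exponent or sign bits typically perturb a value far more than flips in low mantissa bits, which is one reason the bit position matters in such experiments.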