FTT-NAS: Discovering Fault-Tolerant Convolutional Neural Architecture

Xuefei Ning, Guangjun Ge, Wenshuo Li, Zhenhua Zhu, Yin Zheng, Xiaoming Chen, Zhen Gao, Yu Wang, Huazhong Yang
DOI: https://doi.org/10.48550/arXiv.2003.10375
2021-04-13
Abstract: With the fast evolvement of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying neural networks (NNs) onto the devices under complex environments, there are various types of possible faults: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, and malicious attackers. Thus the safety risk of deploying NNs is now drawing much attention. In this paper, after the analysis of the possible faults in various types of NN accelerators, we formalize and implement various fault models from the algorithmic perspective. We propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are reliable to various faults in nowadays devices. Then we incorporate fault-tolerant training (FTT) in the search process to achieve better results, which is referred to as FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures outperform other manually designed baseline architectures significantly, with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, with the same fault settings, F-FTT-Net discovered under the feature fault model achieves an accuracy of 86.2% (vs. 68.1% achieved by MobileNet-V2), and W-FTT-Net discovered under the weight fault model achieves an accuracy of 69.6% (vs. 60.8% achieved by ResNet-20). By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the capacity of the model, and the connection pattern have influences on the fault resilience capability of NN models.
Signal Processing, Machine Learning
What problem does this paper attempt to address?
This paper addresses the faults that neural networks face when deployed on edge devices. With the rapid development of embedded deep-learning computing systems, deep-learning-based applications are migrating from the cloud to the edge. Devices operating in complex environments can experience several types of faults: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, and malicious attacks. These faults threaten the safety of deployed neural networks. After analyzing the possible faults in different types of neural network accelerators, the authors formalize and implement several fault models from an algorithmic perspective. They propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are reliable under the various faults found in current devices, and then incorporate fault-tolerant training (FTT) into the search process, an approach they call FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures significantly outperform manually designed baseline architectures with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, under the same fault settings, F-FTT-Net, discovered under the feature fault model, achieves an accuracy of 86.2% (versus 68.1% for MobileNet-V2), and W-FTT-Net, discovered under the weight fault model, achieves an accuracy of 69.6% (versus 60.8% for ResNet-20). By inspecting the discovered architectures, the authors find that the operation primitives, the weight quantization range, the model capacity, and the connection pattern all influence the fault resilience of neural network models.
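To make the idea of an algorithmic fault model concrete, here is a minimal sketch of one generic possibility: independent random bit flips applied to 8-bit quantized values. The abstract does not specify the paper's fault models, so the function name inject_bit_flips, the 8-bit unsigned format, and the per-bit flip probability are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def inject_bit_flips(values, flip_prob, rng):
    """Flip each bit of each 8-bit value independently with probability flip_prob.

    Hypothetical illustration of a random bit-flip fault model; it is not
    taken from the paper.
    """
    values = np.asarray(values, dtype=np.uint8)
    # One Bernoulli draw per bit position per element.
    flips = rng.random(values.shape + (8,)) < flip_prob
    # Pack the eight per-bit decisions into a single XOR mask per element.
    bit_weights = (1 << np.arange(8)).astype(np.uint8)
    mask = (flips * bit_weights).sum(axis=-1).astype(np.uint8)
    # XOR toggles exactly the bits selected in the mask.
    return values ^ mask


rng = np.random.default_rng(0)
quantized = np.array([0, 1, 128, 255], dtype=np.uint8)
faulty = inject_bit_flips(quantized, flip_prob=0.01, rng=rng)
```

In a sketch like this, the same routine could be applied either to quantized weights or to quantized intermediate feature maps, and evaluating a network with such perturbations in the forward pass is one way to estimate its accuracy under faults.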