Where to model the epistemic uncertainty of Bayesian convolutional neural networks for classification

Yuan Tai, Yihua Tan, Erbo Zou, Bo Lei, Qiang Fan, Yizhou He
DOI: https://doi.org/10.1016/j.neucom.2024.127568
IF: 6
2024-03-26
Neurocomputing
Abstract: To model epistemic uncertainty, Bayesian convolutional neural networks (Bayesian-CNNs) place a probability distribution on the kernels of convolutional neural networks (CNNs). This addresses two problems of CNNs: over-fitting and the inability to quantify prediction confidence. However, compared with CNNs that use point estimates, Bayesian-CNNs offer only limited improvement in classification performance. Some experiments show that the closer a layer of a Bayesian-CNN is to the input, the larger the variances of its parameters, which indicates that the parameters of the low-level layers are not well trained. It may therefore be inappropriate to model the epistemic uncertainty of the parameters in layers close to the input. This paper studies the question of "where to model the epistemic uncertainty of Bayesian-CNNs" by analyzing performance when randomness is placed over different convolutional groups. This structure combines certain (point-estimate) and uncertain layers. For it, a novel objective function is proposed that consists of a KL loss for the uncertain parameters and a cross-entropy loss for the certain parameters. To optimize this objective, a training scheme that updates the two kinds of parameters alternately is proposed. The recommended models are Partial-Bayesian-CNNs (P-Bayesian-CNNs), which model the epistemic uncertainty over the parameters of the last convolutional group. The experimental analysis shows that this placement yields the highest gain in classification accuracy. Compared with traditional Bayesian-CNNs, P-Bayesian-CNNs increase accuracy by 3.8% and 3.7% on CIFAR-10 and RAF-DB, respectively. In addition, experiments with Res2Net and Xception as backbones also show performance improvements, which verifies the scalability of P-Bayesian-CNNs.
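The abstract describes a network that mixes certain (point-estimate) and uncertain (Bayesian) parameters. The certain parameters are trained with cross-entropy, the uncertain ones with a KL-based loss, and the two groups are updated alternately. Below is a minimal sketch of that idea on a hypothetical one-dimensional toy classifier in plain Python. It is not the authors' method, and everything concrete in it is an assumption:

- a tanh feature layer stands in for the certain convolutional groups;
- a single Gaussian weight `w ~ N(mu, sigma^2)` stands in for the uncertain last group;
- "KL loss" is read as the usual variational objective, which combines expected cross-entropy with a weighted KL to an `N(0, 1)` prior;
- gradients are taken through a reparameterized sample;
- the parameter groups alternate once per epoch;
- the function names and hyperparameters are illustrative.

```python
import math
import random

# Hypothetical toy "partial-Bayesian" binary classifier (illustration only):
#   certain parameters a, b, c: feature h = tanh(a*x + b), bias c (point estimates)
#   uncertain parameter w ~ q(w) = N(mu, sigma^2), sigma = exp(log_sigma)
#   logit = w * h + c, prior p(w) = N(0, PRIOR_SIGMA^2)
PRIOR_SIGMA = 1.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gaussian_kl(mu, sigma, prior_sigma=PRIOR_SIGMA):
    """Closed-form KL( N(mu, sigma^2) || N(0, prior_sigma^2) )."""
    return (math.log(prior_sigma / sigma)
            + (sigma ** 2 + mu ** 2) / (2.0 * prior_sigma ** 2) - 0.5)


def bce(p, y, eps=1e-12):
    """Binary cross-entropy of one prediction p against label y in {0, 1}."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))


def train(data, epochs=100, lr=0.05, kl_weight=0.01, seed=0):
    rng = random.Random(seed)
    a, b, c = 1.0, 0.0, 0.0      # certain (point-estimate) parameters
    mu, log_sigma = 0.1, -1.0    # variational parameters of the uncertain weight
    for epoch in range(epochs):
        update_certain = epoch % 2 == 0   # alternate parameter groups every epoch
        for x, y in data:
            h = math.tanh(a * x + b)
            sigma = math.exp(log_sigma)
            eps = rng.gauss(0.0, 1.0)
            w = mu + sigma * eps          # reparameterized sample of w
            g = sigmoid(w * h + c) - y    # gradient of bce w.r.t. the logit
            if update_certain:
                # Certain parameters: plain cross-entropy, with w fixed at its sample.
                g_pre = g * w * (1.0 - h * h)     # gradient w.r.t. a*x + b
                a -= lr * g_pre * x
                b -= lr * g_pre
                c -= lr * g
            else:
                # Uncertain parameters: one-sample estimate of the gradient of
                #   E_q[bce] + kl_weight * KL(q || prior)
                g_w = g * h
                grad_mu = g_w + kl_weight * mu / PRIOR_SIGMA ** 2
                grad_log_sigma = (g_w * eps * sigma
                                  + kl_weight * (sigma ** 2 / PRIOR_SIGMA ** 2 - 1.0))
                mu -= lr * grad_mu
                log_sigma -= lr * grad_log_sigma
    return (a, b, c), (mu, math.exp(log_sigma))


def predict(certain, uncertain, x):
    """Plug-in prediction at the posterior mean of w (cheap; ignores sigma)."""
    (a, b, c), (mu, _sigma) = certain, uncertain
    return sigmoid(mu * math.tanh(a * x + b) + c)


if __name__ == "__main__":
    data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
    certain, uncertain = train(data)
    print("certain:", certain, "uncertain (mu, sigma):", uncertain)
    print("p(y=1 | x=2) =", predict(certain, uncertain, 2.0))
```

The design mirrors the recommendation in the abstract: only the group nearest the output carries a distribution, while the earlier layers keep point estimates. In a real P-Bayesian-CNN, the KL term would typically be scaled by dataset or minibatch size, and predictions would average over several weight samples rather than use the mean alone.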
computer science, artificial intelligence