Attentive Learning Facilitates Generalization of Neural Networks.

Shiye Lei, Fengxiang He, Haowen Chen, Dacheng Tao
DOI: https://doi.org/10.1109/tnnls.2024.3356310
IF: 14.255
2024-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: This article studies the generalization of neural networks (NNs) by examining how a network changes when trained on a training sample with or without out-of-distribution (OoD) examples. If the network’s predictions are less influenced by fitting OoD examples, then the network learns attentively from the clean training set. A new notion, dataset-distraction stability, is proposed to measure the influence. Extensive CIFAR-10/100 experiments on the different VGG, ResNet, WideResNet, ViT architectures, and optimizers show a negative correlation between the dataset-distraction stability and generalizability. With the distraction stability, we decompose the learning process on the training set $\mathcal{S}$ into multiple learning processes on the subsets of $\mathcal{S}$ drawn from simpler distributions, i.e., distributions of smaller intrinsic dimensions (IDs), and furthermore, a tighter generalization bound is derived. Through attentive learning, miraculous generalization in deep learning can be explained and novel algorithms can also be designed.