UniGrad-FS: Unified Gradient Projection with Flatter Sharpness for Continual Learning
Wei Li, Tao Feng, Hangjie Yuan, Ang Bian, Guodong Du, Sixin Liang, Jianhong Gan, Ziwei Liu
DOI: https://doi.org/10.1109/tii.2024.3435499
IF: 12.3
2024-01-01
IEEE Transactions on Industrial Informatics
Abstract: Continual learning (CL) requires a neural network to learn a sequence of tasks from a dynamic data stream without forgetting previously learned knowledge. To overcome forgetting, one line of work relies on gradient projection to minimize interference between gradients during optimization. This article addresses a challenging question in CL: when, how, and where should gradient projection be applied to promote CL? The problem can be approached from two perspectives: the gradient direction (when and how) and the region of gradient conflict (where). First, we propose UniGrad, a plug-and-play method that addresses the inconsistency between conflicting and nonconflicting gradients during optimization in CL. Second, we explore how gradient projection interacts with the loss landscape in CL and further propose UniGrad-FS, a pluggable method that improves CL performance. In short, this work aims to overcome forgetting through efficient gradient projection in regions where gradient conflicts are less intense. In essence, the proposed method is general and pluggable and can be used with any gradient-based optimizer. For evaluation, we plug UniGrad and UniGrad-FS into two top-performing baselines (WA and MEMO). Our method shows clear improvements, e.g., boosting WA and MEMO by +2.09% and +1.72%, respectively, in the 20-step setting of the CIFAR100 benchmark. In addition, we observe performance gains in all settings on the CIFAR100 and Tiny-ImageNet datasets. Extensive experiments demonstrate the simplicity and effectiveness of the proposed method.
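The abstract does not spell out UniGrad's update rule. For reference only, the following minimal sketch shows the generic gradient-projection idea that this line of work builds on, in the style of A-GEM/PCGrad. It is not the UniGrad or UniGrad-FS algorithm. When the current-task gradient conflicts with a reference gradient (negative inner product), the component along the reference gradient is removed. Nonconflicting gradients pass through unchanged. The function names `dot` and `project_if_conflicting` are hypothetical, and gradients are assumed to be flattened into plain Python lists.

```python
from typing import List

Vector = List[float]  # assumption: gradients flattened into 1-D lists


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_if_conflicting(g_new: Vector, g_ref: Vector) -> Vector:
    """Generic A-GEM/PCGrad-style projection (hypothetical helper, not UniGrad).

    If g_new conflicts with g_ref (negative inner product), subtract the
    component of g_new along g_ref, so that the projected gradient is
    orthogonal to g_ref. Nonconflicting gradients are returned unchanged.
    """
    d = dot(g_new, g_ref)
    if d >= 0.0:
        return list(g_new)  # no conflict: keep the gradient as-is
    norm_sq = dot(g_ref, g_ref)
    if norm_sq == 0.0:
        return list(g_new)  # degenerate reference gradient: nothing to project onto
    scale = d / norm_sq
    return [x - scale * r for x, r in zip(g_new, g_ref)]


if __name__ == "__main__":
    # Conflicting case: the negative component along g_ref is removed.
    print(project_if_conflicting([1.0, -1.0], [0.0, 1.0]))  # [1.0, 0.0]
    # Nonconflicting case: the gradient is left untouched.
    print(project_if_conflicting([1.0, 1.0], [0.0, 1.0]))   # [1.0, 1.0]
```

This sketch treats conflicting and nonconflicting gradients asymmetrically: only the former are modified. According to the abstract, that inconsistency is what UniGrad sets out to address.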