AdaHAT: Adaptive Hard Attention to the Task in Task-Incremental Learning

Pengxiang Wang, Hongbo Bo, Jun Hong, Weiru Liu, Kedian Mu
DOI: https://doi.org/10.1007/978-3-031-70352-2_9
2024-01-01
Abstract: Catastrophic forgetting is a major issue in task-incremental learning, where a neural network loses what it has learned in previous tasks after being trained on new tasks. A number of architecture-based approaches have been proposed to address this issue. However, these approaches suffer from a network capacity problem when the network learns long sequences of tasks: as the network is trained on more and more new tasks, more parameters become static to prevent the network from forgetting what it has learned in previous tasks. In this paper, we propose an adaptive task-based hard attention mechanism that allows adaptive updates to static parameters by taking into account information about previous tasks, namely the importance of these parameters to previous tasks and the current network capacity. We develop a new neural network architecture incorporating our proposed Adaptive Hard Attention to the Task (AdaHAT) mechanism. AdaHAT extends an existing architecture-based approach, Hard Attention to the Task (HAT), to learn long sequences of tasks in an incremental manner. We conduct experiments on a number of datasets and compare AdaHAT with a number of baselines, including HAT. Our experimental results show that AdaHAT achieves better average performance over tasks than these baselines, especially on long sequences of tasks, demonstrating the benefit of balancing the trade-off between stability and plasticity of a network when learning such sequences. Our code is available at github.com/pengxiang-wang/continual-learning-arena.