Elephant Neural Networks: Born to Be a Continual Learner

Qingfeng Lan, A. Rupam Mahmood
2023-10-03
Abstract: Catastrophic forgetting has remained a significant challenge to continual learning for decades. While recent works have proposed effective methods to mitigate this problem, they mainly focus on the algorithmic side. Meanwhile, we do not fully understand what architectural properties of neural networks lead to catastrophic forgetting. This study aims to fill this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting. Our study reveals that, besides sparse representations, the gradient sparsity of activation functions also plays an important role in reducing forgetting. Based on this insight, we propose a new class of activation functions, elephant activation functions, that can generate both sparse representations and sparse gradients. We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting. Our method has broad applicability and benefits for continual learning in regression, class incremental learning, and reinforcement learning tasks. Specifically, we achieve excellent performance on the Split MNIST dataset in a single pass, without using a replay buffer, task boundary information, or pre-training.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the long-standing problem of catastrophic forgetting in continual learning. Specifically, it examines how architectural properties of neural networks influence catastrophic forgetting and proposes a new class of activation functions, elephant activation functions, to mitigate it. Although existing methods alleviate catastrophic forgetting at the algorithmic level, it is still poorly understood which architectural properties lead to forgetting. The paper fills this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting.

The main contributions of the paper are:

1. **Revealing the importance of gradient sparsity in activation functions**: In addition to sparse representations, the gradient sparsity of activation functions is a key factor in reducing catastrophic forgetting.
2. **Proposing the elephant activation function**: This new class of activation functions generates both sparse representations and sparse gradients, which significantly improves the resilience of neural networks to catastrophic forgetting.
3. **Validating the effectiveness of the method**: Experiments on regression, class-incremental learning, and reinforcement learning tasks show that neural networks using elephant activation functions (referred to as Elephant Neural Networks, ENNs) can significantly improve performance in a single pass through the data, without using replay buffers, task boundary information, or pre-training.

In summary, the paper aims to reduce catastrophic forgetting by improving the architectural properties of neural networks, particularly in continual learning scenarios.
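
To make "sparse representations and sparse gradients" concrete, the following is a minimal sketch rather than the authors' implementation. The text above does not state the functional form of the elephant activation, so the sketch assumes a bump-shaped function `1 / (1 + |x/a|^d)`. The names `bump_activation`, `bump_gradient`, and `near_zero_fraction`, the width `a`, the steepness `d`, and the tolerance `tol` are all hypothetical choices made for illustration. It measures the fraction of inputs for which the output, and separately the gradient, is close to zero, and compares this with `tanh`.

```python
import numpy as np


def bump_activation(x, a=1.0, d=4.0):
    """Assumed bump-shaped activation: 1 / (1 + |x/a|^d).

    `a` (width) and `d` (steepness) are illustrative parameters,
    not values taken from the text above.
    """
    return 1.0 / (1.0 + np.abs(x / a) ** d)


def bump_gradient(x, a=1.0, d=4.0):
    """Analytic derivative of bump_activation with respect to x."""
    u = np.abs(x / a)
    return -(d / a) * np.sign(x) * u ** (d - 1) / (1.0 + u ** d) ** 2


def tanh_gradient(x):
    """Derivative of tanh, used here only as a classical baseline."""
    return 1.0 - np.tanh(x) ** 2


def near_zero_fraction(values, tol=1e-2):
    """Fraction of entries whose magnitude is below `tol`."""
    return float(np.mean(np.abs(values) < tol))


# Evenly spaced pre-activations on [-10, 10] (an arbitrary test range).
x = np.linspace(-10.0, 10.0, 2001)

report = {
    "bump output": near_zero_fraction(bump_activation(x)),
    "bump gradient": near_zero_fraction(bump_gradient(x)),
    "tanh output": near_zero_fraction(np.tanh(x)),
    "tanh gradient": near_zero_fraction(tanh_gradient(x)),
}

for name, fraction in report.items():
    print(f"{name:>14}: {fraction:.3f} of inputs are near zero")
```

Under these assumptions, the bump function is near zero in both output and gradient for most of the input range, whereas `tanh` has a mostly near-zero gradient but an output that is almost never near zero. This is only meant to illustrate the distinction between representation sparsity and gradient sparsity drawn in the summary.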