Elephant Neural Networks: Born to Be a Continual Learner

Qingfeng Lan, A. Rupam Mahmood

2023-10-03

Abstract: Catastrophic forgetting has remained a significant challenge to continual learning for decades. While recent works have proposed effective methods to mitigate this problem, they mainly focus on the algorithmic side. Meanwhile, we do not fully understand what architectural properties of neural networks lead to catastrophic forgetting. This study aims to fill this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting. Our study reveals that, besides sparse representations, the gradient sparsity of activation functions also plays an important role in reducing forgetting. Based on this insight, we propose a new class of activation functions, elephant activation functions, that can generate both sparse representations and sparse gradients. We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting. Our method has broad applicability and benefits for continual learning in regression, class incremental learning, and reinforcement learning tasks. Specifically, we achieve excellent performance on the Split MNIST dataset in a single pass, without using a replay buffer, task boundary information, or pre-training.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the long-standing problem of catastrophic forgetting in continual learning. It examines how architectural properties of neural networks contribute to forgetting and proposes a new class of activation functions, the elephant activation functions, to mitigate it. Existing methods alleviate catastrophic forgetting mainly at the algorithmic level, while the architectural causes of forgetting remain poorly understood. The paper fills this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting.

The main contributions of the paper are:

1. **Revealing the importance of gradient sparsity in activation functions**: In addition to sparse representations, the gradient sparsity of activation functions is a key factor in reducing catastrophic forgetting.
2. **Proposing the elephant activation function**: This new activation function generates both sparse representations and sparse gradients, which significantly improves the resistance of neural networks to catastrophic forgetting.
3. **Validating the effectiveness of the method**: Experiments on regression, class-incremental learning, and reinforcement learning tasks show that neural networks using the elephant activation function (referred to as Elephant Neural Networks, ENNs) significantly improve performance with a single pass through the dataset and without replay buffers, task boundary information, or pre-training.

In summary, the paper aims to reduce catastrophic forgetting in continual learning by changing an architectural property of neural networks, the activation function, rather than the learning algorithm.
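The two properties named above, sparse representations and sparse gradients, can be illustrated with a small sketch. The summary does not state the formula of the elephant activation function, so the bump-shaped form `1 / (1 + |x/a|^d)` used here is an assumption made for illustration, as are the parameter names `a` (width) and `d` (steepness), the tolerance `tol`, and the helper `sparsity`. ReLU is included only as a familiar point of comparison on one fixed grid of inputs.

```python
import math


def elephant(x, a=1.0, d=4.0):
    """Assumed bump-shaped activation: 1 at x = 0, decaying toward 0 as |x| grows."""
    return 1.0 / (1.0 + abs(x / a) ** d)


def elephant_grad(x, a=1.0, d=4.0):
    """Analytic derivative of the assumed elephant() with respect to x."""
    if x == 0.0:
        return 0.0  # the bump is flat at its peak
    u = abs(x / a)
    return -math.copysign(1.0, x) * (d / a) * u ** (d - 1) / (1.0 + u ** d) ** 2


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def sparsity(values, tol=1e-2):
    """Fraction of entries whose magnitude is below tol (hypothetical helper)."""
    return sum(abs(v) < tol for v in values) / len(values)


# Evenly spaced pre-activations in [-10, 10]; the range is an arbitrary choice.
xs = [i * 0.1 for i in range(-100, 101)]

report = {
    "elephant output": sparsity([elephant(x) for x in xs]),
    "elephant gradient": sparsity([elephant_grad(x) for x in xs]),
    "relu output": sparsity([relu(x) for x in xs]),
    "relu gradient": sparsity([relu_grad(x) for x in xs]),
}

for name, value in report.items():
    print(f"{name:18s} sparsity = {value:.2f}")
```

Under this assumed form, both the output and the gradient are near zero everywhere except in a narrow window around the origin, so a gradient step driven by one input changes the unit's response only for nearby inputs. The exact fractions depend on the grid range, `a`, `d`, and `tol`, so they should be read as qualitative only.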

Progressive Learning without Forgetting

Overcoming Long-Term Catastrophic Forgetting Through Adversarial Neural Pruning and Synaptic Consolidation

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

Overcoming catastrophic forgetting in neural networks

Measuring Catastrophic Forgetting in Neural Networks

Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting

Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning

Memory Recall: A Simple Neural Network Training Framework Against Catastrophic Forgetting

Wide Neural Networks Forget Less Catastrophically

A Methodology-Oriented Study of Catastrophic Forgetting in Incremental Deep Neural Networks

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics

Overcoming Catastrophic Forgetting by XAI

Triple-Memory Networks: A Brain-Inspired Method for Continual Learning

Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations

Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping

SNAP: Stopping Catastrophic Forgetting in Hebbian Learning with Sigmoidal Neuronal Adaptive Plasticity

Catastrophic Forgetting in Deep Learning: A Comprehensive Taxonomy

Explaining How Deep Neural Networks Forget by Deep Visualization

A Neural Network Model of Continual Learning with Cognitive Control

SynapNet: A Complementary Learning System Inspired Algorithm With Real-Time Application in Multimodal Perception