Lifelong Reinforcement Learning via Neuromodulation

Sebastian Lee, Samuel Liebana Garcia, Claudia Clopath, Will Dabney
2024-08-16
Abstract: Navigating multiple tasks (for instance in succession as in continual or lifelong learning, or in distributions as in meta or multi-task learning) requires some notion of adaptation. Evolution over timescales of millennia has imbued humans and other animals with highly effective adaptive learning and decision-making strategies. Central to these functions are so-called neuromodulatory systems. In this work we introduce an abstract framework for integrating theories and evidence from neuroscience and the cognitive sciences into the design of adaptive artificial reinforcement learning algorithms. We give a concrete instance of this framework built on literature surrounding the neuromodulators Acetylcholine (ACh) and Noradrenaline (NA), and empirically validate the effectiveness of the resulting adaptive algorithm in a non-stationary multi-armed bandit problem. We conclude with a theory-based experiment proposal providing an avenue to link our framework back to efforts in experimental neuroscience.
Machine Learning
What problem does this paper attempt to address?
This paper addresses adaptive reinforcement learning (RL) in multi-task settings. Specifically, it asks how an RL algorithm can keep learning and making good decisions when tasks arrive in succession or when the environment is non-stationary, as in a non-stationary multi-armed bandit. The authors draw on neuroscience and cognitive science, in particular the role of neuromodulatory systems, to design a framework for adaptive RL algorithms. In animals, these systems play a crucial role in learning and decision-making, especially when exploring new environments and adapting to change.

### Main contributions of the paper include:

1. **Propose a framework**: The framework is based on the current understanding of neuromodulatory systems in the mammalian brain and aims to improve the adaptability and exploration ability of artificial RL algorithms.
2. **Instantiate the framework**: A concrete example shows how to apply the framework to common RL hyper-parameters, such as the learning rate and the exploration strategy.
3. **Empirical evaluation**: The framework instance is evaluated on non-stationary multi-armed bandit tasks.
4. **Experimental design suggestions**: A theory-based experimental design is proposed to link the framework with experimental neuroscience, promoting cross-disciplinary research between the two fields.

### Specific components of the framework:

- **Component I**: Establish hypotheses on how specific neuromodulatory systems affect the learning and behavior of animals, and on how these effects map to particular hyper-parameters or metrics in RL algorithms.
- **Component II**: Establish what each neuromodulatory system signals or measures, for example dopamine signaling reward prediction error (RPE) and noradrenaline (NA) signaling unexpected uncertainty.
- **Component III**: Identify and measure analogous quantities during the interaction between the RL agent and its environment.
- **Component IV**: Construct functions that convert these measurements into valid hyper-parameter values, completing the mapping.

### Example application:

The paper introduces a specific example, the Doya-DaYu agent. This agent uses the following mappings:

- **Learning rate**: Tied to acetylcholine (ACh), reflecting uncertainty balance.
- **Inverse temperature**: Tied to noradrenaline (NA), reflecting unexpected uncertainty.

### Experimental verification:

In non-stationary multi-armed bandit tasks, the Doya-DaYu agent outperforms the Discounted-UCB and Boltzmann baselines, with the advantage most pronounced when the environment changes frequently.

### Feedback to neuroscience experiments:

The paper also proposes how the framework can guide the design and analysis of neuroscience experiments, with both exploratory and confirmatory branches, to further test and enrich the framework's theoretical basis.

In summary, the paper proposes an adaptive RL framework that combines findings from neuroscience and machine learning, and supports its effectiveness with bandit experiments.
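
To make the mapping described above more concrete, here is a minimal Python sketch of an agent whose learning rate follows an ACh-like signal and whose inverse temperature follows an NA-like signal, run on a bandit whose arm means are shuffled at fixed intervals. It is not the paper's implementation. The class name, the uncertainty proxies (a slow running mean of the absolute RPE for ACh, the share of the absolute RPE exceeding that mean for NA), the squashing functions, and all constants are assumptions chosen for illustration, since the summary does not specify them.

```python
import numpy as np


def softmax_policy(q_values, beta):
    """Boltzmann policy over arms; beta is the inverse temperature."""
    z = beta * (q_values - np.max(q_values))  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()


class NeuromodulatedBanditAgent:
    """Hypothetical agent: ACh-like signal -> learning rate, NA-like signal -> beta."""

    def __init__(self, n_arms, lr_min=0.05, lr_max=0.9, beta_max=10.0,
                 ach_decay=0.9, na_decay=0.5, seed=0):
        self.q = np.zeros(n_arms)
        self.ach = np.ones(n_arms)  # assumed proxy: slow running mean of |RPE| per arm
        self.na = 0.0               # assumed proxy: surprise beyond that mean, in [0, 1)
        self.lr_min, self.lr_max, self.beta_max = lr_min, lr_max, beta_max
        self.ach_decay, self.na_decay = ach_decay, na_decay
        self.rng = np.random.default_rng(seed)

    def learning_rate(self, arm):
        # Component IV (assumed form): squash the ACh-like signal into [lr_min, lr_max].
        squashed = self.ach[arm] / (1.0 + self.ach[arm])
        return self.lr_min + (self.lr_max - self.lr_min) * squashed

    def inverse_temperature(self):
        # Component IV (assumed form): more surprise -> lower beta -> more exploration.
        return self.beta_max * (1.0 - self.na)

    def act(self):
        probs = softmax_policy(self.q, self.inverse_temperature())
        return int(self.rng.choice(len(self.q), p=probs))

    def update(self, arm, reward):
        rpe = reward - self.q[arm]                  # reward prediction error
        abs_rpe = abs(rpe)
        excess = max(0.0, abs_rpe - self.ach[arm])  # error beyond the expected level
        # Component III (assumed measurements): update the NA-like and ACh-like signals.
        self.na = self.na_decay * self.na + (1 - self.na_decay) * excess / (abs_rpe + 1e-8)
        self.q[arm] += self.learning_rate(arm) * rpe
        self.ach[arm] = self.ach_decay * self.ach[arm] + (1 - self.ach_decay) * abs_rpe


def run(n_steps=2000, switch_every=200, seed=0):
    """Gaussian bandit whose arm means are permuted every `switch_every` steps."""
    rng = np.random.default_rng(seed)
    means = np.array([1.0, 0.0, -1.0])
    agent = NeuromodulatedBanditAgent(n_arms=len(means), seed=seed)
    total = 0.0
    for t in range(n_steps):
        if t > 0 and t % switch_every == 0:
            means = rng.permutation(means)  # abrupt change point
        arm = agent.act()
        reward = rng.normal(means[arm], 0.5)
        agent.update(arm, reward)
        total += reward
    return total / n_steps, agent


if __name__ == "__main__":
    average_reward, _ = run()
    print(f"average reward per step: {average_reward:.3f}")
```

In this sketch, a reward that deviates from the estimate by more than the arm's typical error raises the NA-like signal, which lowers the inverse temperature and makes the policy more exploratory until the estimates settle again. Whether this particular choice of proxies reproduces the reported advantage over Discounted-UCB and Boltzmann baselines is not something the sketch establishes.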