Abstract: Navigating multiple tasks, for instance in succession as in continual or lifelong learning, or in distributions as in meta or multi-task learning, requires some notion of adaptation. Evolution over timescales of millennia has imbued humans and other animals with highly effective adaptive learning and decision-making strategies. Central to these functions are so-called neuromodulatory systems. In this work we introduce an abstract framework for integrating theories and evidence from neuroscience and the cognitive sciences into the design of adaptive artificial reinforcement learning algorithms. We give a concrete instance of this framework built on literature surrounding the neuromodulators Acetylcholine (ACh) and Noradrenaline (NA), and empirically validate the effectiveness of the resulting adaptive algorithm in a non-stationary multi-armed bandit problem. We conclude with a theory-based experiment proposal providing an avenue to link our framework back to efforts in experimental neuroscience.

What problem does this paper attempt to address?

The paper addresses adaptive reinforcement learning (RL) in multi-task settings. Specifically, it asks how RL algorithms can learn and make decisions adaptively when tasks arrive in succession or when the environment is non-stationary, as in a non-stationary multi-armed bandit problem. The authors draw inspiration from neuroscience and cognitive science, using the role of neuromodulatory systems to design a framework for adaptive RL algorithms. These systems play a crucial role in animal learning and decision-making, particularly in exploring new environments and adapting to change.

### Main contributions of the paper include:

1. **Propose a framework**: The framework is based on the current understanding of neuromodulatory systems in the mammalian brain and aims to improve the adaptability and exploration ability of artificial RL algorithms.
2. **Instantiate the framework**: Specific examples show how to apply the framework to common RL hyper-parameters, such as the learning rate and the exploration strategy.
3. **Empirical evaluation**: The effectiveness of the framework instance is verified in non-stationary multi-armed bandit tasks.
4. **Experimental design suggestions**: A theory-based experimental design is proposed to link the framework with experimental neuroscience, promoting cross-disciplinary research between the two fields.

### Specific components of the framework:

- **Component I**: Establish hypotheses on how specific neuromodulatory systems affect the learning and behavior of animals, and how these effects map to particular hyper-parameters or metrics in RL algorithms.
- **Component II**: Explore what neuromodulatory systems signal or measure, for example dopamine signaling reward prediction error (RPE) and noradrenaline (NA) signaling unexpected uncertainty.
- **Component III**: Identify and measure quantities analogous to neuromodulatory signals during the interaction between the RL agent and its environment.
- **Component IV**: Construct functions that convert these measured quantities into valid hyper-parameter values, completing the mapping.

### Example application:

The paper introduces a specific example, the Doya-DaYu agent. This agent uses the following mappings:

- **Learning rate**: Associated with acetylcholine (ACh), described as reflecting an uncertainty balance.
- **Inverse temperature**: Associated with noradrenaline (NA), reflecting unexpected uncertainty.

### Experimental verification:

In non-stationary multi-armed bandit tasks, the Doya-DaYu agent performs better than the traditional Discounted-UCB and Boltzmann strategies, especially when the environment changes frequently.

### Feedback to neuroscience experiments:

The paper also proposes how to use the framework to guide the design and analysis of neuroscience experiments, with exploratory and confirmatory experimental branches, to further test and enrich the framework's theoretical basis.

In summary, the paper proposes a framework for adaptive RL algorithms that combines findings from neuroscience and machine learning, and supports its effectiveness empirically in a non-stationary bandit setting.
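To make Components III and IV more concrete, the following is a minimal Python sketch of a softmax bandit agent whose learning rate and inverse temperature are set online from measured prediction-error statistics. It is not the Doya-DaYu agent from the paper: the class name `ModulatedBanditAgent`, the ACh-like and NA-like proxies (slow and fast running averages of the absolute reward prediction error), the squashing functions, and every numeric default are assumptions chosen purely for illustration. Only the general pairing of learning rate with an ACh-like signal and inverse temperature with an NA-like signal follows the summary above.

```python
import math
import random


def softmax(values, beta):
    """Boltzmann action probabilities with inverse temperature beta."""
    peak = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - peak)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


class ModulatedBanditAgent:
    """Softmax bandit agent with online-modulated hyper-parameters.

    All signal definitions and mapping functions here are illustrative
    assumptions, not the formulas used in the paper.
    """

    def __init__(self, n_arms, lr_range=(0.05, 0.8), beta_range=(1.0, 10.0),
                 slow=0.02, fast=0.3, gain=10.0, seed=0):
        self.q = [0.0] * n_arms
        self.lr_range = lr_range
        self.beta_range = beta_range
        self.slow, self.fast, self.gain = slow, fast, gain
        self.err_slow = 0.0  # long-run average of |RPE|
        self.err_fast = 0.0  # recent average of |RPE|
        self.rng = random.Random(seed)

    # Component III: measure signal-like quantities from the interaction.
    def ach_like(self):
        """ACh-like proxy (assumed): long-run average prediction error size."""
        return self.err_slow

    def na_like(self):
        """NA-like proxy (assumed): recent error above the long-run level."""
        return max(0.0, self.err_fast - self.err_slow)

    # Component IV: squash the measurements into valid hyper-parameter ranges.
    def learning_rate(self):
        lo, hi = self.lr_range
        x = self.gain * self.ach_like()
        return lo + (hi - lo) * x / (1.0 + x)

    def inverse_temperature(self):
        lo, hi = self.beta_range
        # More surprise gives a lower beta, i.e. more exploration (assumed).
        return lo + (hi - lo) / (1.0 + self.gain * self.na_like())

    def act(self):
        probs = softmax(self.q, self.inverse_temperature())
        return self.rng.choices(range(len(self.q)), weights=probs)[0]

    def update(self, arm, reward):
        rpe = reward - self.q[arm]  # reward prediction error
        self.q[arm] += self.learning_rate() * rpe
        self.err_slow += self.slow * (abs(rpe) - self.err_slow)
        self.err_fast += self.fast * (abs(rpe) - self.err_fast)
        return rpe


def run(n_steps=2000, switch_every=500, seed=0):
    """Two-armed Bernoulli bandit whose arm probabilities swap periodically."""
    env_rng = random.Random(seed + 1)
    probs = [0.8, 0.2]
    agent = ModulatedBanditAgent(n_arms=2, seed=seed)
    total = 0.0
    for t in range(n_steps):
        if t > 0 and t % switch_every == 0:
            probs.reverse()  # non-stationarity: the best arm changes
        arm = agent.act()
        reward = 1.0 if env_rng.random() < probs[arm] else 0.0
        agent.update(arm, reward)
        total += reward
    return agent, total / n_steps


if __name__ == "__main__":
    agent, mean_reward = run()
    print(f"mean reward: {mean_reward:.3f}")
    print(f"learning rate: {agent.learning_rate():.3f}, "
          f"inverse temperature: {agent.inverse_temperature():.3f}")
```

The design point the sketch tries to capture is that both mappings are bounded by construction, so whatever the measured signals do, the learning rate stays inside `lr_range` and the inverse temperature stays inside `beta_range`. Swapping in different proxies or squashing functions only requires changing the four small methods under the Component III and IV comments.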

Lifelong Reinforcement Learning via Neuromodulation

Uncertainty-based modulation for lifelong learning

Reinforcement Learning with Brain-Inspired Modulation can Improve Adaptation to Environmental Changes

Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL

Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets

Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning

Modular Continual Learning in a Unified Visual Environment

System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games

Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning

Meta-Learning Strategies through Value Maximization in Neural Networks

Evolving Reservoirs for Meta Reinforcement Learning

A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment

Lifelong Reinforcement Learning with Modulating Masks

Context meta-reinforcement learning via neuromodulation

Continuous Coordination As a Realistic Scenario for Lifelong Learning

Learning to acquire novel cognitive tasks with evolution, plasticity and meta-meta-learning

Evolving hierarchical memory-prediction machines in multi-task reinforcement learning

Incorporating neuro-inspired adaptability for continual learning in artificial intelligence

Neuroevolution of Recurrent Architectures on Control Tasks

Learning to Modulate Random Weights: Neuromodulation-inspired Neural Networks For Efficient Continual Learning