Abstract: Animals often demonstrate a remarkable ability to adapt to their environments during their lifetime. They do so partly due to the evolution of morphological and neural structures. These structures capture features of environments shared between generations to bias and speed up lifetime learning. In this work, we propose a computational model for studying a mechanism that can enable such a process. We adopt a computational framework based on meta reinforcement learning as a model of the interplay between evolution and development. At the evolutionary scale, we evolve reservoirs, a family of recurrent neural networks that differ from conventional networks in that one optimizes not the synaptic weights, but hyperparameters controlling macro-level properties of the resulting network architecture. At the developmental scale, we employ these evolved reservoirs to facilitate the learning of a behavioral policy through Reinforcement Learning (RL). Within an RL agent, a reservoir encodes the environment state before providing it to an action policy. We evaluate our approach on several 2D and 3D simulated environments. Our results show that the evolution of reservoirs can improve the learning of diverse challenging tasks. We study in particular three hypotheses: the use of an architecture combining reservoirs and reinforcement learning could enable (1) solving tasks with partial observability, (2) generating oscillatory dynamics that facilitate the learning of locomotion tasks, and (3) facilitating the generalization of learned behaviors to new tasks unknown during the evolution phase.
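
To make the idea of a reservoir "generated" from hyperparameters concrete, here is a minimal plain-Python sketch of a leaky echo state network. Its fixed random weights are drawn from a few macro-level settings. The class names `ReservoirHPs` and `Reservoir` and the specific HPs (`spectral_radius`, `leak_rate`, `input_scaling`, `units`, `density`) are illustrative assumptions, not the paper's exact parameterization. The point is only that evolution would act on the HPs while the weights themselves are never trained.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ReservoirHPs:
    """Hypothetical macro-level hyperparameters; only these would be evolved."""
    spectral_radius: float = 0.9  # overall gain of the recurrent weights
    leak_rate: float = 0.3        # 0 < leak_rate <= 1; smaller means slower dynamics
    input_scaling: float = 1.0    # gain applied to the input weights
    units: int = 100              # number of reservoir neurons
    density: float = 0.1          # fraction of non-zero recurrent connections


class Reservoir:
    """Leaky echo state network whose fixed random weights are drawn from the HPs."""

    def __init__(self, hps: ReservoirHPs, input_dim: int, seed: int = 0):
        rng = random.Random(seed)
        n = hps.units
        self.leak_rate = hps.leak_rate
        # Circular-law heuristic: i.i.d. entries with overall variance sr^2 / n
        # give a spectral radius of roughly sr for large n (not an exact rescaling).
        std = hps.spectral_radius / math.sqrt(n * hps.density)
        self.w = [[rng.gauss(0.0, std) if rng.random() < hps.density else 0.0
                   for _ in range(n)] for _ in range(n)]
        self.w_in = [[hps.input_scaling * rng.uniform(-1.0, 1.0)
                      for _ in range(input_dim)] for _ in range(n)]
        self.state = [0.0] * n

    def reset(self) -> None:
        """Clear the internal state, e.g. at the start of an episode."""
        self.state = [0.0] * len(self.state)

    def step(self, obs: list[float]) -> list[float]:
        """x_t = (1 - a) * x_{t-1} + a * tanh(W x_{t-1} + W_in u_t), a = leak rate."""
        pre = [sum(w_ij * x_j for w_ij, x_j in zip(row, self.state))
               + sum(w_ij * u_j for w_ij, u_j in zip(row_in, obs))
               for row, row_in in zip(self.w, self.w_in)]
        self.state = [(1.0 - self.leak_rate) * x + self.leak_rate * math.tanh(p)
                      for x, p in zip(self.state, pre)]
        return list(self.state)


# Usage: encode one observation; the returned vector is what the policy would see.
reservoir = Reservoir(ReservoirHPs(units=50), input_dim=3, seed=42)
features = reservoir.step([0.1, -0.2, 0.05])
print(len(features))  # 50
```

The vector returned by `step` plays the role of the encoded environment state handed to the action policy. The spectral radius is set with a circular-law approximation instead of an exact eigenvalue rescaling, which keeps the sketch free of external dependencies.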

### What problem does this paper attempt to solve?

This paper explores how neural structures optimized at the evolutionary scale, through the interplay between evolution and development, can improve an agent's ability to learn complex tasks at the developmental scale. Specifically, the authors propose a computational model called "Evolving Reservoirs for Meta Reinforcement Learning" (ER-MRL) to study three main hypotheses:

1. **Solving partially observable tasks**: In a partially observable environment, the agent cannot observe all the information it needs to solve the task. The authors hypothesize that an evolved reservoir can use its internal recurrent dynamics to reconstruct the missing information, helping the agent solve such tasks.
2. **Generating oscillatory dynamics that help learn locomotion tasks**: The authors hypothesize that an evolved reservoir can produce oscillatory patterns similar to those of central pattern generators (CPGs), which coordinate body movements and thereby facilitate the learning of locomotion. CPGs are neural circuits that generate rhythmic movement patterns such as walking or swimming.
3. **Promoting generalization to new tasks**: The authors hypothesize that an evolved reservoir can capture abstract features shared across environments, so that agents generalize better to tasks they have not encountered before.

### Research methods

To test these hypotheses, the authors designed a framework with two nested optimization loops:

- **Outer loop (evolutionary scale)**: An evolutionary algorithm optimizes the hyperparameters (HPs) that generate the reservoir, so as to maximize the learning performance of agents in the inner loop.
- **Inner loop (developmental scale)**: In a simulated environment, the generated reservoir encodes the environment state before passing it to the agent's action policy, which is trained with reinforcement learning (RL) to maximize cumulative reward.

### Experimental results

The authors tested the three hypotheses through a series of experiments:

1. **Partially observable tasks**: In partially observable environments, ER-MRL agents perform close to RL agents that receive full observations. This suggests that the evolved reservoir can indeed help reconstruct the missing information.
2. **Locomotion tasks**: In some locomotion tasks (such as Ant, HalfCheetah, and Swimmer), ER-MRL agents show a marked performance improvement early in learning. This supports the hypothesis that the reservoir may generate beneficial oscillatory patterns.
3. **Generalization to new tasks**: ER-MRL agents generalize well to new tasks not seen during evolution, especially to environments with different morphologies. This further supports the claim that evolved reservoirs can encode diverse dynamics.

Overall, the paper proposes a new computational model that combines evolutionary algorithms, reservoir computing, and meta-reinforcement learning. It shows how evolutionary optimization of neural structures can enhance the learning and adaptation abilities of agents.
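
The two nested loops can be sketched in a few dozen lines under heavy simplifications. This is a hypothetical illustration, not the authors' implementation. The outer loop is a plain truncation-selection evolution strategy over two assumed HPs (spectral radius and leak rate). To keep the sketch short, the RL inner loop is swapped for a supervised stand-in: a linear readout learns online (LMS) to recall an input seen a few steps earlier, a small memory task in the spirit of hypothesis 1. The names `inner_loop`, `outer_loop`, and `BOUNDS` are illustrative.

```python
import math
import random

BOUNDS = [(0.1, 1.5), (0.05, 1.0)]  # assumed ranges for (spectral_radius, leak_rate)


def inner_loop(hps, n=20, steps=200, delay=3, lr=0.05, seed=0):
    """Developmental-scale stand-in: an online LMS readout on a fixed random reservoir
    learns to recall the input from `delay` steps ago. Returns the negative mean
    squared error over the second half of learning (higher is better)."""
    sr, leak = hps
    rng = random.Random(seed)  # fixed seed: identical HPs -> identical fitness
    # Random recurrent weights with variance sr^2 / n (approx. spectral radius sr).
    w = [[rng.gauss(0.0, sr / math.sqrt(n)) for _ in range(n)] for _ in range(n)]
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    x, readout, inputs, sq_errors = [0.0] * n, [0.0] * n, [], []
    for t in range(steps):
        u = rng.uniform(-1.0, 1.0)
        inputs.append(u)
        x = [(1.0 - leak) * x[i]
             + leak * math.tanh(sum(w[i][j] * x[j] for j in range(n)) + w_in[i] * u)
             for i in range(n)]
        if t < delay:
            continue
        err = inputs[t - delay] - sum(r * xi for r, xi in zip(readout, x))
        readout = [r + lr * err * xi for r, xi in zip(readout, x)]  # LMS update
        if t >= steps // 2:
            sq_errors.append(err * err)
    return -sum(sq_errors) / len(sq_errors)


def clip(v, lo, hi):
    return min(hi, max(lo, v))


def outer_loop(generations=4, pop_size=6, elite=2, sigma=0.1, seed=1):
    """Evolutionary scale: truncation-selection ES over (spectral_radius, leak_rate).
    Only the HPs are inherited; each readout is discarded after its evaluation."""
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((inner_loop(h), h) for h in pop), reverse=True)
        history.append(scored[0][0])  # best fitness of this generation
        parents = [h for _, h in scored[:elite]]  # elitism: best HPs survive as-is
        children = [tuple(clip(v + rng.gauss(0.0, sigma), lo, hi)
                          for v, (lo, hi) in zip(rng.choice(parents), BOUNDS))
                    for _ in range(pop_size - elite)]
        pop = parents + children
    return scored[0][1], history


best_hps, history = outer_loop()
print("best (spectral_radius, leak_rate):", best_hps)
print("best fitness per generation:", history)
```

Fitness measures how well learning goes within a lifetime, not the quality of a network handed down from the previous generation. This mirrors the idea that evolution shapes only the reservoir-generating HPs. Because each evaluation uses a fixed seed, identical HPs always receive identical fitness. Combined with elitism, this means the best fitness in `history` never decreases between generations.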

Evolving Reservoirs for Meta Reinforcement Learning

Towards continual reinforcement learning through evolutionary meta-learning

Curriculum Reinforcement Learning via Morphology-Environment Co-Evolution

Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL

Evolving Models for Incrementally Learning Emerging Activities

Breaching the Bottleneck: Evolutionary Transition from Reward-Driven Learning to Reward-Agnostic Domain-Adapted Learning in Neuromodulated Neural Nets

Lifelong Reinforcement Learning via Neuromodulation

Evolutionary Reinforcement Learning via Cooperative Coevolution

Reinforcement Learning with Brain-Inspired Modulation can Improve Adaptation to Environmental Changes

Evolutionary Reinforcement Learning: A Survey

Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning

A meta reinforcement learning account of behavioral adaptation to volatility in recurrent neural networks

Emergent Solutions to High-Dimensional Multitask Reinforcement Learning

Evolving Curricula with Regret-Based Environment Design

Learning to acquire novel cognitive tasks with evolution, plasticity and meta-meta-learning

Evolving hierarchical memory-prediction machines in multi-task reinforcement learning

Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning

Neuroevolution of Recurrent Architectures on Control Tasks

Meta-Learning an Evolvable Developmental Encoding

Multitask Neuroevolution for Reinforcement Learning with Long and Short Episodes

Context meta-reinforcement learning via neuromodulation