Hippocampus-Inspired Cognitive Architecture (HICA) for Operant Conditioning

Deokgun Park, Md Ashaduzzaman Rubel Mondol, SM Mazharul Islam, Aishwarya Pothula
DOI: https://doi.org/10.48550/arXiv.2212.08626
2022-12-17
Abstract: The neural implementation of operant conditioning with few trials is unclear. We propose a Hippocampus-Inspired Cognitive Architecture (HICA) as a neural mechanism for operant conditioning. HICA explains a learning mechanism in which agents can learn a new behavior policy in a few trials, as mammals do in operant conditioning experiments. HICA is composed of two different types of modules. One is a universal learning module type that represents a cortical column in the neocortex gray matter. The working principle is modeled as Modulated Heterarchical Prediction Memory (mHPM). In mHPM, each module learns to predict a succeeding input vector given the sequence of the input vectors from lower layers and the context vectors from higher layers. The prediction is fed into the lower layers as a context signal (top-down feedback signaling), and into the higher layers as an input signal (bottom-up feedforward signaling). Rewards modulate the learning rate in those modules to memorize meaningful sequences effectively. In mHPM, each module updates in a local and distributed way compared to conventional end-to-end learning with backpropagation of the single objective loss. This local structure enables the heterarchical network of modules. The second type is an innate, special-purpose module representing various organs of the brain's subcortical system. Modules modeling organs such as the amygdala, hippocampus, and reward center are pre-programmed to enable instinctive behaviors. The hippocampus plays the role of the simulator. It is an autoregressive prediction model of the top-most level signal with a loop structure of memory, while cortical columns are lower layers that provide detailed information to the simulation. The simulation becomes the basis for learning with few trials and the deliberate planning required for operant conditioning.
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how reinforcement learning agents can achieve the rapid learning that organisms show in operant conditioning. The authors argue that current artificial intelligence and machine learning methods, especially model-free reinforcement learning, are more akin to evolutionary processes: they need a large number of training samples to learn a new behavior policy. In contrast, organisms learn new behaviors after only a few trials in operant conditioning experiments. The goal of the paper is therefore to design a Hippocampus-Inspired Cognitive Architecture (HICA) that lets agents learn new behavior policies in a small number of trials, and so better model the operant conditioning process of organisms.

### Main Issues

1. **Sample Efficiency**: Current reinforcement learning methods require a large number of training samples (that is, many rounds of trial and error) to learn an effective behavior policy, far more than organisms need in operant conditioning.
2. **Fragility of Behavior Policies**: The policies learned by current methods fail when the environment changes slightly, so they lack robustness.

### Solutions

To address these issues, the paper proposes two main solutions:

1. **Virtual Skinner Box test framework**: Used to evaluate an agent's ability to learn new behaviors in a small number of trials.
2. **Hippocampus-Inspired Cognitive Architecture (HICA)**: A neural mechanism that models the function of the hippocampus, aiming at rapid learning and adaptation to new environments.

### Main Components of the Cognitive Architecture

1. **Modulated Heterarchical Prediction Memory (mHPM)**: Models cortical columns in the neocortex gray matter. Each module learns to predict the next input vector; its prediction is sent to lower layers as a context signal (top-down feedback) and to higher layers as an input signal (bottom-up feedforward).
2. **Innate Modules**: Model subcortical structures (such as the thalamus, amygdala, hippocampus, and reward center) and are pre-programmed to produce instinctive behaviors.
3. **Heterarchical Network**: A network connecting mHPM modules and innate modules, allowing information transfer and coordination between them.

### Key Mechanisms

- **Prediction and Feedback**: mHPM modules learn by predicting the next input vector. Each prediction serves as a context signal fed back to lower-level modules and as an input signal fed forward to higher-level modules.
- **Reward Modulation**: Reward signals modulate the learning rate of mHPM modules, so the agent memorizes meaningful sequences more quickly.
- **Implementation of Innate Modules**: Innate functions, such as reflexes and the roles of the basal ganglia, amygdala, and reward system, are programmed by hand rather than learned.

Through these mechanisms, HICA aims to enable agents to learn new behaviors in a small number of trials, thereby better modeling the operant conditioning process of organisms.
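
The reward-modulated learning rate can be illustrated with a toy sketch. This is not the paper's mHPM implementation: the class `PredictionModule`, the symbolic inputs, the update rule (an exponential moving average over successor strengths), and the constants `base_lr` and `reward_gain` are all assumptions chosen only to show how a reward can let a single trial override several unrewarded ones.

```python
from collections import defaultdict


class PredictionModule:
    """Toy stand-in for one prediction module (hypothetical, not the paper's model).

    It stores an association strength for each (previous, next) symbol pair
    and predicts the strongest successor of the current input.
    """

    def __init__(self, base_lr=0.1, reward_gain=5.0):
        self.base_lr = base_lr          # assumed learning rate without reward
        self.reward_gain = reward_gain  # assumed scaling of reward on the rate
        self.strength = defaultdict(lambda: defaultdict(float))

    def predict(self, prev):
        """Return the most strongly associated successor, or None if unseen."""
        successors = self.strength[prev]
        if not successors:
            return None
        return max(successors, key=successors.get)

    def update(self, prev, nxt, reward=0.0):
        """Local update: only this module's own table changes."""
        # Reward raises the effective learning rate (capped at 1.0).
        lr = min(1.0, self.base_lr * (1.0 + self.reward_gain * reward))
        row = self.strength[prev]
        for sym in list(row):
            row[sym] *= (1.0 - lr)      # decay every existing association
        row[nxt] += lr                  # strengthen the observed transition


def train(reward_on_food):
    """Five unrewarded 'lever -> nothing' trials, then one 'lever -> food' trial."""
    module = PredictionModule()
    for _ in range(5):
        module.update("lever", "nothing")
    module.update("lever", "food", reward=reward_on_food)
    return module.predict("lever")


print(train(reward_on_food=1.0))
print(train(reward_on_food=0.0))
```

With these assumed constants, the rewarded run predicts `food` after the single rewarded trial, while the unrewarded run still predicts `nothing`. The update touches only the module's own table, which loosely mirrors the local, non-backpropagated learning the summary describes.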