Abstract: The neural implementation of operant conditioning with few trials is unclear. We propose a Hippocampus-Inspired Cognitive Architecture (HICA) as a neural mechanism for operant conditioning. HICA explains a learning mechanism in which agents can learn a new behavior policy in a few trials, as mammals do in operant conditioning experiments. HICA is composed of two different types of modules. One is a universal learning module type that represents a cortical column in the neocortex gray matter. The working principle is modeled as Modulated Heterarchical Prediction Memory (mHPM). In mHPM, each module learns to predict a succeeding input vector given the sequence of the input vectors from lower layers and the context vectors from higher layers. The prediction is fed into the lower layers as a context signal (top-down feedback signaling), and into the higher layers as an input signal (bottom-up feedforward signaling). Rewards modulate the learning rate in those modules to memorize meaningful sequences effectively. In mHPM, each module updates in a local and distributed way compared to conventional end-to-end learning with backpropagation of the single objective loss. This local structure enables the heterarchical network of modules. The second type is an innate, special-purpose module representing various organs of the brain's subcortical system. Modules modeling organs such as the amygdala, hippocampus, and reward center are pre-programmed to enable instinctive behaviors. The hippocampus plays the role of the simulator. It is an autoregressive prediction model of the top-most level signal with a loop structure of memory, while cortical columns are lower layers that provide detailed information to the simulation. The simulation becomes the basis for learning with few trials and the deliberate planning required for operant conditioning.

What problem does this paper attempt to address?

The paper asks how a reinforcement learning agent can learn as rapidly as organisms do in operant conditioning. The authors point out that current artificial intelligence and machine learning methods, especially model-free reinforcement learning, are more akin to evolutionary processes: they require a large number of training samples to learn a new behavior policy. In contrast, organisms in operant conditioning experiments learn new behaviors after only a few trials. The goal of the paper is therefore to design a Hippocampus-Inspired Cognitive Architecture (HICA) that lets agents learn a new behavior policy in a small number of trials, and so models the operant conditioning process of organisms more closely.

### Main Issues

1. **Sample Efficiency Issue**: Current reinforcement learning methods need a large number of training samples (that is, many rounds of trial and error) to learn an effective behavior policy, far more than organisms need in operant conditioning.
2. **Fragility of Behavior Strategies**: The behavior policies learned by current methods fail when the environment changes slightly; they lack robustness.

### Solutions

To address these issues, the paper makes two main contributions:

1. **Virtual Skinner Box test framework**: Used to evaluate the agent's ability to learn new behaviors in a small number of trials.
2. **Hippocampus-Inspired Cognitive Architecture (HICA)**: A neural mechanism that models the function of the hippocampus, aiming at rapid learning and adaptation to new environments.

### Main Components of the Cognitive Architecture

1. **Modulated Heterarchical Prediction Memory (mHPM)**: Models the function of the neocortex gray matter. Each module learns to predict the next input vector; its prediction is sent to lower-level modules as a context signal (feedback) and to higher-level modules as an input signal (feedforward).
2. **Innate Modules**: Model the function of subcortical structures (such as the thalamus, amygdala, hippocampus, and reward center) and are pre-programmed to produce instinctive behaviors.
3. **Heterarchical Network**: A network connecting mHPM modules and innate modules, allowing information transfer and coordination between modules.

### Key Mechanisms

- **Prediction and Feedback**: Each mHPM module learns by predicting the next input vector. The prediction serves as a context signal fed back to lower-level modules and as an input signal fed forward to higher-level modules.
- **Reward Modulation**: Reward signals modulate the learning rate of mHPM modules, so the agent memorizes meaningful sequences more quickly.
- **Implementation of Innate Modules**: Modules for reflexes, the basal ganglia, the amygdala, and the reward system are programmed by hand to produce specific instinctive behaviors.

Through these mechanisms, HICA aims to enable agents to learn new behaviors in a small number of trials, and so to model the operant conditioning process of organisms more closely.
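The prediction, feedback, and reward-modulation mechanisms summarized above can be illustrated with a small NumPy sketch. It is not the paper's implementation. The class `PredictionModule`, the linear predictor, the normalized least-mean-squares update, and the parameters `base_lr` and `reward_gain` are all assumptions made for illustration. The sketch also conditions on only the current input rather than the full input sequence, and treats only positive reward as boosting the learning rate.

```python
import numpy as np


class PredictionModule:
    """Hypothetical stand-in for one mHPM module (not the paper's code)."""

    def __init__(self, input_dim, context_dim, base_lr=0.05, reward_gain=4.0, seed=0):
        rng = np.random.default_rng(seed)
        # Linear map from [input, context] to the predicted next input.
        self.W = rng.normal(scale=0.1, size=(input_dim, input_dim + context_dim))
        self.base_lr = base_lr
        self.reward_gain = reward_gain  # assumed form of reward modulation

    def predict(self, x, context):
        return self.W @ np.concatenate([x, context])

    def update(self, x, context, x_next, reward=0.0):
        """Local update from this module's own prediction error only."""
        z = np.concatenate([x, context])
        error = x_next - self.W @ z
        # Reward scales the learning rate (assumption: only positive reward counts).
        lr = self.base_lr * (1.0 + self.reward_gain * max(reward, 0.0))
        # Normalized LMS step; no gradient flows to other modules.
        self.W += lr * np.outer(error, z) / (z @ z + 1e-8)
        return float(np.mean(error ** 2))  # error measured before the update


def cyclic_sequence(n_symbols=4):
    """One-hot symbols A -> B -> C -> D -> A."""
    eye = np.eye(n_symbols)
    return np.vstack([eye, eye[:1]])


def final_error(reward, passes=5):
    """Mean prediction error of a single module over the last training pass."""
    seq = cyclic_sequence()
    d = seq.shape[1]
    module = PredictionModule(d, context_dim=d)
    context = np.zeros(d)  # no higher layer in this single-module demo
    errors = []
    for _ in range(passes):
        errors = [module.update(x, context, x_next, reward)
                  for x, x_next in zip(seq[:-1], seq[1:])]
    return float(np.mean(errors))


def run_two_layer(reward, passes=5):
    """Two stacked modules: predictions go up as input and down as context."""
    seq = cyclic_sequence()
    d = seq.shape[1]
    lower = PredictionModule(d, context_dim=d, seed=0)
    higher = PredictionModule(d, context_dim=0, seed=1)
    context, no_context, prev_up = np.zeros(d), np.zeros(0), None
    errors = []
    for _ in range(passes):
        errors = []
        for x, x_next in zip(seq[:-1], seq[1:]):
            up = lower.predict(x, context)  # bottom-up: input to the higher module
            errors.append(lower.update(x, context, x_next, reward))
            if prev_up is not None:
                higher.update(prev_up, no_context, up, reward)
            context = higher.predict(up, no_context)  # top-down: context for the lower module
            prev_up = up
    return float(np.mean(errors))


if __name__ == "__main__":
    print("single module, no reward:", final_error(reward=0.0))
    print("single module, rewarded: ", final_error(reward=1.0))
    print("two-layer stack, rewarded:", run_two_layer(reward=1.0))
```

In the single-module case the rewarded run ends with a lower prediction error after the same number of passes, because the larger step size shrinks the error faster on this fixed sequence. Each module updates only from its own error, which is the local, distributed property the abstract contrasts with end-to-end backpropagation.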

Hippocampus-Inspired Cognitive Architecture (HICA) for Operant Conditioning
