Continual learning, deep reinforcement learning, and microcircuits: a novel method for clever game playing

Oscar Chang, Leo Ramos, Manuel Eugenio Morocho-Cayamcela, Rolando Armas, Luis Zhinin-Vera
DOI: https://doi.org/10.1007/s11042-024-18925-2
IF: 2.577
2024-04-16
Multimedia Tools and Applications
Abstract: Contemporary neural networks frequently encounter the challenge of catastrophic forgetting, wherein newly acquired learning can overwrite and erase previously learned information. The paradigm of continual learning offers a promising solution by enabling intelligent systems to retain and build upon their acquired knowledge over time. This paper introduces a novel approach within the continual learning framework, employing deep reinforcement learning agents that process raw pixel data and interact with microcircuit-like components. These agents autonomously advance through a series of learning stages, culminating in the development of a sophisticated neural network system optimized for predictive performance in the game of tic-tac-toe. Structured to operate in sequential order, each agent is tasked with achieving forward-looking objectives based on Bellman's principles of reinforcement learning. Knowledge retention is facilitated through the integration of specific microcircuits, which securely store the insights gained by each agent. During the training phase, these microcircuits work in concert, employing high-energy, sparse encoding techniques to enhance learning efficiency and effectiveness. The core contribution of this paper is the establishment of an artificial neural network system capable of accurately predicting tic-tac-toe moves, akin to the observational strategies employed by humans. Our experimental results demonstrate that after approximately 5000 cycles of backpropagation, the system significantly reduced the training loss to , thereby increasing the expected cumulative reward. This advancement in training efficiency translates into superior predictive capabilities, enabling the system to secure consistent victories by anticipating up to four moves ahead.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering