Abstract: Reinforcement learning, studied for more than fifty years, investigates how an autonomous agent can learn what to do to maximize a numerical reward signal from interaction with the world. The agent balances exploration of the environment against exploitation of knowledge gained via evaluative feedback, without relying on exemplary supervision from an omniscient teacher or on complete models of the environment. Deep learning is a cutting-edge approach to machine learning that uses multi-layer artificial neural networks to learn complicated representations expressed in terms of simpler ones. Deep reinforcement learning, formed by combining modern reinforcement learning with deep learning, is currently becoming a research hotspot in the artificial intelligence community, and has made substantial breakthroughs in a variety of tasks, such as robot control, text recognition, and games, that require both rich perception of high-dimensional raw inputs and policy control. In particular, a state-of-the-art deep reinforcement learning model, termed Deep Q-Network (DQN), achieves human-level control using the same network architecture and hyper-parameters on problems approaching real-world complexity, such as some Atari 2600 games. However, DQN's performance falls far below human level in situations that involve delayed rewards and require planning under uncertainty over a long time horizon to optimize strategies. This implies that DQN is not good at controlling agents in strategic deep reinforcement learning tasks. To alleviate this issue, this paper proposes a novel deep reinforcement learning model that improves DQN with recurrent neural networks based on a visual attention mechanism. The new model includes two key ideas. (1) It uses recurrent neural networks consisting of two-layer gated recurrent units to remember more historical information across multiple time steps. This lets an agent exploit delayed feedback in time to guide its next action selection online. With recurrent neural networks, the input state is reduced from four stacked images to the single current raw image, which substantially reduces the state space. (2) The visual attention mechanism adaptively focuses attention on smaller but more valuable regions of an input image, so that agents control the process of learning near-optimal policies more effectively. As a result, introducing the visual attention mechanism sharply decreases the number of parameters to be learned by stochastic gradient descent during training, which speeds up the learning of near-optimal policies. The new model is effectively an encoder-decoder architecture, in which the convolutional neural networks act as the encoder that extracts useful features, and the recurrent neural networks based on the visual attention mechanism act as the decoder. We used five challenging strategic tasks from the set of classic Atari 2600 games, namely Seaquest, Alien, Gopher, Asteroids, and Gravitar, to verify the effectiveness of the new model. Experimental results show that artificial agents generated by the new model surpass DQN and its variant on these games in terms of average reward per episode, training speed, and policy stability, especially on Seaquest and Gopher.
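To make the attention step concrete, the following is a minimal sketch of one common form of visual attention, namely soft attention with dot-product scoring. The abstract does not say which attention variant or scoring function the model uses, so the scoring rule, the function names `softmax` and `attend`, and the toy dimensions are all assumptions made for illustration, not the authors' implementation. The sketch shows how a recurrent hidden state could weight the regions of a convolutional feature map and collapse them into a single context vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Soft attention over image regions (illustrative assumption).

    region_features: list of L vectors of length D, e.g. the cells of a
                     convolutional feature map produced by the encoder.
    hidden_state:    vector of length D from the recurrent layer
                     (assumed here to have the same size as a region vector).
    Returns (weights, context): one weight per region, and the
    weighted sum of the region vectors.
    """
    # Dot-product relevance score of each region with respect to the hidden state.
    scores = [sum(f * h for f, h in zip(region, hidden_state))
              for region in region_features]
    weights = softmax(scores)
    dim = len(hidden_state)
    # Context vector: attention-weighted average of the region features.
    context = [sum(w * region[d] for w, region in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


# Toy example: four regions with 2-dimensional features (hypothetical values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
```

In this toy example the first and third regions align with the hidden state and therefore receive the largest, equal weights. In a full model the context vector, rather than the whole feature map, would be passed on to the recurrent layers, which is one way an attention mechanism can shrink the input those layers must process.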

Curiosity-Driven Variational Autoencoder for Deep Q Network

The Dreaming Variational Autoencoder for Reinforcement Learning Environments

DQN with model-based exploration: efficient learning on environments with sparse rewards

Convergent and Efficient Deep Q Network Algorithm

Random curiosity-driven exploration in deep reinforcement learning

Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders

Human-Level Control Through Directly-Trained Deep Spiking Q-Networks

Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Adaptive Disassembly Sequence Planning for VR Maintenance Training Via Deep Reinforcement Learning

A Deep Recurrent Q-Network Based on Visual Attention Mechanism

Langevin DQN

Fixed $β$-VAE Encoding for Curious Exploration in Complex 3D Environments

Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments

DAQN: Deep Auto-encoder and Q-Network

A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance

Autonomous Maneuver Decision of UCAV Air Combat Based on Double Deep Q Network Algorithm and Stochastic Game Theory

Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors

Deep Variational Autoencoder for Mapping Functional Brain Networks

Handling Large-Scale Action Space In Deep Q Network

Self-evolving Autoencoder Embedded Q-Network

A Deep Reinforcement Learning Based Intelligent Decision Method for UCAV Air Combat