A Phased Dyna Reinforcement Learning Algorithm

赵昀,陈庆伟,胡维礼
DOI: https://doi.org/10.3969/j.issn.1006-9348.2009.07.039
2009-01-01
Abstract: To allocate computational resources rationally between planning and learning in the existing Dyna reinforcement learning architecture, this paper presents a phased Dyna architecture. As experience accumulates, it partitions the whole learning process into exploration, variable proportional learning, and optimization phases, and controls planning and learning accordingly in each phase, which greatly reduces wasted computation. Combined with the traditional Q-learning algorithm, a phased Dyna-Q reinforcement learning algorithm is studied for adapting to dynamic and uncertain environments. Simulation results on a reinforcement learning benchmark problem indicate the efficiency of the proposed architecture.
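The abstract does not give the phase boundaries or the rule that sets the planning-to-learning ratio, so the following is only a minimal sketch of how a phased Dyna-Q loop might look, not the authors' implementation. The schedule function `planning_steps`, its thresholds (`explore_end`, `optimize_start`, `max_steps`), the linear ramp in the middle phase, and the toy chain environment are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def planning_steps(episode, explore_end=5, optimize_start=20, max_steps=20):
    """Hypothetical phase schedule: planning updates per real step."""
    if episode < explore_end:
        return 0  # exploration phase: learn from real experience only
    if episode >= optimize_start:
        return max_steps  # optimization phase: full planning budget
    # variable proportional phase: planning share grows with experience
    frac = (episode - explore_end) / (optimize_start - explore_end)
    return int(frac * max_steps)


def phased_dyna_q(n_states=6, episodes=30, alpha=0.5, gamma=0.95,
                  epsilon=0.1, seed=0):
    """Tabular Dyna-Q on an assumed toy chain; reward 1 at the right end."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)]
    model = {}              # learned deterministic model: (s, a) -> (r, s')
    actions = (-1, +1)
    goal = n_states - 1

    def step(s, a):
        s2 = min(max(s + a, 0), goal)
        return (1.0 if s2 == goal else 0.0), s2

    def update(s, a, r, s2):
        # one-step Q-learning backup; the goal state is terminal
        best_next = 0.0 if s2 == goal else max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    for ep in range(episodes):
        s = 0
        k = planning_steps(ep)  # planning budget fixed by the current phase
        while s != goal:
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                # greedy action with random tie-breaking
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            r, s2 = step(s, a)
            update(s, a, r, s2)      # direct learning from real experience
            model[(s, a)] = (r, s2)  # model learning
            for _ in range(k):       # planning from simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return q
```

Calling `phased_dyna_q()` returns the learned Q-table. In this sketch the early episodes spend no computation on planning, when the learned model is still sparse, and the planning budget grows only as experience accumulates, which is one plausible reading of the resource-allocation idea described in the abstract.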