The Research on Adaptive Reinforcement Learning Technique Based on Convex Polyhedra Abstraction Domain
Dong-Huo CHEN, Quan LIU, Fei ZHU, Hai-Dong JIN
DOI: https://doi.org/10.11897/SP.J.1016.2018.00112
2018-01-01
Chinese Journal of Computers
Abstract: Table-driven algorithms are an important class of methods for solving reinforcement learning problems, but on real-world problems with continuous state spaces they are challenged by the curse of dimensionality, also known as the state explosion problem. Two approaches have been proposed to address it: discretization of the continuous state space and function approximation. When the state space is discretized, table-driven algorithms have several advantages over algorithms based on function approximation, namely a straightforward principle, an implementation with concise data structures, and lightweight computation. The core of such an algorithm is to find a suitable discretization mechanism that balances computation cost against the accuracy of the abstract model, so that the optimal policy of the original reinforcement learning problem can be approximately derived from its abstract state space and quantitative reward metrics. This paper presents an adaptive discretization technique based on the convex polyhedra abstraction domain and designs an adaptive polyhedra domain based Q(λ) algorithm (APDQ(λ)) on top of Q(λ), an important reinforcement learning algorithm. Convex polyhedra are a suitable representation of abstract states and are widely used in the performance evaluation of complex stochastic systems and in the verification of numerical properties of programs. The method abstracts a continuous (infinite) concrete state space into a discrete, manageable set of abstract states by defining an abstraction function, so that the control problem of the original system can be solved directly on the corresponding abstract system. In particular, several adaptive refinement operators are studied, such as BoxRefinement, LFRefinement and MVLFRefinement, which rely on information from online samples to refine an abstract polyhedron state. The abstract state space is adjusted dynamically, so that a finite, discrete model approximating the dynamics and reward model of the continuous Markov system is derived statistically from online samples. Finally, APDQ(λ) is implemented; the algebraic and geometric computations on polyhedra, which require high precision, are programmed by calling the APIs of the Parma Polyhedra Library (PPL) and the GNU Multiple Precision (GMP) library, and several case studies are conducted to show the performance of APDQ(λ). In the experiments, mountain car (MC) and the acrobatic robot (Acrobot), both with continuous state spaces, are used as experimental subjects, and the ability and limitations of APDQ(λ) under different combinations of parameter values are probed in detail. The experimental results demonstrate that (1) APDQ(λ) behaves well when γ≥0.7, as shown in Figures 6-13: the policy improves rapidly in the initial phase and tends to converge afterwards, and the algorithm adapts well to the learning rate α and to the various parameters that govern the refinement of the abstract state space; (2) the performance of the algorithm degrades severely when γ≤0.6. In summary, applying abstract interpretation techniques to a statistical learning process is a novel approach to solving reinforcement learning problems with continuous state spaces, and many topics deserve further attention, such as the sampling policy and the update of value functions in the context of an approximate abstract model.
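To make the idea of adaptive state abstraction concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions, not the authors' implementation: it uses axis-aligned boxes (the simplest convex polyhedra) with plain floating-point arithmetic instead of PPL and GMP, a one-step Q-learning update instead of Q(λ) with eligibility traces, and a hypothetical visit-count rule that bisects a box along its widest dimension. The names `Box`, `BoxAbstraction`, `split_after` and `q_update` are invented for this example and do not correspond to the BoxRefinement, LFRefinement or MVLFRefinement operators of the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(eq=False)  # compare boxes by identity, not by field values
class Box:
    """Abstract state: an axis-aligned box with lo[i] <= x[i] < hi[i]."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]
    q: List[float]          # one Q-value per action
    visits: int = 0         # number of online samples seen in this box

    def contains(self, x: Sequence[float]) -> bool:
        return all(l <= v < h for l, v, h in zip(self.lo, x, self.hi))

    def split(self) -> List["Box"]:
        """Bisect along the widest dimension; children inherit the Q-values."""
        widths = [h - l for l, h in zip(self.lo, self.hi)]
        d = widths.index(max(widths))
        mid = (self.lo[d] + self.hi[d]) / 2
        left_hi = self.hi[:d] + (mid,) + self.hi[d + 1:]
        right_lo = self.lo[:d] + (mid,) + self.lo[d + 1:]
        return [Box(self.lo, left_hi, list(self.q)),
                Box(right_lo, self.hi, list(self.q))]


class BoxAbstraction:
    """Maps concrete states to boxes and refines boxes that are visited often."""

    def __init__(self, lo, hi, n_actions: int, split_after: int = 50):
        self.boxes = [Box(tuple(lo), tuple(hi), [0.0] * n_actions)]
        self.split_after = split_after  # hypothetical refinement threshold

    def abstract(self, x: Sequence[float]) -> Box:
        """Abstraction function: concrete state -> the box that contains it."""
        for box in self.boxes:
            if box.contains(x):
                return box
        raise ValueError("state lies outside the abstracted region")

    def q_update(self, x, a: int, r: float, x_next,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
        """One-step Q-learning on abstract states, followed by refinement."""
        box, box_next = self.abstract(x), self.abstract(x_next)
        target = r + gamma * max(box_next.q)
        box.q[a] += alpha * (target - box.q[a])
        box.visits += 1
        if box.visits >= self.split_after:
            self.boxes.remove(box)
            self.boxes.extend(box.split())
```

In this sketch the abstract state space starts as a single box covering the whole region and grows finer only where samples accumulate, which is one simple way to trade computation cost against model accuracy. Because the upper bounds are exclusive, a state lying exactly on the upper boundary of the region is rejected; a real implementation would need to decide how to treat such boundary points.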