Cooperative Q-Learning Based on Maturity of the Policy

Mao Yang, Yantao Tian, Xiaomei Liu
DOI: https://doi.org/10.1109/ICMA.2009.5246732
2009-01-01
Abstract: To improve the convergence speed of reinforcement learning and avoid local optima in multi-robot systems, a new method of cooperative Q-learning based on the maturity of the policy is presented. Learning is carried out on a blackboard architecture, using all the robots in the training scenario to explore the learning space and collect experience. The reinforcement learning algorithm is divided into two types, constant credit-degree and variable credit-degree; for the constant credit-degree type, particle swarm optimization (PSO) is used to find the optimal constant credit factor. The method is applied to a fire-disaster response task, and simulation experiments verify the effectiveness of the proposed algorithm.
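The abstract does not give the update rules, so what follows is only a minimal sketch under stated assumptions of blackboard-style cooperative Q-learning. Several robots run ordinary tabular Q-learning on a hypothetical five-state chain task (not the paper's fire-disaster scenario). After each round, a shared blackboard replaces every robot's table with a credit-weighted average of all the robots' tables. The names `step`, `q_update`, `blackboard_merge` and `cooperative_q_learning`, the weighted-averaging rule, and the fixed `credits` vector are illustrative assumptions. The paper's maturity-based credit degrees and its PSO search for the constant credit factor are not reproduced here.

```python
import random
from typing import List, Sequence, Tuple

QTable = List[List[float]]

# Hypothetical toy task (assumption): states 0..4, action 0 = left, 1 = right,
# reward 1.0 on reaching the goal state 4, which ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4


def step(s: int, a: int) -> Tuple[int, float, bool]:
    """Deterministic chain environment; stands in for the real task."""
    s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL


def q_update(q: QTable, s: int, a: int, r: float, s_next: int, done: bool,
             alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Standard one-step tabular Q-learning update, applied in place."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def blackboard_merge(tables: Sequence[QTable], credits: Sequence[float]) -> QTable:
    """Hypothetical blackboard step: credit-weighted average of the robots' Q-tables."""
    total = sum(credits)
    n_states, n_actions = len(tables[0]), len(tables[0][0])
    return [
        [sum(c * t[s][a] for c, t in zip(credits, tables)) / total for a in range(n_actions)]
        for s in range(n_states)
    ]


def greedy(q_row: List[float], rng: random.Random) -> int:
    """Greedy action with random tie-breaking."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def cooperative_q_learning(credits: Sequence[float], starts: Sequence[int],
                           rounds: int = 300, max_steps: int = 100,
                           epsilon: float = 0.3, seed: int = 0) -> QTable:
    """One robot per entry of `starts`; all robots share knowledge via the blackboard."""
    rng = random.Random(seed)
    board: QTable = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(rounds):
        local_tables = []
        for start in starts:
            q = [row[:] for row in board]  # each robot reads the current blackboard
            s = start
            for _ in range(max_steps):
                # Epsilon-greedy exploration in the robot's local copy.
                a = rng.randrange(N_ACTIONS) if rng.random() < epsilon else greedy(q[s], rng)
                s_next, r, done = step(s, a)
                q_update(q, s, a, r, s_next, done)
                s = s_next
                if done:
                    break
            local_tables.append(q)
        # Robots post their experience; the blackboard combines it by credit.
        board = blackboard_merge(local_tables, credits)
    return board


if __name__ == "__main__":
    board = cooperative_q_learning(credits=[1.0, 1.0, 1.0], starts=[0, 1, 2])
    policy = [max(range(N_ACTIONS), key=lambda a: board[s][a]) for s in range(GOAL)]
    print(policy)
```

Passing equal entries in `credits` corresponds loosely to a constant credit degree. A scheme that recomputed `credits` every round (for example, from some measure of how mature each robot's policy is) would correspond to the variable case, but the abstract does not give the paper's actual measure.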