Training Reinforcement Neurocontrollers Using the Polytope Algorithm

Aristidis Likas, Isaac E. Lagaris
DOI: https://doi.org/10.1023/a:1018669223478
IF: 2.565
1999-01-01
Neural Processing Letters
Abstract: A new training algorithm is presented for delayed reinforcement learning problems that does not assume the existence of a critic model and employs the polytope optimization algorithm to adjust the weights of the action network so that a simple direct measure of the training performance is maximized. Experimental results from the application of the method to the pole balancing problem indicate improved training performance compared with critic-based and genetic reinforcement approaches.
computer science, artificial intelligence
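
The idea in the abstract, direct search over controller weights with no critic, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the polytope method is taken to be a simplified Nelder-Mead-style search (reflection, expansion, inside contraction, shrink), the action network is reduced to a single deterministic linear unit, the performance measure is the number of time steps the pole stays balanced, and the cart-pole constants are the commonly used benchmark values. The names `balance_steps`, `polytope_maximize`, and `MAX_STEPS` are hypothetical.

```python
import math

MAX_STEPS = 500  # assumed cap on episode length


def balance_steps(w, max_steps=MAX_STEPS):
    """Direct performance measure: steps balanced by a linear controller w (5 weights)."""
    g, m_cart, m_pole, half_len, dt = 9.8, 1.0, 0.1, 0.5, 0.02  # assumed constants
    total = m_cart + m_pole
    x, x_dot, th, th_dot = 0.0, 0.0, 0.05, 0.0  # fixed, slightly tilted start
    for t in range(max_steps):
        if abs(x) > 2.4 or abs(th) > 0.2095:  # track limit or about 12 degrees
            return t
        s = w[0] * x + w[1] * x_dot + w[2] * th + w[3] * th_dot + w[4]
        force = 10.0 if s > 0 else -10.0  # bang-bang action
        tmp = (force + m_pole * half_len * th_dot ** 2 * math.sin(th)) / total
        th_acc = (g * math.sin(th) - math.cos(th) * tmp) / (
            half_len * (4.0 / 3.0 - m_pole * math.cos(th) ** 2 / total))
        x_acc = tmp - m_pole * half_len * th_acc * math.cos(th) / total
        # Euler integration of the cart-pole state
        x, x_dot = x + dt * x_dot, x_dot + dt * x_acc
        th, th_dot = th + dt * th_dot, th_dot + dt * th_acc
    return max_steps


def polytope_maximize(f, x0, step=1.0, iters=60):
    """Simplified polytope (Nelder-Mead-style) search that maximizes f."""
    n = len(x0)
    # Initial polytope: x0 plus one vertex offset along each coordinate axis.
    pts = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                        for i in range(n)]
    vals = [f(p) for p in pts]
    for _ in range(iters):
        order = sorted(range(n + 1), key=lambda i: vals[i], reverse=True)
        pts = [pts[i] for i in order]    # best vertex first, worst last
        vals = [vals[i] for i in order]
        centroid = [sum(p[j] for p in pts[:-1]) / n for j in range(n)]
        worst = pts[-1]

        def along(t):
            # Point on the line from the worst vertex through the centroid.
            return [c + t * (c - v) for c, v in zip(centroid, worst)]

        xr = along(1.0)                  # reflection
        fr = f(xr)
        if fr > vals[0]:
            xe = along(2.0)              # expansion
            fe = f(xe)
            pts[-1], vals[-1] = (xe, fe) if fe > fr else (xr, fr)
        elif fr > vals[-2]:
            pts[-1], vals[-1] = xr, fr
        else:
            xc = along(-0.5)             # inside contraction
            fc = f(xc)
            if fc > vals[-1]:
                pts[-1], vals[-1] = xc, fc
            else:                        # shrink every vertex toward the best one
                best = pts[0]
                pts = [best] + [[b + 0.5 * (pj - b) for b, pj in zip(best, p)]
                                for p in pts[1:]]
                vals = [vals[0]] + [f(p) for p in pts[1:]]
    i_best = max(range(n + 1), key=lambda i: vals[i])
    return pts[i_best], vals[i_best]


w0 = [0.0] * 5  # untrained controller
best_w, best_score = polytope_maximize(balance_steps, w0)
print(balance_steps(w0), "->", best_score)
```

Because the search only ever replaces the worst vertex or shrinks toward the best one, the best score found never falls below that of the starting weights. The step-count objective is piecewise constant, so a search of this kind can stall on plateaus; how the paper defines its performance measure and handles such cases is not stated in the abstract.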