Reinforcement Learning with Monte Carlo Sampling in Imperfect Information Problems

Zhang Jiajia, Liu Hong
DOI: https://doi.org/10.1007/978-3-319-94307-7_5
2018-01-01
Abstract: Artificial intelligence is an approach that analyzes, studies, and optimizes human strategies in challenging domains. Imperfect information problems are usually more complex than perfect information problems because the accuracy of condition estimation cannot be effectively guaranteed. As a result, they require much more training data or a much longer learning process when supervised or unsupervised learning systems are used. This paper presents and evaluates a novel algorithm that uses Monte Carlo sampling to estimate terminal states in reinforcement learning systems. In each iteration, the learning system uses the algorithm to compute an adjusted result that smooths the fluctuations caused by imperfect information. We apply the new algorithm to build a deep neural network (DNN) learning system in our Texas Hold'em poker program. The comparison poker program placed third in the Annual Computer Poker Competition 2017 (ACPC 2017), and the system using the new approach performs better while converging much faster.
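The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of estimating a terminal state by Monte Carlo sampling. It assumes a hypothetical one-card showdown game in place of Texas Hold'em, and the names `showdown` and `mc_terminal_value` are illustrative, not taken from the paper. A single observed showdown yields a reward of +1, 0, or -1 depending on a hidden card; averaging over many sampled hidden cards gives a smoother value that could serve as a training target.

```python
import random


def showdown(my_card, opp_card):
    """Terminal payoff of a toy one-card game: +1 win, -1 loss, 0 tie."""
    return (my_card > opp_card) - (my_card < opp_card)


def mc_terminal_value(my_card, deck, n_samples, rng):
    """Estimate the terminal value by averaging payoffs over sampled hidden cards.

    Assumption: the opponent's card is uniformly distributed over the cards
    we cannot see. A real poker system would need a model of the opponent's
    likely holdings instead.
    """
    unseen = [c for c in deck if c != my_card]
    total = 0
    for _ in range(n_samples):
        total += showdown(my_card, rng.choice(unseen))
    return total / n_samples


deck = list(range(1, 14))   # hypothetical deck: one card of each rank 1..13
rng = random.Random(0)      # seeded for reproducibility
estimate = mc_terminal_value(10, deck, 5000, rng)
print(f"estimated value of holding 10: {estimate:.3f}")
```

In this toy game, a player holding 10 beats 9 of the 12 unseen cards and loses to 3, so the exact expected payoff is (9 - 3) / 12 = 0.5. The sampled estimate should approach that value as `n_samples` grows, whereas any single showdown only ever returns +1 or -1.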