Mastering Atari, Go, chess and shogi by planning with a learned model
Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
DOI: https://doi.org/10.1038/s41586-020-03051-4
Impact factor (IF): 64.8
2020-12-23
Nature
Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess[1] and Go[2], where a perfect simulator is available. However, in real-world problems, the dynamics governing the environment are often complex and unknown. Here we present the MuZero algorithm, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. The MuZero algorithm learns an iterable model that produces predictions relevant to planning: the action-selection policy, the value function and the reward. When evaluated on 57 different Atari games[3] (the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled[4]), the MuZero algorithm achieved state-of-the-art performance. When evaluated on Go, chess and shogi (canonical environments for high-performance planning), the MuZero algorithm matched, without any knowledge of the game dynamics, the superhuman performance of the AlphaZero algorithm[5] that was supplied with the rules of the game.
multidisciplinary sciences
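As a rough illustration of the abstract's "iterable model that produces predictions relevant to planning", the sketch below assumes a MuZero-style split into three functions: a representation function (observation to latent state), a dynamics function (latent state and action to next latent state and reward) and a prediction function (latent state to policy and value). Everything concrete here is hypothetical: the hand-written toy functions stand in for trained networks, and the two-action space, the discount of 0.9, and the names `represent`, `dynamics`, `prediction`, `unroll`, `lookahead_value` and `plan` are illustrative only. The planner is an exhaustive depth-limited lookahead used in place of the Monte Carlo tree search the algorithm actually relies on; the point it shows is that planning steps through the learned model and never queries the real environment.

```python
from typing import Dict, List, Tuple

Latent = Tuple[float, ...]
ACTIONS = (0, 1)  # hypothetical two-action environment
DISCOUNT = 0.9    # assumed discount factor


def represent(observation: List[float]) -> Latent:
    # Representation: encode a raw observation as a latent state (toy: identity copy).
    return tuple(observation)


def dynamics(state: Latent, action: int) -> Tuple[Latent, float]:
    # Dynamics: predict (next latent state, reward) purely inside the model.
    shift = 1.0 if action == 1 else -1.0
    next_state = tuple(x + shift for x in state)
    reward = next_state[0]  # toy reward: first latent coordinate
    return next_state, reward


def prediction(state: Latent) -> Tuple[Dict[int, float], float]:
    # Prediction: (policy prior over actions, value estimate) for a latent state.
    policy = {a: 1.0 / len(ACTIONS) for a in ACTIONS}  # toy uniform prior
    value = sum(state) / len(state)
    return policy, value


def unroll(observation: List[float], actions: List[int]):
    # Iterate the model along an action sequence, collecting
    # (reward, value, policy) at each step, as the abstract describes.
    state = represent(observation)
    outputs = []
    for action in actions:
        state, reward = dynamics(state, action)
        policy, value = prediction(state)
        outputs.append((reward, value, policy))
    return outputs


def lookahead_value(state: Latent, depth: int) -> float:
    # Exhaustive depth-limited search in the learned model (stand-in for MCTS).
    # The policy prior is ignored here; a tree search would use it to guide expansion.
    _, value = prediction(state)
    if depth == 0:
        return value
    return max(
        reward + DISCOUNT * lookahead_value(next_state, depth - 1)
        for next_state, reward in (dynamics(state, a) for a in ACTIONS)
    )


def plan(observation: List[float], depth: int = 2) -> int:
    # Choose the root action with the highest model-predicted return (depth >= 1).
    root = represent(observation)
    scores = {}
    for action in ACTIONS:
        next_state, reward = dynamics(root, action)
        scores[action] = reward + DISCOUNT * lookahead_value(next_state, depth - 1)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(plan([0.0, 0.0]))                 # 1
    print(unroll([0.0, 0.0], [1, 1])[0][:2])  # (1.0, 1.0)
```

In this toy model, action 1 raises every latent coordinate and therefore the predicted rewards and values, so the planner prefers it. The design choice worth noting is the separation of concerns: the same `dynamics` function serves both the unrolled predictions and the search, which is what lets planning proceed without any knowledge of the environment's true rules.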