Abstract: Reinforcement learning (RL) approaches that combine a tree search with deep learning have found remarkable success in searching exorbitantly large, albeit discrete, action spaces, as in chess, Shogi and Go. Many real-world materials discovery and design applications, however, involve multi-dimensional search problems and learning domains that have continuous action spaces. Exploring high-dimensional potential energy models of materials is an example. Traditionally, these searches are time consuming (often several years for a single bulk system) and driven by human intuition and/or expertise, and more recently by global/local optimization searches that have issues with convergence and/or do not scale well with the search dimensionality. Here, in a departure from discrete action and other gradient-based approaches, we introduce an RL strategy based on decision trees that incorporates modified rewards for improved exploration, efficient sampling during playouts and a "window scaling scheme" for enhanced exploitation, to enable efficient and scalable search for continuous action space problems. Using high-dimensional artificial landscapes and control RL problems, we successfully benchmark our approach against popular global optimization schemes and state-of-the-art policy gradient methods, respectively. We demonstrate its efficacy to parameterize potential models (physics based and high-dimensional neural networks) for 54 different elemental systems across the periodic table as well as alloys. We analyze error trends across different elements in the latent space and trace their origin to elemental structural diversity and the smoothness of the element energy surface. Broadly, our RL strategy will be applicable to many other physical science problems involving search over continuous action spaces.
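To make the ideas named in the abstract concrete, the following is a minimal sketch of a tree search over a continuous parameter space with a depth-dependent sampling window. It is not the authors' implementation. The toy `sphere` objective, the UCB-style selection score, the best-reward backup, the geometric `shrink` factor, and all function and parameter names are assumptions chosen only for illustration.

```python
import math
import random


def sphere(x):
    """Toy landscape (an assumption): minimum value 0 at the origin."""
    return sum(v * v for v in x)


class Node:
    def __init__(self, params, depth, parent=None):
        self.params = params          # point in the continuous search space
        self.depth = depth
        self.parent = parent
        self.children = []
        self.visits = 0
        self.best_reward = -math.inf  # best reward seen in this subtree


def perturb(params, window, bounds, rng):
    """Sample uniformly within +/- window of params, clipped to bounds."""
    lo, hi = bounds
    return [min(hi, max(lo, p + rng.uniform(-window, window))) for p in params]


def search(objective, dim, bounds=(-5.0, 5.0), iterations=500,
           n_children=5, n_playouts=5, shrink=0.7, c=1.0, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    base_window = (hi - lo) / 2.0
    root = Node([rng.uniform(lo, hi) for _ in range(dim)], depth=0)
    best_params, best_value = root.params, objective(root.params)
    root.visits, root.best_reward = 1, -best_value

    for _ in range(iterations):
        # Selection: descend through fully expanded nodes by a UCB-style score.
        node = root
        while len(node.children) >= n_children:
            log_n = math.log(node.visits + 1)
            node = max(node.children,
                       key=lambda ch: ch.best_reward
                       + c * math.sqrt(log_n / ch.visits))
        # Expansion: the sampling window narrows with depth (assumed scaling).
        window = base_window * shrink ** node.depth
        child = Node(perturb(node.params, window, bounds, rng),
                     node.depth + 1, node)
        node.children.append(child)
        # Playouts: sample around the child and keep the best candidate.
        candidates = [child.params] + [
            perturb(child.params, window * shrink, bounds, rng)
            for _ in range(n_playouts)]
        child.params = min(candidates, key=objective)
        value = objective(child.params)
        if value < best_value:
            best_params, best_value = child.params, value
        # Backpropagation: record the best (not the mean) reward per subtree.
        walker = child
        while walker is not None:
            walker.visits += 1
            walker.best_reward = max(walker.best_reward, -value)
            walker = walker.parent
    return best_params, best_value
```

Calling `search(sphere, dim=2)` returns the best parameter vector found and its objective value. Narrowing the window with depth lets shallow nodes explore broadly while deep nodes refine locally, which is one plausible reading of how window scaling could aid exploitation.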

Tree Based Discretization for Continuous State Space Reinforcement Learning

No-Fringe U-Tree: An Optimized Algorithm for Reinforcement Learning

TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

Policy Sharing Using Aggregation Trees for Q-Learning in a Continuous State and Action Spaces

CDT: Cascading Decision Trees for Explainable Reinforcement Learning

Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning

Representation learning for continuous action spaces is beneficial for efficient policy learning

Interpretable Reinforcement Learning for Robotics and Continuous Control

Efficient Tree Policy with Attention-Based State Representation for Interactive Recommendation

Conservative Q-Improvement: Reinforcement Learning for an Interpretable Decision-Tree Policy

An Idiosyncrasy of Time-discretization in Reinforcement Learning

Reinforcement Learning Method of Continuous State Adaptively Discretized Based on K-means Clustering

Upside-Down Reinforcement Learning for More Interpretable Optimal Control

Learning in continuous action space for developing high dimensional potential energy models

Large-scale Interactive Recommendation with Tree-structured Policy Gradient

Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision Processes

SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks

Continuous control with deep reinforcement learning

KB-Tree: Learnable and Continuous Monte-Carlo Tree Search for Autonomous Driving Planning

Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning

Continuous Monte Carlo Graph Search