Abstract: There is considerable interest in applying reinforcement learning (RL) to improve machine control across multiple industries, with the automotive industry a prime example. Monte Carlo Tree Search (MCTS) has proven powerful in decision-making games, even without prior knowledge of the rules. In this study, multibody system dynamics (MSD) control is first modeled as a Markov Decision Process and then solved with MCTS. Based on randomized exploration of the search space, the MCTS framework builds a selective search tree by repeatedly applying a Monte Carlo rollout at each child node. However, without a library of available choices, deciding among the many possible agent parameters can be intimidating. In addition, the large branching factor makes the search itself a significant challenge, one that is typically overcome by appropriate parameter design, search guiding, action reduction, parallelization, and early termination. To address these shortcomings, this study aims to provide insight into inverted pendulum control with vanilla and modified MCTS agents. A series of reward functions is designed according to the control goal; each maps a specific distribution shape of the reward bonus and guides the MCTS-based control to maintain the upright position. Numerical examples show that the reward-modified MCTS algorithms significantly improve control performance and robustness over the default choice of a constant reward, which constitutes the vanilla MCTS. The exponentially decaying reward functions perform better than the constant or polynomial reward functions. Moreover, the exploitation vs. exploration trade-off and the discount parameter are carefully tested. The results can guide users of RL-based MSD control.
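The three reward families compared in the abstract (constant, polynomial, exponentially decaying) and the two tuned parameters (exploration weight and discount) can be illustrated with a minimal Python sketch. The abstract does not give the functional forms, so everything here is an assumption made for illustration: the function names, the use of the pendulum angle `theta` (radians from upright) as the reward input, the parameters `theta_max`, `power`, and `decay`, and the use of the common UCB1 rule for child selection.

```python
import math


def constant_reward(theta: float) -> float:
    """Vanilla choice: the same bonus for every non-terminal state."""
    return 1.0


def polynomial_reward(theta: float, theta_max: float, power: int = 2) -> float:
    """Assumed polynomial shape: 1 at upright, falling to 0 at theta_max."""
    x = min(abs(theta) / theta_max, 1.0)
    return 1.0 - x ** power


def exponential_reward(theta: float, decay: float = 5.0) -> float:
    """Assumed exponentially decaying shape: 1 at upright, decaying with |theta|."""
    return math.exp(-decay * abs(theta))


def ucb1_score(total_value: float, visits: int, parent_visits: int,
               c: float = math.sqrt(2)) -> float:
    """UCB1 score: mean value (exploitation) plus a bonus weighted by c (exploration)."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)


def discounted_return(rewards: list[float], gamma: float = 0.95) -> float:
    """Sum of rollout rewards, with the reward at step k weighted by gamma**k."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Compare how each shape rewards increasing tilt (hypothetical angles).
    for theta in (0.0, 0.05, 0.1, 0.2):
        print(theta,
              constant_reward(theta),
              round(polynomial_reward(theta, theta_max=0.2), 3),
              round(exponential_reward(theta), 3))
```

Under these assumed shapes, the constant reward gives the search no gradient toward the upright position, whereas the polynomial and exponential shapes rank states by how close they are to upright. A larger `c` favors exploration of rarely visited actions, and a smaller `gamma` makes the agent weight near-term rewards more heavily.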

UNSAT Solver Synthesis via Monte Carlo Forest Search

In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

Monte Carlo Tree Search for Policy Optimization

Improved Tree Search for Automatic Program Synthesis

Policies Grow on Trees: Model Checking Families of MDPs

Monte Carlo Game Solver

Rethinking Branching on Exact Combinatorial Optimization Solver: the First Deep Symbolic Discovery Framework

Towards Mixed Optimization for Reinforcement Learning with Program Synthesis

Combinatorial Optimization with Policy Adaptation using Latent Space Search

Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning

LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios

Using deep learning to construct stochastic local search SAT solvers with performance bounds

A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees

Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search

Fittest Survival: an Enhancement Mechanism for Monte Carlo Tree Search

ESampler: Boosting Sampling of Satisfying Assignments for Boolean Formulas Via Derivation

1-2-3-Go! Policy Synthesis for Parameterized Markov Decision Processes via Decision-Tree Learning and Generalization

An Efficient Dynamic Sampling Policy for Monte Carlo Tree Search

Monte Carlo tree search control scheme for multibody dynamics applications

A Scalable Derivative-free Exploration Approach for Reinforcement Learning