Abstract: There is considerable interest in applying reinforcement learning (RL) to improve machine control across multiple industries, with the automotive industry a prime example. Monte Carlo Tree Search (MCTS) has proven powerful in decision-making games, even without knowledge of the rules. In this study, multibody system dynamics (MSD) control is first modeled as a Markov Decision Process and then solved with MCTS. Based on randomized exploration of the search space, the MCTS framework builds a selective search tree by repeatedly applying a Monte Carlo rollout at each child node. However, without a library of available choices, deciding among the many possible agent parameters can be intimidating. In addition, the large branching factor makes the search itself a significant challenge, one typically overcome by appropriate parameter design, search guiding, action reduction, parallelization, and early termination. To address these shortcomings, the overarching goal of this study is to provide needed insight into inverted pendulum control via vanilla and modified MCTS agents. A series of reward functions is designed according to the control goal; each maps a specific distribution shape of reward bonus and guides the MCTS-based control to maintain the upright position. Numerical examples show that the reward-modified MCTS algorithms significantly improve control performance and robustness over the default choice of a constant reward, which constitutes the vanilla MCTS. The exponentially decaying reward functions perform better than the constant-value or polynomial reward functions. Moreover, the exploitation vs. exploration trade-off and the discount parameter are carefully tested. The results can guide researchers and users applying RL to MSD control.
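To make the reward shapes and the two tuned parameters concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the functional forms, so everything here is an assumption chosen for illustration: the function names, the pole angle `theta` in radians as the shaping variable, the cutoff `theta_max`, the decay `scale`, and the use of the standard UCB1-style selection score with exploration constant `c` and discount factor `gamma`.

```python
import math


def constant_reward(theta, bonus=1.0):
    """Constant bonus at every angle (assumed stand-in for the vanilla reward)."""
    return bonus


def polynomial_reward(theta, theta_max=0.21, power=2):
    """Bonus that falls off polynomially with |theta| and is zero at theta_max.

    theta_max and power are hypothetical values, not taken from the study.
    """
    x = min(abs(theta) / theta_max, 1.0)
    return 1.0 - x ** power


def exponential_reward(theta, scale=0.05):
    """Bonus that decays exponentially with |theta|; scale is hypothetical."""
    return math.exp(-abs(theta) / scale)


def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style score: mean value (exploitation) plus a visit-count bonus
    (exploration). A larger c favors rarely visited children."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    mean_value = total_value / visits
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


def discounted_return(rewards, gamma=0.99):
    """Discounted sum of rollout rewards, accumulated from the last step back."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Compare how each shape rewards a few pole angles near upright.
    for theta in (0.0, 0.05, 0.1, 0.2):
        print(
            f"theta={theta:.2f}",
            f"const={constant_reward(theta):.3f}",
            f"poly={polynomial_reward(theta):.3f}",
            f"exp={exponential_reward(theta):.3f}",
        )
```

Under these assumed forms, the exponential shape concentrates the bonus much more sharply around the upright position than the polynomial one, while the constant reward gives the search no gradient toward upright at all. That contrast is one plausible reading of why shaped rewards could guide the rollouts better, but the study's own reward definitions should be consulted for the actual forms.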
