Learning to Generate All Feasible Actions

Mirco Theile, Daniele Bernardini, Raphael Trumpp, Cristina Piazza, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

DOI: https://doi.org/10.1109/ACCESS.2024.3376739

2024-07-05

Abstract: Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.
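
The distribution matching idea in the abstract can be written down roughly as follows. This is a sketch under assumed notation, not the paper's own formulation: the symbols $g$, $\pi_\theta$, $p_\theta$, $q$, and $D$ are introduced here for illustration, and reading "all feasible actions" as a uniform target over the feasible set is an assumption.

```latex
% Assumed notation (not taken from the abstract):
%   g(s,a) \in \{0,1\}   feasibility model queried by the agent
%   \pi_\theta(s,z)      generator with latent z \sim p(z), inducing a density p_\theta(a \mid s)
%   q(a \mid s)          target density, assumed uniform over the feasible set
%   D                    any divergence for which a gradient estimator is available
\begin{align}
  \mathcal{A}^{+}(s) &= \{\, a \in \mathcal{A} : g(s,a) = 1 \,\}, \\
  q(a \mid s) &= \frac{g(s,a)}{\int_{\mathcal{A}} g(s,a')\,\mathrm{d}a'}, \\
  \theta^{*} &= \arg\min_{\theta}\; \mathbb{E}_{s}\!\left[ D\big(q(\cdot \mid s)\,\|\,p_\theta(\cdot \mid s)\big) \right].
\end{align}
```

Under this reading, a generator that matches $q$ places probability mass on every part of $\mathcal{A}^{+}(s)$, including disconnected components, and none outside it.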

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses how to efficiently generate all feasible actions in complex cyber-physical systems while ensuring that system constraints are met. Specifically:

1. **System Complexity and Constraints**: As system complexity increases, traditional model-based control strategies become difficult to apply. Data-driven methods such as reinforcement learning (RL) can optimize control strategies, but during learning the agent typically has to violate constraints to find out which actions are infeasible, which is computationally prohibitive in many practical applications.
2. **Limitations of Existing Methods**:
   - **Action Rejection**: If the proposed action is infeasible, a backup strategy is used to generate a feasible action. This method is simple but inefficient.
   - **Action Resampling**: When the proposed action is infeasible, the agent resamples until a feasible action is found. This increases the computational cost and may not effectively cover all feasible actions.
   - **Action Projection**: The infeasible action is projected to the closest feasible action. Although the projected action is close in the action space, it is not necessarily better in performance, and the online optimization may be computationally intensive.
3. **Introduction of the Action Mapping Framework**: To overcome the limitations of the above methods, the authors propose a new framework, action mapping. It divides the learning process into two steps:
   - **Step 1**: Train a feasibility policy that can generate all feasible actions for a given state.
   - **Step 2**: Train an objective policy that selects the optimal action from the set of feasible actions to maximize the objective function.
4. **Specific Problem**: This paper focuses on training the feasibility policy, that is, on how to make the agent learn to generate all feasible actions. By formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences, the authors propose a self-supervised learning method to train the feasibility policy.
5. **Application Scenarios**: To verify the effectiveness of the method, the authors conducted three experiments:
   - An illustrative two-dimensional example;
   - A robotic path planning problem;
   - A simple robotic grasping simulation.

Through these experiments, the authors show that the method can generate actions across disconnected feasible action sets without violating constraints, providing a building block for future reinforcement learning frameworks that are both safe and efficient.

In summary, this paper aims to show how data-driven methods can efficiently generate all feasible actions in complex cyber-physical systems while ensuring that the system's constraints are met.
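
The difference between resampling, projection, and mapping into the feasible set can be illustrated with a toy example. The sketch below is not the paper's method: the two-disc feasible set, the helper names (`is_feasible`, `resample`, `project`, `map_latent`), and the hand-written mapping that stands in for a learned feasibility policy are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy setup: the feasible set is two disjoint discs in [-1, 1]^2.
DISCS = [((-0.5, 0.0), 0.3), ((0.5, 0.0), 0.3)]  # (center, radius)


def is_feasible(action):
    """Toy feasibility model: True if the action lies inside any disc."""
    x, y = action
    return any(math.hypot(x - cx, y - cy) <= r for (cx, cy), r in DISCS)


def resample(rng, max_tries=1000):
    """Action resampling: draw uniform proposals until one is feasible."""
    for tries in range(1, max_tries + 1):
        action = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if is_feasible(action):
            return action, tries
    raise RuntimeError("no feasible action found")


def project(action):
    """Action projection: move an infeasible action to the closest feasible point."""
    if is_feasible(action):
        return action
    x, y = action
    best = None
    for (cx, cy), r in DISCS:
        d = math.hypot(x - cx, y - cy)  # d > r here, so d is never zero
        # The closest point of a disc lies on its boundary, on the ray from
        # the center to the action; shrink slightly to stay inside.
        scale = r / d * (1 - 1e-9)
        point = (cx + (x - cx) * scale, cy + (y - cy) * scale)
        if best is None or d - r < best[0]:
            best = (d - r, point)
    return best[1]


def map_latent(z):
    """Stand-in for a feasibility policy: map z in [0, 1)^3 onto the feasible set."""
    (cx, cy), r = DISCS[0] if z[0] < 0.5 else DISCS[1]
    radius = r * math.sqrt(z[1])  # sqrt yields uniform density over the disc area
    angle = 2 * math.pi * z[2]
    return (cx + radius * math.cos(angle), cy + radius * math.sin(angle))


rng = random.Random(0)
mapped = [map_latent((rng.random(), rng.random(), rng.random())) for _ in range(1000)]
sampled, tries = resample(rng)
```

In this toy setting, every latent sample maps to a feasible action and both disconnected discs are covered, whereas resampling may need several feasibility queries per action and projection concentrates infeasible proposals on the disc boundaries. In the paper's framework the mapping is learned rather than written by hand.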

