Learning to Generate All Feasible Actions

Mirco Theile, Daniele Bernardini, Raphael Trumpp, Cristina Piazza, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli
DOI: https://doi.org/10.1109/ACCESS.2024.3376739
2024-07-05
Abstract: Modern cyber-physical systems are becoming increasingly complex to model, thus motivating data-driven techniques such as reinforcement learning (RL) to find appropriate control agents. However, most systems are subject to hard constraints such as safety or operational bounds. Typically, to learn to satisfy these constraints, the agent must violate them systematically, which is computationally prohibitive in most systems. Recent efforts aim to utilize feasibility models that assess whether a proposed action is feasible to avoid applying the agent's infeasible action proposals to the system. However, these efforts focus on guaranteeing constraint satisfaction rather than the agent's learning efficiency. To improve the learning process, we introduce action mapping, a novel approach that divides the learning process into two steps: first learn feasibility and subsequently, the objective by mapping actions into the sets of feasible actions. This paper focuses on the feasibility part by learning to generate all feasible actions through self-supervised querying of the feasibility model. We train the agent by formulating the problem as a distribution matching problem and deriving gradient estimators for different divergences. Through an illustrative example, a robotic path planning scenario, and a robotic grasping simulation, we demonstrate the agent's proficiency in generating actions across disconnected feasible action sets. By addressing the feasibility step, this paper makes it possible to focus future work on the objective part of action mapping, paving the way for an RL framework that is both safe and efficient.
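The distribution-matching formulation can be written out as follows. This is a sketch under stated assumptions, not the paper's exact derivation. It assumes a binary feasibility model g(s, a) in {0, 1} and takes the target to be the uniform distribution over the feasible set. It uses the reverse KL divergence as one illustrative choice among the several divergences the paper considers. The symbols g, Z, and π_θ are chosen here for illustration.

```latex
% Assumed target: uniform over the feasible actions in state s
p^*(a \mid s) = \frac{g(s,a)}{Z(s)}, \qquad Z(s) = \int_{\mathcal{A}} g(s,a')\,\mathrm{d}a'

% Reverse KL between the feasibility policy and the target
D_{\mathrm{KL}}\!\left(\pi_\theta(\cdot \mid s) \,\middle\|\, p^*(\cdot \mid s)\right)
  = \mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\!\left[\log \pi_\theta(a \mid s) - \log g(s,a)\right] + \log Z(s)

% If the support of pi_theta lies inside the feasible set, log g = 0 there, so
D_{\mathrm{KL}} = -\mathcal{H}\!\left(\pi_\theta(\cdot \mid s)\right) + \log Z(s) \;\ge\; 0
```

Any probability mass on an infeasible action (g = 0) makes this divergence infinite. Among policies supported on the feasible set, the divergence is zero exactly when π_θ is uniform over that set, because a distribution confined to a set of volume Z(s) has entropy at most log Z(s). Minimizing it therefore rewards covering every feasible region, including disconnected ones.

If the feasibility policy is an implicit generator (latent noise passed through a network), π_θ(a | s) has no closed form and must itself be estimated. This is one reason dedicated gradient estimators are needed.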
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to efficiently generate all feasible actions in complex modern cyber-physical systems while ensuring that the system's constraints are satisfied. Specifically:

1. **System Complexity and Constraints**: As systems grow more complex, traditional model-based control strategies become difficult to apply. Data-driven methods such as reinforcement learning (RL) can optimize control policies. However, during learning the agent typically has to violate constraints systematically to discover which actions are infeasible, and this is computationally unacceptable in many practical applications.
2. **Limitations of Existing Methods**:
   - **Action Rejection**: If the proposed action is infeasible, a backup strategy generates a feasible action instead. This is simple but inefficient.
   - **Action Resampling**: If the proposed action is infeasible, the agent resamples until a feasible action is found. This increases the computational cost and may not cover the full set of feasible actions.
   - **Action Projection**: An infeasible action is projected onto the closest feasible action. The projected action is close in action space but not necessarily better in terms of performance. The online optimization required for the projection can also be computationally expensive.
3. **The Action Mapping Framework**: To overcome these limitations, the authors propose a new framework, action mapping, which splits learning into two steps:
   - **Step 1**: Train a feasibility policy that can generate all feasible actions for a given state.
   - **Step 2**: Train an objective policy that chooses among these feasible actions to maximize the objective function.
4. **Specific Problem**: This paper focuses on the first step: training the feasibility policy so that the agent learns to generate all feasible actions. The authors formulate the problem as distribution matching and derive gradient estimators for different divergences. This yields a self-supervised method that trains the feasibility policy by querying the feasibility model.
5. **Application Scenarios**: To validate the method, the authors run three experiments:
   - an illustrative two-dimensional example;
   - a robotic path-planning problem;
   - a simple robotic grasping simulation.

   These experiments show that the agent learns to generate actions across disconnected feasible action sets. It learns from queries to the feasibility model rather than from constraint violations on the system, laying the groundwork for a reinforcement learning framework that is both safe and efficient.

In summary, the paper studies how data-driven methods can efficiently generate all feasible actions in complex cyber-physical systems while ensuring that the system's constraints are met.
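The three baseline strategies can be contrasted with the goal of a feasibility policy on a toy problem. The following is a minimal Python sketch, not the authors' code. The 1-D action space, the two feasible intervals, the fixed backup action, and all function names are hypothetical choices made for illustration. `ideal_feasibility_policy` is a hand-coded inverse-CDF sampler standing in for what a trained feasibility policy should approximate; it is not a learned model.

```python
import random

# Hypothetical toy problem (not from the paper): a 1-D action space [0, 1]
# whose feasible set consists of two disconnected intervals.
FEASIBLE = [(0.1, 0.3), (0.6, 0.9)]
BACKUP_ACTION = 0.2  # hypothetical fixed fallback, assumed to be feasible


def is_feasible(a: float) -> bool:
    """Stand-in feasibility model: True iff a lies in a feasible interval."""
    return any(lo <= a <= hi for lo, hi in FEASIBLE)


def action_rejection(a: float) -> float:
    """Apply the proposal if feasible, otherwise use the backup action."""
    return a if is_feasible(a) else BACKUP_ACTION


def action_resampling(propose, max_tries: int = 1000) -> float:
    """Draw proposals until the feasibility model accepts one."""
    for _ in range(max_tries):
        a = propose()
        if is_feasible(a):
            return a
    raise RuntimeError("no feasible action found within max_tries")


def action_projection(a: float) -> float:
    """Return the feasible action closest to a (closed form for intervals)."""
    if is_feasible(a):
        return a
    # Clamp a into each interval, then keep the nearest clamped point.
    candidates = [min(max(a, lo), hi) for lo, hi in FEASIBLE]
    return min(candidates, key=lambda c: abs(c - a))


def ideal_feasibility_policy(u: float) -> float:
    """Hand-coded target, not a learned model: map u ~ U(0, 1) to a uniform
    sample over the whole feasible set via the inverse CDF."""
    total = sum(hi - lo for lo, hi in FEASIBLE)
    target = u * total
    for lo, hi in FEASIBLE:
        width = hi - lo
        if target <= width:
            return min(lo + target, hi)  # guard against float round-off
        target -= width
    return FEASIBLE[-1][1]


if __name__ == "__main__":
    rng = random.Random(0)
    proposal = 0.5  # infeasible: lies in the gap between the intervals
    print("rejection: ", action_rejection(proposal))
    print("projection:", action_projection(proposal))
    print("resampling:", action_resampling(rng.random))
    samples = [ideal_feasibility_policy(rng.random()) for _ in range(1000)]
    print("share in second interval:", sum(s > 0.5 for s in samples) / len(samples))
```

For the infeasible proposal 0.5, rejection always returns the same backup action and projection returns the nearby boundary point 0.6. The ideal sampler, by contrast, spreads its outputs over both disconnected intervals, with roughly 60% landing in the wider one. Reaching that kind of coverage with a learned generator is what the feasibility policy is trained for.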