Trajectory Planning with Deep Reinforcement Learning in High-Level Action Spaces

Kyle R. Williams, Rachel Schlossman, Daniel Whitten, Joe Ingram, Srideep Musuvathy, Anirudh Patel, James Pagan, Kyle A. Williams, Sam Green, Anirban Mazumdar, Julie Parish
DOI: https://doi.org/10.1109/TAES.2022.3218496
2022-08-13
Abstract: This paper presents a technique for trajectory planning based on continuously parameterized high-level actions (motion primitives) of variable duration. This technique leverages deep reinforcement learning (Deep RL) to formulate a policy which is suitable for real-time implementation. There is no separation of motion primitive generation and trajectory planning: each individual short-horizon motion is formed during the Deep RL training to achieve the full-horizon objective. Effectiveness of the technique is demonstrated numerically on a well-studied trajectory generation problem and a planning problem on a known obstacle-rich map. This paper also develops a new loss function term for policy-gradient-based Deep RL, which is analogous to an anti-windup mechanism in feedback control. We demonstrate the inclusion of this new term in the underlying optimization increases the average policy return in our numerical example.
Systems and Control
What problem does this paper attempt to address?
This paper addresses the long training times and poor performance encountered when deep reinforcement learning (DRL) is used for trajectory planning in high-dimensional action spaces. The authors propose a technique based on parameterized high-level actions, which are sub-trajectories of variable shape and duration. The paper shows how these high-level actions improve the guidance policies produced by reinforcement learning, reduce the number of training steps required, and improve path performance.

### Main contributions of the paper:

1. **Introduction of a high-level action space (HLAS)**: A high-level action space is introduced into deep reinforcement learning in which the agent chooses the duration of each action. This improves exploration efficiency and minimizes the need for reward shaping.
2. **Promoting exploration**: Letting the agent choose action durations promotes exploration of the environment and thereby improves learning efficiency.
3. **Preventing "action windup"**: A loss function term is developed to prevent "action windup" in policy-gradient methods, that is, the situation in which the actions the policy generates on average exceed the action limits.
4. **Performance and training improvements**: A shuttle re-entry example demonstrates significant improvements in both path performance and training speed.

### Technical details:

- **Sub-trajectories**: Each high-level action is defined as a sub-trajectory of variable duration and can be represented either as a control input function or as a desired output function.
- **Reward design**: The reward signal accumulates the performance index at each action step and provides an additional reward at the terminal state.
- **Constraints**: State and path constraints are handled by simple policies. If the agent violates these constraints, the current episode ends and the agent receives no further rewards.
- **Theoretical analysis**: The paper proves that the error growth incurred when the high-level action space is used to approximate the optimal control problem is bounded, and that the error decreases as the polynomial order increases.

### Experimental verification:

- **Shuttle re-entry mission**: Compared with the baseline reinforcement learning implementation, the method improves path performance (latitude range) by 18% and reduces the number of training steps by about 75%.
- **Obstacle environment**: The method is also shown to work effectively in an environment containing obstacles.

In conclusion, by introducing the high-level action space and the related technical improvements, the paper significantly improves the performance and efficiency of deep reinforcement learning in complex trajectory planning tasks.
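
To make the "action windup" idea concrete, the following is a minimal sketch of one way such a penalty could look. It is not the loss term from the paper, whose exact form is not given in this summary. It assumes a policy that outputs a mean action per sample and box-shaped action limits; the function name `windup_penalty` and the parameters `low`, `high`, and `weight` are hypothetical.

```python
from typing import Sequence


def windup_penalty(
    mean_actions: Sequence[Sequence[float]],
    low: Sequence[float],
    high: Sequence[float],
    weight: float = 1.0,
) -> float:
    """Hypothetical penalty on policy means that leave the box [low, high].

    Returns `weight` times the batch average of the summed squared excess.
    Means that stay inside the limits contribute zero.
    """
    if not mean_actions:
        return 0.0
    total = 0.0
    for mean in mean_actions:
        for m, lo, hi in zip(mean, low, high):
            # Distance by which this component of the mean exceeds its limit.
            excess = max(m - hi, 0.0) + max(lo - m, 0.0)
            total += excess * excess
    return weight * total / len(mean_actions)


# Two sampled policy means with limits [-1, 1] on both action components.
batch = [[0.5, 2.0], [-1.5, 0.0]]
penalty = windup_penalty(batch, low=[-1.0, -1.0], high=[1.0, 1.0])
print(penalty)  # 0.625
```

In a policy-gradient setup, a term of this kind would be added to the usual loss so that the gradient pushes out-of-range means back toward the limits, much as an anti-windup mechanism does in feedback control.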
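
As an illustration of a high-level action of variable duration, the sketch below expands a single action into a sampled control input function. It assumes, purely for illustration, that the action consists of a duration and polynomial coefficients in normalized time; the summary mentions polynomial order but does not state the paper's actual parameterization. The name `rollout_high_level_action` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def rollout_high_level_action(
    duration: float,
    coeffs: Sequence[float],
    dt: float,
) -> List[Tuple[float, float]]:
    """Expand one hypothetical high-level action into (time, control) samples.

    The control is a polynomial in normalized time s = t / duration:
    u(s) = coeffs[0] + coeffs[1] * s + coeffs[2] * s**2 + ...
    """
    if duration <= 0.0 or dt <= 0.0:
        raise ValueError("duration and dt must be positive")
    steps = max(1, round(duration / dt))
    samples = []
    for i in range(steps + 1):
        s = i / steps
        # Horner's rule, starting from the highest-order coefficient.
        u = 0.0
        for c in reversed(coeffs):
            u = u * s + c
        samples.append((s * duration, u))
    return samples


# A 2-second action whose control ramps linearly from 1.0 down to 0.0.
for t, u in rollout_high_level_action(2.0, [1.0, -1.0], dt=0.5):
    print(t, u)
```

Because the agent picks `duration` together with the shape coefficients, one policy output covers many low-level time steps, which is the property the summary credits with improving exploration.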