Abstract: Modern control systems are increasingly turning to machine learning algorithms to augment their performance and adaptability. Within this context, Deep Reinforcement Learning (DRL) has emerged as a promising control framework, particularly in the domain of marine transportation. Its potential for autonomous marine applications lies in its ability to seamlessly combine path-following and collision avoidance with an arbitrary number of obstacles. However, current DRL algorithms require disproportionally large computational resources to find near-optimal policies compared to the posed control problem when the searchable parameter space becomes large. To combat this, our work delves into the application of Variational AutoEncoders (VAEs) to acquire a generalized, low-dimensional latent encoding of a high-fidelity range-finding sensor, which serves as the exteroceptive input to a DRL agent. The agent's performance, encompassing path-following and collision avoidance, is systematically tested and evaluated within a stochastic simulation environment, presenting a comprehensive exploration of our proposed approach in maritime control systems.
What problem does this paper attempt to address?
This paper aims to improve the performance of deep reinforcement learning (DRL) agents for autonomous surface vessels (ASVs) in path-following and collision-avoidance tasks, with a focus on how high-fidelity range sensor data is processed. Specifically, the researchers explore how variational autoencoders (VAEs) can produce a low-dimensional, generalized latent encoding of high-fidelity range sensor data that serves as the exteroceptive input to a DRL agent. The goal is to reduce the computational resources needed for training while improving the agent's navigation and collision avoidance in complex environments.
### Background and Problems of the Paper
1. **Development of Modern Control Systems**
- Modern control systems increasingly use machine-learning algorithms to enhance their performance and adaptability.
- Deep reinforcement learning (DRL), as a promising control framework, shows great potential in marine transportation, especially in autonomous ship applications.
2. **Challenges of Existing DRL Algorithms**
- When the searchable parameter space is large, existing DRL algorithms require computational resources that are disproportionately large relative to the control problem in order to find near-optimal policies.
- Efficient processing of high-fidelity sensor data is key to achieving effective path-following and collision avoidance.
3. **Research Motivation**
- Although VAEs are widely used in other fields, they have rarely been applied to DRL for marine control and collision avoidance.
- This paper aims to improve the performance of DRL agents in path-following and collision-avoidance tasks by using a VAE to extract useful representations of the environment.
### Main Research Questions
1. **Can the VAE Feature Extractor Be Successfully Integrated into the DRL Agent?**
- Explore whether a VAE can effectively extract useful features from high-fidelity sensor data and use them as inputs for DRL agents.
2. **How Do Model Complexity and Hyperparameters Affect the Data Reproduction and Generalization Ability of VAE?**
- Analyze the differences in data reproduction and generalization ability of VAE models with different complexities, and the impact of these differences on the performance of DRL agents.
3. **What Are the Performance Differences between Pretrained VAE Encoders with Different Complexities and Non-VAE DRL Agents?**
- Compare the performance of DRL agents using pretrained VAE encoders with different complexities and DRL agents without a VAE in path-following and collision-avoidance tasks.
### Method Overview
- **VAE Architecture**
- VAEs with shallow (1-layer) and deep (3-layer) convolutional neural network (CNN) encoders were used to extract low-dimensional representations of high-fidelity sensor data.
- Circular padding was used so that the convolutions respect the cyclic nature of the perception vector and avoid edge effects.
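As a rough illustration of the circular padding idea, the sketch below wraps samples from each end of a 360-degree scan onto the other end before applying a 1D filter, so the first and last beams are treated as neighbors. It is plain Python written for this summary, not the paper's implementation; the function names, the 8-beam scan, and the averaging kernel are all assumptions. In a deep learning framework the same effect is usually obtained through a built-in option, for example `padding_mode='circular'` on PyTorch's `torch.nn.Conv1d`.

```python
def circular_pad(scan, pad):
    """Wrap `pad` samples from each end of a cyclic scan onto the other end."""
    if pad == 0:
        return list(scan)
    if pad > len(scan):
        raise ValueError("pad must not exceed the scan length")
    return list(scan[-pad:]) + list(scan) + list(scan[:pad])


def circular_conv1d(scan, kernel):
    """Cross-correlate `scan` with an odd-length `kernel`, wrapping at the ends."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    padded = circular_pad(scan, pad)
    # Output has the same length as the input; every beam sees both neighbors.
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(scan))
    ]


# Hypothetical 8-beam range scan (metres): a close obstacle straddles the
# seam between the last and first beam.
scan = [2.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 2.0]
smoothed = circular_conv1d(scan, [1 / 3, 1 / 3, 1 / 3])
```

With zero padding, the two beams at the seam would be averaged against artificial zeros; with circular padding they are averaged against each other, and rotating the input scan simply rotates the output.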
- **DRL Agent**
- The DRL agent was trained with the Proximal Policy Optimization (PPO) algorithm to perform path-following and collision avoidance in a simulated environment.
- Different VAE + PPO configurations were evaluated in complex environments using metrics such as path progress, lateral error, duration, and collision rate.
### Experimental Design
- **Data Generation and Augmentation**
- A data set containing 10,000 different range observations was generated and expanded to 60,000 samples through synthetic observations and data augmentation techniques.
- The data set was divided into training, validation, and test sets, and Gaussian noise was added to the training set to enhance robustness.
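The split-and-perturb step described above might look roughly like the following sketch. The summary does not give the split ratios, the noise standard deviation, or the sensor's maximum range, so `train_frac`, `val_frac`, `sigma`, and `MAX_RANGE` are placeholder assumptions, and the random scans stand in for the real observations.

```python
import random


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle and split samples into train, validation, and test lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def add_gaussian_noise(scan, sigma, max_range, rng):
    """Add zero-mean Gaussian noise to each reading, clipped to [0, max_range]."""
    return [min(max(r + rng.gauss(0.0, sigma), 0.0), max_range) for r in scan]


rng = random.Random(42)
MAX_RANGE = 150.0  # hypothetical sensor range in metres

# Stand-in data: 100 random 8-beam scans instead of the real observations.
dataset = [[rng.uniform(0.0, MAX_RANGE) for _ in range(8)] for _ in range(100)]
train, val, test = split_dataset(dataset)

# Noise is applied to the training set only; validation and test stay clean.
noisy_train = [
    add_gaussian_noise(s, sigma=1.0, max_range=MAX_RANGE, rng=rng) for s in train
]
```

Keeping the validation and test sets noise-free means reconstruction quality is measured against unperturbed observations.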
- **VAE Training and Evaluation**
- The VAE model was trained using the Adam optimizer, and its performance on the training and validation sets was evaluated.
- By visualizing the latent distribution and reconstruction results, the impact of different β values on the latent-space encoding was analyzed.
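For context on the role of β, the commonly used β-VAE objective is shown below. This summary does not state the exact loss used in the paper, so the form and the symbols ($q_\phi$ for the encoder, $p_\theta$ for the decoder, $p(z)$ for a standard normal prior, $d$ for the latent dimension) are assumptions based on the standard formulation.

$$
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \beta \, D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
$$

For a diagonal Gaussian encoder with mean $\mu_j$ and variance $\sigma_j^2$ in each latent dimension, the KL term has the closed form

$$
D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right) = -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
$$

A larger β weights the KL term more heavily, pulling the latent distribution toward the prior at the expense of reconstruction accuracy; a smaller β does the opposite.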
- **DRL Agent Evaluation**
- The performance of different configurations of VAE + PPO models was compared with that of the baseline model.
- Agent performance in complex environments was assessed through statistical analysis of metrics such as path progress, lateral error, duration, and collision rate.
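One way to aggregate the four metrics named above over a batch of evaluation episodes is sketched here. The `Episode` fields, their units, and the sample values are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    path_progress: float  # fraction of the path completed, 0..1 (assumed definition)
    lateral_error: float  # mean cross-track error in metres (assumed units)
    duration: float       # episode length in seconds (assumed units)
    collided: bool        # True if the episode ended in a collision


def summarize(episodes):
    """Average each metric over the episodes and compute the collision rate."""
    if not episodes:
        raise ValueError("need at least one episode")
    return {
        "mean_path_progress": mean(e.path_progress for e in episodes),
        "mean_lateral_error": mean(e.lateral_error for e in episodes),
        "mean_duration": mean(e.duration for e in episodes),
        "collision_rate": sum(e.collided for e in episodes) / len(episodes),
    }


# Illustrative episodes only; not data from the paper.
episodes = [
    Episode(1.0, 2.0, 100.0, False),
    Episode(0.5, 4.0, 60.0, True),
    Episode(1.0, 3.0, 110.0, False),
    Episode(0.9, 3.0, 130.0, False),
]
summary = summarize(episodes)
```

Running the same summary for each VAE + PPO configuration and for the baseline gives directly comparable rows for a results table.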
### Conclusion
With these methods, the researchers aim to verify the effectiveness of VAEs as feature extractors for DRL agents, in particular their ability to handle high-fidelity sensor data and to improve path-following and collision-avoidance performance. The work is intended to inform the intelligent control of autonomous surface vessels.