Deep Reinforcement Multi-agent Learning framework for Information Gathering with Local Gaussian Processes for Water Monitoring

Samuel Yanes Luis, Dmitriy Shutin, Juan Marchal Gómez, Daniel Gutiérrez Reina, Sergio Toral Marín
2024-01-09
Abstract: The conservation of hydrological resources involves continuously monitoring their contamination. A multi-agent system composed of autonomous surface vehicles is proposed in this paper to efficiently monitor the water quality. To achieve safe control of the fleet, the fleet policy should be able to act based on measurements and on the fleet state. It is proposed to use Local Gaussian Processes and Deep Reinforcement Learning to jointly obtain effective monitoring policies. Local Gaussian processes, unlike classical global Gaussian processes, can accurately model information with dissimilar spatial correlations, which captures the water quality information more accurately. A deep convolutional policy is proposed that bases its decisions on the observation of the mean and variance of this model, by means of an information gain reward. Using a Double Deep Q-Learning algorithm, agents are trained to minimize the estimation error in a safe manner thanks to a Consensus-based heuristic. Simulation results indicate an improvement of up to 24% in terms of the mean absolute error with the proposed models. Also, training results with 1-3 agents indicate that our proposed approach returns 20% and 24% smaller average estimation errors for, respectively, monitoring water quality variables and monitoring algae blooms, as compared to state-of-the-art approaches.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper applies a multi-agent deep reinforcement learning framework to the information gathering problem in water monitoring, in particular the monitoring of water pollution phenomena such as cyanobacterial blooms. It combines local Gaussian processes with deep reinforcement learning to optimize path planning for fleets of autonomous surface vehicles (ASVs) so that water quality is monitored more effectively. The challenges addressed in the paper include:

1. **Environment Uncertainty**: Because water pollution sources are local and variable in space and time, the fleet needs a safe control policy that decides based on both measurement data and the fleet state.
2. **Information Acquisition**: Local Gaussian processes, as opposed to a single global Gaussian process, can model non-uniform spatial correlations and therefore capture water quality information more accurately.
3. **Path Planning**: A deep convolutional policy is proposed that makes decisions from the mean and variance of the model and is trained with an information gain reward.
4. **Safety and Collaboration**: Agents are trained with the Double Deep Q-Learning algorithm to minimize the estimation error, while a consensus-based heuristic keeps their behavior safe (for example, by avoiding collisions).
5. **Performance Improvement**: Simulation results show an improvement of up to 24% in mean absolute error with the proposed models. Training results with 1-3 agents show average estimation errors that are 20% and 24% smaller than state-of-the-art approaches when monitoring water quality variables and algae blooms, respectively.

The paper also discusses the advantages of local Gaussian processes over global Gaussian processes, such as better adaptability, higher scalability, and the ability to model discontinuous environments.
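The Double Deep Q-Learning step mentioned above can be illustrated with a minimal sketch of how the training target for one transition is formed. This is the generic Double DQN rule, not the authors' code: the function name `double_dqn_target`, the discount factor, and the example Q-values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    done: bool,
    gamma: float = 0.99,
) -> float:
    """Bootstrapped target for one transition under the generic Double DQN rule.

    next_q_online / next_q_target are the Q-values of the next state as
    estimated by the online and the target network (hypothetical inputs).
    """
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


# Illustrative values only (assumed, not taken from the paper).
online = [1.0, 3.0, 2.0]   # online network prefers action 1
target = [0.5, 1.5, 4.0]   # target network would prefer action 2
y = double_dqn_target(1.0, online, target, done=False, gamma=0.9)
y_terminal = double_dqn_target(1.0, online, target, done=True, gamma=0.9)
```

Decoupling selection from evaluation is what distinguishes this target from the plain DQN target, which would take the maximum of the target network's own values and tends to overestimate.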
In addition, the paper uses deep reinforcement learning to train adaptive agent policies that maximize information gathering, and it designs a reward function based on information gain together with a matching observation function. Finally, the paper reviews related work, including particle swarm optimization, deep reinforcement learning-based path planning, and other obstacle avoidance methods. It also describes in detail the components of the proposed method: the local Gaussian process model, the deep reinforcement learning framework, and the design of the reward and observation functions.
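To make the idea of an information gain reward concrete, here is a small sketch under stated assumptions. It treats the reward as the drop in total predictive variance of a Gaussian process after one noisy measurement, which is one common proxy and may differ from the exact definition in the paper. The RBF kernel, the 1-D grid, the noise level, and all function names are hypothetical.

```python
import math


def rbf_cov(points, lengthscale):
    """Prior covariance matrix of a 1-D grid under an RBF kernel (assumed kernel)."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2)) for b in points]
        for a in points
    ]


def condition_on_sample(cov, s, noise_var):
    """Posterior covariance after one noisy measurement at grid index s.

    This is the standard rank-1 Gaussian conditioning update.
    """
    denom = cov[s][s] + noise_var
    n = len(cov)
    return [
        [cov[i][j] - cov[i][s] * cov[s][j] / denom for j in range(n)]
        for i in range(n)
    ]


def total_variance(cov):
    """Sum of the predictive variances over the grid (trace of the covariance)."""
    return sum(cov[i][i] for i in range(len(cov)))


def information_gain_reward(cov_before, cov_after):
    """Assumed reward: reduction in total predictive variance."""
    return total_variance(cov_before) - total_variance(cov_after)


points = [0.0, 1.0, 2.0, 3.0, 4.0]           # hypothetical sampling grid
prior = rbf_cov(points, lengthscale=1.0)
posterior = condition_on_sample(prior, s=2, noise_var=0.01)
first_reward = information_gain_reward(prior, posterior)

# Measuring the same location again yields far less new information.
repeat = condition_on_sample(posterior, s=2, noise_var=0.01)
repeat_reward = information_gain_reward(posterior, repeat)
```

Under this proxy, revisiting an already sampled location earns a much smaller reward than the first visit, which is the behavior an information gathering policy is meant to exploit.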