Abstract: To support the general problem of chemical detection and source localization with Autonomous Underwater/Surface Vehicles (AUVs/ASVs), we propose a system design that fuses an AUV/ASV with Q-learning and a real-time underwater mass spectrometer, which provides the feedback and reward signal for in situ source localization. Additionally, an autonomous sampler can be coupled to the system to archive molecular material for subsequent expanded measurement and validation in the lab. Combining real-time chemical sensing with archived sample capture and verification yields an adaptive sensing and sampling system. The in situ mass spectrometer allows real-time measurement of membrane-compatible chemistries such as volatile organic compounds (VOCs) and light gases, while the sampler purifies, enriches and accurately isolates targeted molecular compounds in the field for subsequent full mass spectrometric analysis back in the lab. In the overall AUV system design, the battery-powered mass spectrometer provides real-time signals for reinforcement learning (RL) behaviors, and the portable adaptive sampling system automates sample collection, molecular purification/concentration and preservation. The mass spectrometer is of the membrane inlet type, and the automated sampler combines customizable fluidic management systems, pumps, valve arrays and motion control systems. For field use, the prototype sampling module is designed for triggered sensing and sampling, but it can also be actuated to collect variable volumes over arbitrary time periods. The mass spectrometer and sampling systems can be hosted on AUVs/ASVs for most chemical source localization activities. The entire mobile system (AUV platform, reinforcement learning controller, mass spectrometer and sampler) constitutes an adaptive chemical sampling platform.
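For reference, the standard tabular Q-learning update that a mass spectrometer reward signal would drive is written out below. Here $c_t$ denotes the measured signal for the target compound at step $t$, and the reward is expressed as a generic function $g$ of the change in that signal; the form of $g$ (for example its sign, or a scaled difference) is an illustrative assumption rather than a detail given in this abstract.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],
\qquad r_{t+1} = g\!\left( c_{t+1} - c_t \right)
```

Here $\alpha$ is the learning rate and $\gamma$ the discount factor. The update uses only observed transitions $(s_t, a_t, r_{t+1}, s_{t+1})$, so it needs no model of the plume, which is what makes the approach model-free.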
The ‘back end’ laboratory identification can be performed with any type of mass spectrometer and provides high-confidence verification of the specific archived material. The lab verification results can also inform the design of the reward signal used by the mass spectrometer data subsystem in subsequent Q-learning training, increasing the accuracy of the source localization policy. Because mass spectrometer data can be used to train a Q-learning based agent, the team can pretrain the agent on real sensory data similar to what it will encounter in future field deployments. Appropriately simulated data can approximate the anticipated environment and chemical distribution patterns, supporting the development of a custom reward function representative of the mission objective. Preliminary simulations have shown promising results when a trained policy was tested in a similar environment in which the location of a generic ‘pollution source’ had been perturbed from the training scenario. The policy is acquired by training on pollution data for a fixed environment, with the trade-off between exploration and exploitation tuned to the environment size, pollution distribution and training duration to optimize the agent’s learning. That policy is then tested in a similar but slightly perturbed environment. This method can be applied in future missions to allow continual policy updates based on observed data. The approach is advantageous because it limits the need for operator-vehicle communication, giving the agent sufficient autonomy to locate the source based on its prior training, and because it circumvents the need for a model-based decision and control approach as the agent becomes better trained through real-world observations. It is a model-free learning approach requiring no a priori knowledge of the environment.
This is a distinct advantage over model-based approaches, which depend on the accuracy and fidelity of the environmental model used to train the agent; building such a model is notoriously difficult, both logistically and computationally.
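The train-then-perturb evaluation described above can be illustrated with a small, self-contained simulation. This is a minimal sketch under assumptions that do not come from the abstract: a hypothetical 1-D transect of 25 cells, a Gaussian plume standing in for the mass spectrometer reading, an observation made only of the last move and whether the reading rose, a reward of +1/0/-1 for a rising/unchanged/falling reading plus 10 for reaching the source, and hand-picked learning parameters. The agent is trained with the source at cell 16 and then evaluated, without further learning, with the source moved to cell 7.

```python
import math
import random

# Hypothetical 1-D transect: cells 0..LENGTH-1, one cell per move (illustrative assumption).
LENGTH = 25
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)
START = 4       # observation used before the first move, when there is no reading history
N_STATES = 5    # 0: left/not rising, 1: left/rising, 2: right/not rising, 3: right/rising, 4: START
FOUND_REWARD = 10.0  # assumed bonus for reaching the source cell


def concentration(x, source, sigma=3.0):
    """Stand-in for the mass spectrometer signal: a Gaussian plume centred on the source."""
    return math.exp(-((x - source) ** 2) / (2.0 * sigma ** 2))


def observe(action, rising):
    """Observation built only from the last move and whether the reading increased."""
    return 2 * action + (1 if rising else 0)


def step(x, action, source):
    """Move one cell (clamped at the ends); return (new_x, reward, next_state, done)."""
    new_x = min(max(x + (1 if action == RIGHT else -1), 0), LENGTH - 1)
    if new_x == source:
        return new_x, FOUND_REWARD, None, True
    before, after = concentration(x, source), concentration(new_x, source)
    rising = after > before
    reward = 1.0 if rising else (0.0 if after == before else -1.0)
    return new_x, reward, observe(action, rising), False


def greedy(q_row):
    """Best action for one observation (ties go to the first action, LEFT)."""
    return max(ACTIONS, key=lambda a: q_row[a])


def train(source, episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2,
          max_steps=60, seed=0, q=None):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    Passing an existing table as `q` continues training it in place.
    """
    rng = random.Random(seed)
    if q is None:
        q = [[0.0, 0.0] for _ in range(N_STATES)]
    starts = [p for p in range(LENGTH) if p != source]
    for _ in range(episodes):
        x, state = rng.choice(starts), START
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise exploit the current estimate.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state])
            x, reward, next_state, done = step(x, action, source)
            # Q-learning target: bootstrap from the best next action unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


def run_policy(q, source, start, max_steps=50):
    """Follow the greedy policy without learning; return (found, steps_taken)."""
    x, state = start, START
    for t in range(1, max_steps + 1):
        x, _, state, done = step(x, greedy(q[state]), source)
        if done:
            return True, t
    return False, max_steps


if __name__ == "__main__":
    q = train(source=16)                              # training scenario
    found, steps = run_policy(q, source=7, start=20)  # source location perturbed
    print(found, steps)
```

Because the observation encodes only the trend in the sensor reading, not the vehicle's position, the learned policy is not tied to the training source location. Passing an existing table back through `train(..., q=q)` continues updating it, a simple stand-in for the continual policy updates the abstract describes.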

Using Model-Free Reinforcement Learning Combined With Underwater Mass Spectrometer and Material Archiving Coupled to Lab Analysis for Autonomous Chemical Source Verifications

Neural Network Model-Based Reinforcement Learning Control for AUV 3-D Path Following

Asynchronous Localization for Underwater Acoustic Sensor Networks: A Continuous Control Deep Reinforcement Learning Approach

Sim-to-Real Transfer of Adaptive Control Parameters for AUV Stabilization under Current Disturbance

Multi-AUV Cooperative Localization in Adaptive Sampling for Marine Environmental Monitoring

Gas concentration mapping and source localization for environmental monitoring through unmanned aerial systems using model-free reinforcement learning agents

Learning an End-To-End Policy for AUV Control Within Just Forty Minutes Using Parallel Simulation

UW-MARL: Multi-Agent Reinforcement Learning for Underwater Adaptive Sampling using Autonomous Vehicles

Data-Driven Learning and Planning for Environmental Sampling

Integrated Localization and Tracking for AUV With Model Uncertainties via Scalable Sampling-Based Reinforcement Learning Approach

Unmanned Surface Vehicle Aided Maritime Data Collection Using Deep Reinforcement Learning

AUV Obstacle Avoidance Framework Based on Event-Triggered Reinforcement Learning

Smart Underwater Pollution Detection Based on Graph-Based Multi-Agent Reinforcement Learning Towards AUV-Based Network ITS

Multi-Agent Reinforcement Learning Based Secure Searching and Data Collection in AUV Swarms

A Method for Long-Term Target Anti-Interference Tracking Combining Deep Learning and CKF for LARS Tracking and Capturing

Robust ASV Navigation Through Ground to Water Cross-Domain Deep Reinforcement Learning

Adaptive sampling with an autonomous underwater vehicle in static marine environments

An AUV Target-Tracking Method Combining Imitation Learning and Deep Reinforcement Learning

Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle

AUV-Aided Optical-Acoustic Hybrid Data Collection Based on Deep Reinforcement Learning