Using reinforcement learning to improve drone-based inference of greenhouse gas fluxes

Alouette van Hove, Kristoffer Aalstad, Norbert Pirk
DOI: https://doi.org/10.5617/nmi.9897
2024-01-08
Abstract: Accurate mapping of greenhouse gas fluxes at the Earth's surface is essential for the validation and calibration of climate models. In this study, we present a framework for surface flux estimation with drones. Our approach uses data assimilation (DA) to infer fluxes from drone-based observations, and reinforcement learning (RL) to optimize the drone's sampling strategy. Herein, we demonstrate that an RL-trained drone can quantify a CO2 hotspot more accurately than a drone sampling along a predefined flight path that traverses the emission plume. We find that information-based reward functions can match the performance of an error-based reward function that quantifies the difference between the estimated surface flux and the true value. Reward functions based on information gain and information entropy can motivate actions that increase the drone's confidence in its updated belief, without requiring knowledge of the true surface flux. These findings provide valuable insights for further development of the framework for the mapping of more complex surface flux fields.
Machine Learning, Robotics, Atmospheric and Oceanic Physics
What problem does this paper attempt to address?
The paper addresses how reinforcement learning (RL) can improve the estimation of greenhouse gas fluxes from unmanned aerial vehicle (UAV) observations, with the wider aim of supporting the validation and calibration of climate models. The proposed framework uses data assimilation (DA) to infer surface fluxes from UAV observations and uses RL to optimize the UAV's sampling strategy. Specifically, the authors demonstrate that a UAV trained with RL can quantify a carbon dioxide hotspot more accurately than a UAV following a predefined flight path.

The study finds that reward functions based on information gain and information entropy can match the performance of an error-based reward function, which measures the difference between the estimated flux and the true value. These information-based reward functions incentivize the UAV to take actions that increase its confidence in its updated belief, without requiring knowledge of the true surface flux. The paper also examines how consistent the RL-learned sampling strategies are across different true surface flux values and starting positions, and how the choice of reward function influences the estimation results. The results show that UAVs trained with RL achieve a lower average Continuous Ranked Probability Score (CRPS) when estimating the flux strength than UAVs flying predefined grid paths; for CRPS, lower values indicate a better probabilistic estimate.

Future research directions include extending the framework to more complex surface flux fields, such as hotspots with unknown locations, multiple hotspots, and the simultaneous estimation of fluxes of different greenhouse gases. These extensions will require more complex RL algorithms, such as those based on neural networks.
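To make the reward functions and the CRPS metric mentioned above more concrete, the following Python sketch shows one way they could be computed. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say how the drone's belief is represented, so the sketch assumes a one-dimensional Gaussian belief over the flux strength, and all function names are hypothetical. The CRPS uses the standard ensemble estimator, E|X - y| - 0.5 E|X - X'|.

```python
import math
import numpy as np


def gaussian_entropy(sigma):
    """Differential entropy (in nats) of a 1-D Gaussian with std. dev. sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)


def entropy_reward(prior_sigma, post_sigma):
    """Entropy-based reward: reduction in entropy from prior to posterior.

    Assumption: Gaussian beliefs. Needs no knowledge of the true flux.
    """
    return gaussian_entropy(prior_sigma) - gaussian_entropy(post_sigma)


def information_gain_reward(prior_mu, prior_sigma, post_mu, post_sigma):
    """Information-gain reward: KL(posterior || prior) for 1-D Gaussians.

    Also needs no knowledge of the true flux.
    """
    return (
        math.log(prior_sigma / post_sigma)
        + (post_sigma**2 + (post_mu - prior_mu) ** 2) / (2.0 * prior_sigma**2)
        - 0.5
    )


def error_reward(post_mu, true_flux):
    """Error-based reward: negative absolute error. Requires the true flux."""
    return -abs(post_mu - true_flux)


def ensemble_crps(samples, truth):
    """Empirical CRPS of an ensemble against a scalar truth (lower is better)."""
    x = np.asarray(samples, dtype=float)
    accuracy = np.mean(np.abs(x - truth))
    spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return accuracy - spread


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    prior_mu, prior_sigma = 5.0, 2.0   # belief before a measurement
    post_mu, post_sigma = 8.0, 0.5     # belief after the DA update
    true_flux = 8.2                    # known only in a simulated experiment

    print("entropy reward:  ", entropy_reward(prior_sigma, post_sigma))
    print("information gain:", information_gain_reward(
        prior_mu, prior_sigma, post_mu, post_sigma))
    print("error reward:    ", error_reward(post_mu, true_flux))

    rng = np.random.default_rng(0)
    posterior_samples = rng.normal(post_mu, post_sigma, size=500)
    print("posterior CRPS:  ", ensemble_crps(posterior_samples, true_flux))
```

The two information-based rewards depend only on the prior and posterior beliefs, which is why they can be evaluated in the field, whereas the error-based reward and the CRPS both need the true flux and are therefore limited to simulated or controlled experiments.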