Global atmospheric data assimilation with multi-modal masked autoencoders

Thomas J. Vandal, Kate Duffy, Daniel McDuff, Yoni Nachmany, Chris Hartshorn
2024-07-16
Abstract: Global data assimilation enables weather forecasting at all scales and provides valuable data for studying the Earth system. However, the computational demands of physics-based algorithms used in operational systems limit the volume and diversity of observations that are assimilated. Here, we present "EarthNet", a multi-modal foundation model for data assimilation that learns to predict a global gap-filled atmospheric state solely from satellite observations. EarthNet is trained as a masked autoencoder that ingests a 12-hour sequence of observations and learns to fill missing data from other sensors. We show that EarthNet performs a form of data assimilation, producing a global 0.16-degree reanalysis dataset of 3D atmospheric temperature and humidity in a fraction of the time required by operational systems. It is shown that the resulting reanalysis dataset reproduces climatology by evaluating a 1-hour forecast background state against observations. We also show that our 3D humidity predictions outperform MERRA-2 and ERA5 reanalyses by 10% to 60% between the middle troposphere and lower stratosphere (5 to 20 km altitude) and our 3D temperature and humidity are statistically equivalent to the Microwave integrated Retrieval System (MiRS) observations at nearly every level of the atmosphere. Our results indicate significant promise in using EarthNet for high-frequency data assimilation and global weather forecasting.
Machine Learning, Atmospheric and Oceanic Physics
What problem does this paper attempt to address?
### The Problem This Paper Attempts to Solve

The paper presents a new method for global atmospheric data assimilation that uses Multi-modal Masked Autoencoders (MMAE) to predict a seamless, gap-filled global atmospheric state. Specifically, it addresses the following issues:

1. **Computational Limitations of Existing Data Assimilation Systems**:
   - The physics-based algorithms that operational systems use to assimilate observations are computationally demanding and slow.
   - As a result, much of the available observational data is never assimilated, which limits the quality of weather forecasts.

2. **Improving Utilization of Observational Data**:
   - The paper proposes "EarthNet", a multi-modal foundation model that predicts a seamless global atmospheric state solely from satellite observations.
   - EarthNet is trained as a masked autoencoder, so it learns to fill in missing data using observations from other sensors.

3. **Achieving Efficient and High-Resolution Data Assimilation**:
   - EarthNet produces a global 0.16-degree reanalysis dataset of 3D atmospheric temperature and humidity in a fraction of the time required by operational systems.
   - Its humidity predictions outperform the MERRA-2 and ERA5 reanalyses by 10% to 60% between the middle troposphere and lower stratosphere (5 to 20 km altitude).

4. **Validation and Improvement of Atmospheric Observations**:
   - The paper validates EarthNet against real observations, such as those from the Microwave integrated Retrieval System (MiRS).
   - A sensitivity analysis tests the importance of the different sensors and shows how much each one contributes to the predictions.

Overall, the paper aims to make data assimilation more efficient and higher in resolution through multi-modal masked autoencoders, thereby improving the quality of global weather forecasts.
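The masked-autoencoder idea summarized above can be illustrated with a small, self-contained sketch. It is not EarthNet's architecture or training code: the function names (`mask_sensors`, `mean_fill`, `masked_mse`), the dictionary-of-lists data layout, and the mean-fill "model" are all hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of hiding some of the available observations and scoring a reconstruction on the hidden entries alone.

```python
import math
import random


def mask_sensors(sample, mask_ratio, rng):
    """Hide a random fraction of the observed entries of each sensor.

    sample: dict mapping a (hypothetical) sensor name to a list of floats,
            with math.nan marking entries that were never observed.
    Returns (masked_sample, hidden), where hidden maps each sensor name to
    the set of indices that were observed but are now hidden.
    """
    masked, hidden = {}, {}
    for name, values in sample.items():
        observed = [i for i, v in enumerate(values) if not math.isnan(v)]
        k = int(mask_ratio * len(observed))  # number of entries to hide
        chosen = set(rng.sample(observed, k))
        masked[name] = [math.nan if i in chosen else v
                        for i, v in enumerate(values)]
        hidden[name] = chosen
    return masked, hidden


def mean_fill(masked):
    """Placeholder 'model': fill every gap with the sensor's visible mean."""
    filled = {}
    for name, values in masked.items():
        visible = [v for v in values if not math.isnan(v)]
        mean = sum(visible) / len(visible) if visible else 0.0
        filled[name] = [mean if math.isnan(v) else v for v in values]
    return filled


def masked_mse(prediction, target, hidden):
    """Mean squared error computed only on the deliberately hidden entries."""
    total, count = 0.0, 0
    for name, indices in hidden.items():
        for i in indices:
            total += (prediction[name][i] - target[name][i]) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Two made-up sensors; sensor "a" has a genuine gap at index 2.
    sample = {"a": [1.0, 2.0, math.nan, 4.0], "b": [0.5, 0.5, 0.5, 0.5]}
    masked, hidden = mask_sensors(sample, 0.5, rng)
    loss = masked_mse(mean_fill(masked), sample, hidden)
    print("hidden indices:", hidden)
    print("reconstruction loss on hidden entries:", loss)
```

Scoring only the hidden entries is the key design choice in this sketch: entries that were never observed have no ground truth, so they are excluded from the loss, while a real model would replace `mean_fill` with a learned network that draws on the other sensors.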