M3LEO: A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and Multispectral Data

Matthew J Allen, Francisco Dorr, Joseph Alejandro Gallego Mejia, Laura Martínez-Ferrer, Anna Jungbluth, Freddie Kalaitzis, Raúl Ramos-Pollán
2024-10-31
Abstract: Satellite-based remote sensing has revolutionised the way we address global challenges. Huge quantities of Earth Observation (EO) data are generated by satellite sensors daily, but processing these large datasets for use in ML pipelines is technically and computationally challenging. While some preprocessed Earth observation datasets exist, their content is often limited to optical or near-optical wavelength data, which is ineffective at night or in adverse weather conditions. Synthetic Aperture Radar (SAR), an active sensing technique based on microwave-wavelength radiation, offers a viable alternative. However, the application of machine learning to SAR has been limited by a lack of ML-ready data and pipelines, particularly for the full diversity of SAR data, including polarimetry, coherence and interferometry. In this work, we introduce M3LEO, a multi-modal, multi-label Earth observation dataset that includes polarimetric, interferometric, and coherence SAR data derived from Sentinel-1, alongside multispectral Sentinel-2 imagery and auxiliary data describing terrain properties such as land use. M3LEO spans approximately 17M 4×4 km data chips from six diverse geographic regions. The dataset is complemented by a flexible PyTorch Lightning framework configured using Hydra to accommodate its use across diverse ML applications in Earth observation. We provide tools to process any dataset available on popular platforms such as Google Earth Engine for seamless integration with our framework. We show that the distribution shift in self-supervised embeddings is substantial across geographic regions, even when controlling for terrain properties. Data: http://huggingface.co/M3LEO. Code: http://github.com/spaceml-org/M3LEO.
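The abstract describes the dataset as roughly 17M 4×4 km chips. As a minimal sketch of what such a chip grid could look like, the snippet below tiles a bounding box, assumed to be in a projected, metre-based coordinate system (for example UTM), into 4 km squares and maps a point to the chip that contains it. The `Chip` class, `make_chip_grid`, `locate`, and the `rNNNNN_cNNNNN` identifier scheme are hypothetical illustrations, not M3LEO's actual tiling code. Indexing every modality against one shared grid like this is one simple way to keep SAR, optical, and auxiliary layers spatially aligned.

```python
import math
from dataclasses import dataclass

CHIP_SIZE_M = 4_000  # 4 km chip edge, as stated in the abstract


@dataclass(frozen=True)
class Chip:
    row: int
    col: int
    bounds: tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in metres

    @property
    def chip_id(self) -> str:
        # Hypothetical naming scheme, not M3LEO's actual identifier format.
        return f"r{self.row:05d}_c{self.col:05d}"


def make_chip_grid(xmin, ymin, xmax, ymax, size=CHIP_SIZE_M):
    """Tile a projected bounding box into square chips of `size` metres.

    Chips that would extend past the box edge are dropped, so every
    returned chip is full-size.
    """
    n_cols = int((xmax - xmin) // size)
    n_rows = int((ymax - ymin) // size)
    chips = []
    for r in range(n_rows):
        for c in range(n_cols):
            x0 = xmin + c * size
            y0 = ymin + r * size
            chips.append(Chip(r, c, (x0, y0, x0 + size, y0 + size)))
    return chips


def locate(x, y, xmin, ymin, size=CHIP_SIZE_M):
    """Return (row, col) of the chip containing point (x, y) on the same grid."""
    return (math.floor((y - ymin) / size), math.floor((x - xmin) / size))
```

For a 100 km × 40 km box, `make_chip_grid(0, 0, 100_000, 40_000)` yields 25 × 10 = 250 chips, and `locate(4500, 100, 0, 0)` returns `(0, 1)`. Partial chips at the box edge are simply dropped in this sketch; a real pipeline might pad or overlap them instead.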
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper addresses several key challenges in Earth Observation (EO) data processing, in particular the complexity of preparing Synthetic Aperture Radar (SAR) data for use in Machine Learning (ML). Specifically:

1. **Multi-source data integration**
   - **Fusion of optical and SAR data**: Existing EO datasets usually contain only optical or near-optical data, which is ineffective at night or in adverse weather. SAR can image through cloud and at night, but its use in ML has been limited by a lack of ML-ready data and processing tools. The M3LEO dataset combines polarimetric, interferometric, and coherence SAR data from Sentinel-1 with multispectral Sentinel-2 imagery and auxiliary terrain data such as land use, providing multi-modal, multi-label EO data.
2. **Data pre-processing and formatting**
   - **Data alignment and chipping**: EO data from different sources are typically hosted on different platforms with varying availability, and spatially aligning them and cutting them into chips poses technical barriers. M3LEO provides pre-processed, ML-ready imagery divided into 4×4 km chips, simplifying the processing pipeline and making the data accessible to non-expert users.
3. **Large-scale, multi-region use**
   - **Distribution shift across geographic regions**: The paper shows that the distribution of self-supervised embeddings shifts substantially across geographic regions, even when controlling for terrain properties. This points to both challenges and opportunities for deep learning on SAR data across different regions.
4. **Tool and framework support**
   - **PyTorch Lightning framework**: To further lower the barrier to entry, the paper provides a flexible PyTorch Lightning framework configured with Hydra, adaptable to a range of ML applications.
   - **Google Earth Engine integration tools**: It also provides tools that let ML practitioners process any dataset available on Google Earth Engine into the chipped format used by the M3LEO framework.

### Summary

Through the M3LEO dataset and its accompanying tools, the paper addresses several technical obstacles to using multi-modal EO data in machine learning, spanning data integration, pre-processing, large-scale application, and tooling, with the aim of promoting wider use of SAR data in Earth observation.
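The distribution-shift finding can be made concrete with a small sketch. Here we assume embeddings are grouped by `(region, land_cover_class)` and compare two regions separately within each class they share, a simple form of controlling for terrain. The shift measure is a biased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel; this metric and the helper names (`rbf`, `mmd2`, `shift_by_class`) are our illustrative choices, not necessarily how the paper quantifies the shift.

```python
import math
from itertools import product

Vector = list[float]


def rbf(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mmd2(xs: list[Vector], ys: list[Vector], gamma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples.

    Illustrative shift metric; not confirmed to be the one used in the paper.
    """
    kxx = sum(rbf(a, b, gamma) for a, b in product(xs, xs)) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a, b in product(ys, ys)) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a, b in product(xs, ys)) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def shift_by_class(embeddings, region_a, region_b, gamma=1.0):
    """Compare two regions separately within each land-cover class.

    `embeddings` (hypothetical layout) maps (region, land_cover_class) to a
    list of embedding vectors. Only classes present in both regions are
    compared, so terrain type is held fixed within each comparison.
    """
    classes_a = {c for (r, c) in embeddings if r == region_a}
    classes_b = {c for (r, c) in embeddings if r == region_b}
    return {
        c: mmd2(embeddings[(region_a, c)], embeddings[(region_b, c)], gamma)
        for c in sorted(classes_a & classes_b)
    }
```

Values near zero indicate similar embedding distributions for that class; larger values indicate a shift. The pure-Python loops are quadratic in sample size, so realistic embedding counts would call for a vectorised implementation or subsampling.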