reBEN: Refined BigEarthNet Dataset for Remote Sensing Image Analysis

Kai Norman Clasen, Leonard Hackel, Tom Burgert, Gencer Sumbul, Begüm Demir, Volker Markl
2024-07-29
Abstract: This paper presents refined BigEarthNet (reBEN), a large-scale, multi-modal remote sensing dataset constructed to support deep learning (DL) studies for remote sensing image analysis. The reBEN dataset consists of 549,488 pairs of Sentinel-1 and Sentinel-2 image patches. To construct reBEN, we start from the Sentinel-1 and Sentinel-2 tiles used to construct the BigEarthNet dataset and divide them into patches of size 1200 m x 1200 m. We apply atmospheric correction to the Sentinel-2 patches using the latest version of the sen2cor tool, which yields higher-quality patches than those in BigEarthNet. Each patch is then associated with a pixel-level reference map and scene-level multi-labels, which makes reBEN suitable for both pixel-based and scene-based learning tasks. The labels are derived from the most recent CORINE Land Cover (CLC) map of 2018 using the same 19-class nomenclature as BigEarthNet. Using the most recent CLC map overcomes the label noise present in BigEarthNet. Furthermore, we introduce a new geographical-based split assignment algorithm that significantly reduces the spatial correlation among the train, validation, and test sets compared with the splits of BigEarthNet, which increases the reliability of the evaluation of DL models. To minimize DL model training time, we introduce software tools that convert the reBEN dataset into a DL-optimized data format. In our experiments, we show the potential of reBEN for multi-modal multi-label image classification problems by considering several state-of-the-art DL models. The pre-trained model weights, associated code, and complete dataset are available at https://bigearth.net.
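The patch construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the reBEN implementation: it assumes 10 m pixel spacing (so a 1200 m x 1200 m patch is 120 x 120 pixels), a reference map stored as a 2-D grid of class indices 0 to 18 for the 19-class nomenclature, the value -1 for unlabeled pixels, and that partial patches at tile borders are skipped. The names `patch_origins`, `crop`, and `multi_hot` are hypothetical.

```python
"""Sketch: cut a label map into 1200 m x 1200 m patches and derive
scene-level multi-labels from each patch's pixel-level reference map.

Assumptions (illustrative only, not the reBEN code):
- 10 m pixel spacing, so a 1200 m patch is 120 x 120 pixels;
- the reference map is a 2-D grid of class indices 0..18 (19 classes);
- -1 marks pixels without a label (hypothetical convention).
"""

PIXEL_SIZE_M = 10
PATCH_SIZE_M = 1200
PATCH_PX = PATCH_SIZE_M // PIXEL_SIZE_M  # 120 pixels per side
NUM_CLASSES = 19
NO_LABEL = -1


def patch_origins(height_px, width_px, patch_px=PATCH_PX):
    """Yield (row, col) top-left corners of all full, non-overlapping patches.

    Partial patches at the right/bottom border are skipped (an assumption).
    """
    for row in range(0, height_px - patch_px + 1, patch_px):
        for col in range(0, width_px - patch_px + 1, patch_px):
            yield row, col


def crop(label_map, row, col, patch_px=PATCH_PX):
    """Return the patch_px x patch_px window of a list-of-lists label map."""
    return [line[col:col + patch_px] for line in label_map[row:row + patch_px]]


def multi_hot(patch):
    """Scene-level multi-label vector: 1 for every class present in the patch."""
    labels = [0] * NUM_CLASSES
    for line in patch:
        for cls in line:
            if cls != NO_LABEL:
                labels[cls] = 1
    return labels


if __name__ == "__main__":
    # Toy 240 x 240 map: left half class 2, right half class 8 -> 4 patches.
    toy = [[2] * 120 + [8] * 120 for _ in range(240)]
    for r, c in patch_origins(240, 240):
        present = [i for i, v in enumerate(multi_hot(crop(toy, r, c))) if v]
        print((r, c), present)
```

At 10 m spacing a Sentinel-2 tile is 10980 x 10980 pixels, so this sketch yields 91 x 91 full patches per tile. In the sketch, a patch's scene-level labels are the set of classes that occur anywhere in its pixel-level map.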
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The main problems this paper addresses are:

1. **Atmospheric Correction Tool Update**: The Sentinel-2 images in the BigEarthNet dataset were preprocessed with an older version (2.5.5) of the sen2cor atmospheric correction tool. Newer versions of sen2cor (such as 2.11) produce higher-quality output, so images processed with the old version may degrade the performance of deep learning models. reBEN therefore processes the Sentinel-2 images with the latest version of sen2cor.
2. **Land Use and Land Cover (LULC) Label Noise**: The LULC labels of BigEarthNet were generated from a preliminary version of the CORINE Land Cover (CLC) 2018 map. Later updates of this map corrected wrong and missing annotations, so the BigEarthNet labels contain noise. In addition, BigEarthNet lacks pixel-level reference maps and is therefore unsuitable for pixel-level learning tasks. reBEN derives its labels from the latest version of the CLC 2018 map and provides a pixel-level reference map, addressing both problems.
3. **Spatial Correlation of Training, Validation and Test Sets**: The training, validation, and test splits recommended for BigEarthNet are highly spatially correlated, which makes evaluation results unreliable. reBEN introduces a new geographical-based split assignment algorithm. By assigning geographically overlapping areas from the same season only to the training set, it significantly reduces the spatial correlation between the splits and improves the reliability of the evaluation.
4. **Lack of Efficient Software Tools**: Loading and processing the BigEarthNet image data is time-consuming, especially when training deep learning models. To reduce training time, reBEN provides a software tool named rico-hdl, which converts the dataset into a format optimized for deep learning: it removes unnecessary metadata and stores only the image data, thereby increasing random read throughput.
5. **Lack of Recent Pretrained Models**: The pretrained models released with BigEarthNet are outdated. reBEN provides weights of models pretrained with recent deep learning architectures (such as ResNet, ViT, and MLP-Mixer) so that researchers can build on state-of-the-art models.

By addressing these problems, the reBEN dataset aims to enable more reliable and interpretable research results in remote sensing image analysis, especially for multi-modal multi-label image classification.
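To illustrate why assigning whole geographic areas to a single split lowers spatial correlation, the sketch below uses a generic spatial-block split. It is a stand-in built on assumptions, not reBEN's split assignment algorithm: patches are identified by hypothetical projected coordinates (easting, northing, UTM zone), grouped into 50 km blocks (an arbitrary choice), and each block is sent to one split by a stable hash, giving roughly, not exactly, a 50/25/25 train/validation/test ratio. All function names are hypothetical.

```python
"""Generic spatial-block split (illustration only, NOT the reBEN algorithm).

Patches are grouped into coarse spatial blocks and every block goes to
exactly one split, so nearby patches cannot end up in both train and test.
Block size and split ratios below are hypothetical.
"""
import hashlib
from collections import defaultdict

BLOCK_SIZE_M = 50_000  # hypothetical block edge length (50 km)
RATIOS = (("train", 0.5), ("validation", 0.25), ("test", 0.25))  # hypothetical


def block_id(easting_m, northing_m, utm_zone, block_size_m=BLOCK_SIZE_M):
    """Coarse spatial cell containing a patch's projected origin."""
    return (utm_zone, int(easting_m // block_size_m), int(northing_m // block_size_m))


def split_for_block(block, ratios=RATIOS):
    """Deterministically map a block to a split via a stable hash."""
    digest = hashlib.sha256(repr(block).encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # pseudo-uniform in [0, 1)
    cumulative = 0.0
    for name, share in ratios:
        cumulative += share
        if u < cumulative:
            return name
    return ratios[-1][0]  # guard against floating-point round-off


def assign_splits(patches):
    """patches: iterable of (patch_id, easting_m, northing_m, utm_zone).

    Returns {split_name: [patch_id, ...]}. The split depends only on the
    block, so all patches inside one block share the same split.
    """
    splits = defaultdict(list)
    for patch_id, easting, northing, zone in patches:
        splits[split_for_block(block_id(easting, northing, zone))].append(patch_id)
    return dict(splits)
```

Because the split depends only on the block, two neighboring patches in the same block can never land in different splits; the trade-off is that split sizes only approximate the target ratios.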
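The idea behind converting the dataset into a DL-optimized format (drop metadata, keep only pixels, enable fast random reads) can be illustrated with a small stdlib-only container. This is a hypothetical sketch and not the rico-hdl format: each patch is stored as raw little-endian uint16 values in one binary file, and a JSON index maps each patch name to its offset and length, so reading one patch is a single seek and read. The names `write_packed` and `read_patch` are hypothetical.

```python
"""Sketch of a "strip metadata, keep only pixels" container for fast random reads.

Illustration only; this is NOT the rico-hdl format. Each patch is written as
raw little-endian uint16 values into one binary file, and a small JSON index
maps patch name -> (byte offset, value count).
"""
import array
import json
import os
import sys
import tempfile


def write_packed(path, patches):
    """patches: dict name -> list of ints in 0..65535. Returns the index."""
    index = {}
    with open(path, "wb") as f:
        for name, values in patches.items():
            buf = array.array("H", values)  # unsigned 16-bit values
            if sys.byteorder != "little":
                buf.byteswap()  # store little-endian on every platform
            index[name] = (f.tell(), len(buf))
            buf.tofile(f)
    with open(path + ".index.json", "w") as f:
        json.dump(index, f)
    return index


def read_patch(path, index, name):
    """Random access: read one patch without touching any other data."""
    offset, count = index[name]
    buf = array.array("H")
    with open(path, "rb") as f:
        f.seek(offset)
        buf.fromfile(f, count)
    if sys.byteorder != "little":
        buf.byteswap()
    return buf.tolist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "patches.bin")
        idx = write_packed(path, {"patch_a": [1, 2, 3], "patch_b": [400, 500]})
        print(read_patch(path, idx, "patch_b"))  # [400, 500]
```

Keeping all pixels in one file with a precomputed index avoids opening and parsing one metadata-rich file per patch, which is what makes random reads during shuffled training cheap in this sketch.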