Automatic training sample collection utilizing multi-source land cover products and time-series Sentinel-2 images

Yanzhao Wang, Yonghua Sun, Xuyue Cao, Yihan Wang, Wangkuan Zhang, Xinglu Cheng, Ruozeng Wang, Jinkun Zong
DOI: https://doi.org/10.1080/15481603.2024.2352957
IF: 6.397
2024-05-15
GIScience & Remote Sensing
Abstract: Collecting reliable training samples plays a crucial role in improving the accuracy of land cover (LC) mapping products, which are essential foundational data for global environmental and climate change research. However, the process is labor-intensive and time-consuming, as it heavily relies on human interpretation. This article proposes an automatic training sample collection approach (ATSC) that utilizes multi-source LC products and time-series Sentinel-2 images. Firstly, a preliminary sample dataset was generated by fusing multiple LC products with the weighted majority voting (WMV) algorithm. Secondly, a locally selective combination in parallel outlier ensembles (LSCP) anomaly detection algorithm was applied to filter abnormal samples. The results revealed that (1) the China Land Cover Dataset (CLCD) had the highest overall accuracy (73.22%), and the ESRI Land Cover (ESRI) had the lowest overall accuracy (59.93%). Tree cover, built area, and water showed high accuracy across all products, while shrubland and wetland generally had low accuracy. (2) The average accuracy of the preliminary training samples for the four study areas was 95.62%. However, there were still abnormal samples, such as classification errors, LC changes within a year, and spectral anomalies. (3) Using the LSCP algorithm, 70.10% of the abnormal samples were removed, resulting in a final training sample accuracy that exceeded 97.95% in each region. The ATSC approach provides higher-quality training samples for LC classification and facilitates large-scale LC mapping initiatives.
geography, physical; remote sensing
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper addresses the challenges of collecting training samples for land cover (LC) classification. Specifically:

1. **Improving Training Sample Quality**:
   - Current land cover products have low accuracy in certain categories (such as shrubland, wetland, and grassland), which lowers the quality of training samples derived from them.
   - Collecting training samples is usually labor-intensive and time-consuming because it relies on manual interpretation.
2. **Automated Training Sample Collection Method**:
   - The paper proposes an automatic training sample collection approach (ATSC) that uses multi-source land cover products and time-series Sentinel-2 imagery to improve the reliability and representativeness of training samples.
   - Multiple land cover products are fused with the weighted majority voting (WMV) algorithm to generate a preliminary training sample set.
   - The locally selective combination in parallel outlier ensembles (LSCP) anomaly detection algorithm then filters out abnormal samples, further improving the accuracy of the final training samples.
3. **Application in Large-Scale Land Cover Mapping**:
   - The approach provides high-quality training samples suitable for large-scale, high-resolution land cover classification, supporting global environmental and climate change research.
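
The fusion step can be illustrated with a minimal sketch of weighted majority voting for a single pixel. It is not the authors' implementation. The text above does not say how the paper derives its weights, so the sketch assumes each product's vote is weighted by its overall accuracy. The CLCD and ESRI weights come from the accuracies reported in the abstract; `PRODUCT_C`, its weight, the function name, and the `min_share` agreement threshold are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def weighted_majority_vote(labels, weights):
    """Fuse per-product class labels for one pixel.

    labels  -- mapping of product name -> class label at this pixel
    weights -- mapping of product name -> non-negative vote weight
    Returns (winning_label, share), where share is the fraction of the
    total vote weight that supports the winning label.
    """
    totals = defaultdict(float)
    for product, label in labels.items():
        totals[label] += weights[product]
    total_weight = sum(totals.values())
    # Iterate labels in sorted order so ties resolve deterministically.
    winner = max(sorted(totals), key=lambda lab: totals[lab])
    return winner, totals[winner] / total_weight


# Assumed weights: overall accuracies from the abstract for CLCD and ESRI;
# PRODUCT_C and its weight are placeholders, not values from the paper.
WEIGHTS = {"CLCD": 0.7322, "ESRI": 0.5993, "PRODUCT_C": 0.65}

pixel_labels = {"CLCD": "tree cover", "ESRI": "built area", "PRODUCT_C": "tree cover"}
label, share = weighted_majority_vote(pixel_labels, WEIGHTS)

# Hypothetical rule: keep a pixel as a candidate sample only when the
# winning class holds at least min_share of the total vote weight.
min_share = 0.6
is_candidate = share >= min_share
```

In this sketch, a pixel on which the products disagree can still yield a label, but a low `share` flags it as a weak candidate. The abstract indicates that samples surviving fusion are then screened by the LSCP anomaly detector, which this example does not cover.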