Abstract: Accurate cropland information is crucial for assessing food security and formulating effective agricultural policies. Extracting cropland from remote sensing imagery is challenging because of spectral diversity and mixed pixels. Recent advances in remote sensing technology have made very high-resolution (VHR) remote sensing images, which provide detailed ground information, increasingly available. However, VHR cropland extraction in southern China remains difficult because cropland there is highly heterogeneous and fragmented, and VHR sensors provide insufficient observations. To address these challenges, we proposed a deep learning-based method for automated high-resolution cropland extraction. The method used an improved HRRS-U-Net model to accurately identify the extent of cropland and explicitly locate field boundaries. The HRRS-U-Net maintained high-resolution details throughout the network to generate precise cropland boundaries. Additionally, residual learning (RL) and a channel attention mechanism (CAM) were introduced to extract deeper discriminative representations. The proposed method was evaluated with GaoFen-2 (GF-2) images over four city-wide study areas (Qingyuan, Yangjiang, Guangzhou, and Shantou) covering a diverse range of agricultural systems. Across the individual study areas, the cropland extraction results achieved an overall accuracy (OA) of 97.00% to 98.33%, F1 scores (F1) of 0.830–0.940, and Kappa coefficients (Kappa) of 0.814–0.929. Over all study areas combined, the OA was 97.85%, F1 was 0.915, and Kappa was 0.901. Moreover, the proposed method showed advantages over machine learning methods such as random forest (RF) and over previous semantic segmentation models, including U-Net, U-Net++, U-Net3+, and MPSPNet. These results demonstrate the generalization ability and reliability of the proposed method for cropland extraction from VHR remote sensing images in southern China.
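The abstract reports OA, F1, and Kappa without defining them. The following is a minimal sketch of how these three scores are conventionally computed from a binary cropland mask, assuming pixel-level evaluation with 1 = cropland and 0 = non-cropland, and F1 reported for the cropland class. The abstract does not state the exact evaluation protocol, and the function name `binary_metrics` is hypothetical.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Return (OA, F1, Kappa) for a binary cropland mask.

    Hypothetical helper: assumes 1 = cropland, 0 = non-cropland, and that
    both inputs have the same shape. F1 is computed for the cropland class.
    """
    t = np.asarray(y_true).astype(bool).ravel()
    p = np.asarray(y_pred).astype(bool).ravel()

    # Confusion-matrix counts
    tp = np.sum(t & p)
    tn = np.sum(~t & ~p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    n = tp + tn + fp + fn

    # Overall accuracy: fraction of correctly labelled pixels
    oa = (tp + tn) / n

    # F1: harmonic mean of precision and recall for the cropland class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Cohen's kappa: agreement corrected for the agreement expected by chance,
    # where chance agreement is built from predicted and reference marginals
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (oa - pe) / (1 - pe) if pe != 1 else 0.0

    return float(oa), float(f1), float(kappa)


oa, f1, kappa = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
print(oa, f1, kappa)
```

On this toy mask (2 true positives, 1 false negative, 1 false positive, 4 true negatives), OA is 0.75, F1 is about 0.667, and Kappa is about 0.467. The example illustrates why Kappa and F1 can be much lower than OA when cropland is a minority class: correctly labelled background pixels inflate OA but count for less in the other two scores.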

MATNet: multiattention Transformer network for cropland semantic segmentation in remote sensing images

CCTNet: Coupled CNN and Transformer Network for Crop Segmentation of Remote Sensing Images

Cropland Extraction in Southern China from Very High-Resolution Images Based on Deep Learning

MLGNet: Multi-Task Learning Network with Attention-Guided Mechanism for Segmenting Agricultural Fields

Multiscale Edge-Guided Network for Accurate Cultivated Land Parcel Boundary Extraction From Remote Sensing Images

A Multi-Scale Feature Fusion Deep Learning Network for the Extraction of Cropland Based on Landsat Data

Multi-Swin Mask Transformer for Instance Segmentation of Agricultural Field Extraction

MAENet: Multiple Attention Encoder–Decoder Network for Farmland Segmentation of Remote Sensing Images

S-MAT: Semantic-Driven Masked Attention Transformer for Multi-Label Aerial Image Classification

TCNet: Multiscale Fusion of Transformer and CNN for Semantic Segmentation of Remote Sensing Images

Multi-attention semantic segmentation method for forest information extraction in hilly and mountainous areas

A image fusion and U-Net approach to improving crop planting structure multi-category classification in irrigated area

Multi-Attention-Based Semantic Segmentation Network for Land Cover Remote Sensing Images

MSCPUnet: A Multi-task Neural Network for Plot-level Crop Classification in Complex Agricultural Areas

CLANET: a cross-linear attention network for semantic segmentation of urban scenes remote sensing images

Crop classification in high-resolution remote sensing images based on multi-scale feature fusion semantic segmentation model

Deep Fusion of Spectral–Spatial Priors for Cropland Segmentation in Remote Sensing Images

BAFormer: A Novel Boundary-Aware Compensation UNet-like Transformer for High-Resolution Cropland Extraction

DSHANet: dynamic sparse hierarchical attention-driven cropland change detection network with holistic complementation fusion

MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images

CTMFNet: CNN and Transformer Multiscale Fusion Network of Remote Sensing Urban Scene Imagery