A Temporal-Spatial Deep Learning Network for Winter Wheat Mapping Using Time-Series Sentinel-2 Imagery

Lingling Fan, Lang Xia, Jing Yang, Xiao Sun, Shangrong Wu, Bingwen Qiu, Jin Chen, Wenbin Wu, Peng Yang
DOI: https://doi.org/10.1016/j.isprsjprs.2024.06.005
IF: 12.7
2024-01-01
ISPRS Journal of Photogrammetry and Remote Sensing
Abstract: Accurate mapping of winter wheat provides essential information for food security and ecosystem protection. Deep learning approaches have achieved promising crop discrimination performance based on multitemporal satellite imagery. However, due to the high dimensionality of the data, sequential relations, and complex semantic information in time-series imagery, effective methods that can automatically capture temporal-spatial features with high separability and generalizability have received less attention. In this study, we proposed a U-shaped CNN-Transformer hybrid framework based on an attention mechanism, named the U-Temporal-Spatial-Transformer network (UTS-Former), for winter wheat mapping using Sentinel-2 imagery. This model includes an “encoder-decoder” structure for multiscale information mining of time-series images and a temporal-spatial transformer module (TST) for learning comprehensive temporal sequence features and spatial semantic information. The results obtained from two study areas indicated that our UTS-Former achieved the best accuracy, with a mean Matthews correlation coefficient (MCC) of 0.928 and an F1-score of 0.950, and the results of different band combinations also showed better performance than other popular time-series methods. We found that the MCC (MCC/All) of the UTS-Former using only RGB bands decreased by 4.53%, while it decreased by 13.36% and 35.02% for UNet2d-LSTM and CNN-BiLSTM, respectively, compared with that of all the band combinations. The comparison demonstrated that the proposed UTS-Former could capture more global temporal-spatial information from winter wheat fields and achieve greater precision in terms of local details than other methods, resulting in high-quality mapping. The analysis of attention scores for the available acquisition dates revealed significant contributions of both beginning and ending growth images in winter wheat mapping, which is valuable for making appropriate selections of image dates. These findings suggest that the proposed approach has great potential for accurate, cost-effective, and high-quality winter wheat mapping.
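The abstract reports accuracy as MCC and F1-score and compares band combinations through a percentage decrease in MCC. As a rough illustration of how such figures can be computed for a binary wheat / non-wheat map, the sketch below uses the standard definitions of both metrics. It is not the authors' evaluation code: the function names, the toy labels, and the reading of "MCC/All" as a decrease relative to the all-band MCC are assumptions made here for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = wheat, 0 = other)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def relative_decrease_pct(all_bands_score, subset_score):
    """Percentage drop of a score relative to the all-band score.

    Assumption: this is one plausible reading of the abstract's
    "MCC/All" comparison, not a formula confirmed by the source.
    """
    return 100.0 * (all_bands_score - subset_score) / all_bands_score


# Toy labels (hypothetical, not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("F1:", f1_score(tp, fp, fn))
print("MCC:", mcc(tp, fp, fn, tn))
```

Unlike F1, MCC also uses the true negatives, so it stays informative when wheat pixels are a small share of the scene.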