SWHFormer: A Vision Transformer for Significant Wave Height Estimation From Nautical Radar Images
Zhiding Yang, Weimin Huang
DOI: https://doi.org/10.1109/tgrs.2024.3376471
IF: 8.2
2024-03-26
IEEE Transactions on Geoscience and Remote Sensing
Abstract: This article presents a novel significant wave height (SWH) estimation method, SWHFormer, which incorporates the Vision Transformer (ViT) to estimate SWH from X-band nautical radar images. Unlike traditional convolutional neural networks (CNNs), the ViT model treats the input as a sequence and uses its attention mechanism to capture long-range dependencies, yielding superior performance in capturing the complex patterns present in sea wave dynamics. The radar data undergo an image denoising routine, followed by patching, flattening, and embedding to form a sequence that is fed into the transformer encoding module. The encoder outputs are then aggregated to derive the final regression result, i.e., the SWH estimate. To evaluate the performance of SWHFormer, a dataset collected by a Decca radar aboard a free-navigating vessel is analyzed, with both buoy and model-based data used as ground truth. Two traditional linear fitting methods, i.e., ensemble empirical mode decomposition (EEMD)-based and variational mode decomposition (VMD)-based approaches, and a recent deep learning algorithm, the convolutional gated recurrent unit (CGRU) network, are used for comparison with SWHFormer. After a temporal moving average, the root mean square error (RMSE) of the SWHFormer estimates is 0.16 m when the buoy-measured SWH serves as ground truth, compared with 0.29, 0.26, and 0.18 m for the EEMD-based, VMD-based, and CGRU methods, respectively. When the model-based SWH is used as the reference, the RMSE is 0.14 m, compared with 0.30, 0.28, and 0.16 m, respectively.
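The patching, flattening, and embedding steps named in the abstract, and the RMSE computed after a temporal moving average, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`patchify`, `embed`, `moving_average`, `rmse`), the patch size, the embedding weights, and the averaging window are all hypothetical, and the denoising routine and transformer encoder are omitted.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W) image into non-overlapping patch x patch tiles and flatten each tile."""
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    # (H/p, p, W/p, p) -> (H/p, W/p, p, p) -> (num_patches, p*p)
    tiles = image.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch * patch)


def embed(tokens, weight, pos):
    """Linearly project flattened patches and add a positional embedding (assumed form)."""
    return tokens @ weight + pos


def moving_average(series, window):
    """Moving average of a 1-D series, keeping only fully covered positions."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")


def rmse(estimate, reference):
    """Root mean square error between two equally sized series."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In this sketch, a radar image would pass through `patchify` and `embed` to form the token sequence for an encoder, while `moving_average` would be applied to the per-image SWH estimates before `rmse` compares them with the buoy or model-based reference.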
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics