Unsupervised Deep Hyperspectral Video Target Tracking and High Spectral-Spatial-Temporal Resolution (H³) Benchmark Dataset

Zhenqi Liu, Yanfei Zhong, Xinyu Wang, Meng Shu, Liangpei Zhang
DOI: https://doi.org/10.1109/tgrs.2021.3111183
IF: 8.2
2022-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Target tracking has received increased attention in the past few decades. However, most of the target tracking algorithms are based on RGB video data, and few are based on hyperspectral video data. With the development of the new “snapshot” hyperspectral sensors, hyperspectral videos can now be easily obtained. However, hyperspectral video target tracking datasets are still rare. In this article, a high spectral-spatial-temporal resolution hyperspectral video target tracking algorithm framework (H³Net) based on deep learning is proposed. The proposed framework consists of two main parts: 1) an unsupervised deep learning-based target tracking training framework for hyperspectral video; and 2) a dual-branch network structure based on a Siamese network. Using the dual-branch network, the H³Net framework can utilize both the spatial and spectral information. The combination of deep learning and a discriminative correlation filter (DCF) makes the features extracted by deep learning more suitable for the DCF. Compared with hyperspectral images, hyperspectral video data require more manpower to annotate, so we propose an unsupervised approach to train H³Net, without any annotation. To solve the problem of the lack of hyperspectral video datasets, we built a 25-band hyperspectral video dataset (the high spectral-spatial-temporal resolution hyperspectral video dataset: the WHU-Hi-H³ dataset) for target tracking. The experimental results obtained with the WHU-Hi-H³ dataset confirm the potential of unsupervised deep learning in hyperspectral video target tracking.
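For readers unfamiliar with the discriminative correlation filter (DCF) mentioned in the abstract, the following is a minimal, generic sketch of a single-channel DCF solved in the Fourier domain. It is not the H³Net implementation: the function names (`gaussian_label`, `train_dcf`, `detect`), the regularization value, and the use of a raw single-channel patch in place of learned deep features are all assumptions made for illustration.

```python
import numpy as np


def gaussian_label(h, w, sigma=2.0):
    """Desired response: a 2-D Gaussian whose peak sits at index (0, 0)."""
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))
    # Move the peak from the patch center to the origin (circular layout).
    return np.roll(g, (-(h // 2), -(w // 2)), axis=(0, 1))


def train_dcf(x, y, lam=1e-4):
    """Closed-form ridge-regression filter in the Fourier domain.

    x   : template patch (H x W), hypothetical single-channel feature map
    y   : desired response from gaussian_label
    lam : regularization weight (assumed value)
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return Y * np.conj(X) / (X * np.conj(X) + lam)


def detect(filt, z):
    """Correlate the filter with a search patch and return the peak offset."""
    resp = np.real(np.fft.ifft2(filt * np.fft.fft2(z)))
    dy, dx = np.unravel_index(np.argmax(resp), resp.shape)
    h, w = resp.shape
    # Map wrapped indices to signed displacements.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


# Usage: a circularly shifted copy of the template should yield that shift.
rng = np.random.default_rng(0)
template = rng.standard_normal((32, 32))
filt = train_dcf(template, gaussian_label(32, 32))
search = np.roll(template, (3, -5), axis=(0, 1))
shift = detect(filt, search)
```

In a deep-feature tracker of the kind the abstract describes, the patch would be replaced by multi-channel features from the network, and the filter would be computed per channel; the single-channel form above only shows the core closed-form step.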