SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network

Zhaoxu Li, Wei An, Gaowei Guo, Longguang Wang, Yingqian Wang, Zaiping Lin
2024-05-16
Abstract: Hyperspectral target detection (HTD) aims to identify specific materials based on spectral information in hyperspectral imagery and can detect point targets, some of which occupy less than one pixel. However, existing HTD methods are developed based on per-pixel binary classification, which limits the feature representation capability for point targets. In this paper, we rethink hyperspectral point target detection from the object detection perspective, and focus more on the object-level prediction capability rather than the pixel classification capability. Inspired by the token-based processing flow of Detection Transformer (DETR), we propose the first specialized network for hyperspectral multi-class point object detection, SpecDETR. Without the backbone part of the current object detection framework, SpecDETR treats the spectral features of each pixel in hyperspectral images as a token and utilizes a multi-layer Transformer encoder with local and global coordination attention modules to extract deep spatial-spectral joint features. SpecDETR regards point object detection as a one-to-many set prediction problem, thereby achieving a concise and efficient DETR decoder that surpasses the current state-of-the-art DETR decoder in terms of parameters and accuracy in point object detection. We develop a simulated hyperSpectral Point Object Detection benchmark termed SPOD, and for the first time, evaluate and compare the performance of current object detection networks and HTD methods on hyperspectral multi-class point object detection. SpecDETR demonstrates superior performance compared to current object detection networks and HTD methods on the SPOD dataset. Additionally, we validate on a public HTD dataset that by using data simulation instead of manual annotation, SpecDETR can detect real-world single-spectral point objects directly.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper focuses on the problem of hyperspectral point object detection. Existing hyperspectral target detection (HTD) methods are based on per-pixel binary classification, which limits their capability to represent point object features. The researchers re-examine the problem from the perspective of object detection, focusing on object-level prediction rather than pixel-level classification. Inspired by the Detection Transformer (DETR), they propose SpecDETR, the first network specifically designed for hyperspectral multi-class point object detection. SpecDETR omits the backbone network used in conventional object detection frameworks. Instead, it treats the spectral features of each pixel in the hyperspectral image as a token and extracts deep spatial-spectral joint features through a multi-layer Transformer encoder with local and global coordination attention modules. It treats point object detection as a one-to-many set prediction problem, which yields a concise and efficient DETR decoder that surpasses the current state-of-the-art DETR decoder in terms of parameter count and accuracy in point object detection. To evaluate and compare current object detection networks and HTD methods on hyperspectral multi-class point object detection, they construct a simulated benchmark called SPOD. On the SPOD dataset, SpecDETR demonstrates superior performance compared to current object detection networks and HTD methods, particularly in detecting sub-pixel objects with extremely low spectral abundances. Moreover, the authors validate on a public HTD dataset that, by using data simulation instead of manual annotation, SpecDETR can directly detect real-world single-spectral point objects. In summary, this paper addresses the insufficient feature representation for point targets in existing HTD methods by proposing a new Transformer-based detection network that improves the detection capability for sub-pixel and small targets.
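The per-pixel tokenization described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `pixels_to_tokens` and `mix_spectrum`, the toy spectra, and the nested-list cube layout are all hypothetical. The sub-pixel target is simulated with a linear mixing model, which is a common assumption in hyperspectral analysis and not necessarily how the SPOD benchmark was generated.

```python
def mix_spectrum(background, target, abundance):
    """Blend a target spectrum into a background spectrum (linear mixing).

    Assumption: a linear mixing model, where `abundance` in [0, 1] is the
    fraction of the pixel covered by the target material.
    """
    if not 0.0 <= abundance <= 1.0:
        raise ValueError("abundance must lie in [0, 1]")
    return [abundance * t + (1.0 - abundance) * b
            for b, t in zip(background, target)]


def pixels_to_tokens(cube):
    """Flatten an H x W x C cube into H*W tokens, one spectrum per pixel.

    `cube` is a nested list indexed as cube[row][col][band]. The (row, col)
    position of each token is returned alongside it, since a Transformer
    encoder would need positional information to recover spatial layout.
    """
    tokens, positions = [], []
    for r, row in enumerate(cube):
        for c, spectrum in enumerate(row):
            tokens.append(list(spectrum))
            positions.append((r, c))
    return tokens, positions


# Toy 2 x 2 scene with 3 spectral bands (hypothetical values).
background = [0.2, 0.4, 0.6]
target = [1.0, 0.0, 0.0]
cube = [[list(background) for _ in range(2)] for _ in range(2)]

# Place a sub-pixel target covering a quarter of the pixel at (1, 0).
cube[1][0] = mix_spectrum(background, target, 0.25)

tokens, positions = pixels_to_tokens(cube)
```

Each token here is simply the pixel's spectrum, so a 2 x 2 image yields four tokens of length three. In this sketch the low-abundance target changes only one token, and only slightly, which gives a sense of why sub-pixel objects with low abundances are hard to detect.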