FPNFormer: Rethink the Method of Processing the Rotation-Invariance and Rotation-Equivariance on Arbitrary-Oriented Object Detection
Yang Tian,Mengmeng Zhang,Jinyu Li,Yangfan Li,Hong Yang,Wei Li
DOI: https://doi.org/10.1109/tgrs.2024.3351156
IF: 8.2
2024-02-02
IEEE Transactions on Geoscience and Remote Sensing
Abstract:Feature pyramid network transformer decoder (FPNFormer) module, which can effectively deal with the strong rotation arbitrary of remote sensing images while improving the expressiveness and robustness of the model. It is a plug-and-play module that can be well transferred to various detection models and significantly improves performance. Specifically, we use the computational method of transformer decoder to deal with the problem that the image has any orientation, and its output weakly depends on the order of the input data. We apply it to the feature fusion stage and design two ways top-down and down-top to fuse features of different scales, which enables the model to have a more vital ability to perceive objects at different scales and angles. Experiments on commonly used benchmarks (DOTA1.0, DOTA1.5, SSDD, and RSDD) demonstrate that the proposed FPNFormer module significantly improves the performance of multiple arbitrary-oriented object detectors, such as 1.99% map improvement of rotated retinanet on DOTA's cross-validation set. On RSDD datasets, the baseline model using FPNFormer improves the map of large objects by 5.1%. Combined with more competitive models, the proposed method can achieve a 79.39% map on the DOTA1.0 dataset. The code is available at https://github.com/bityangtian/FPNFormer.
imaging science & photographic technology,remote sensing,engineering, electrical & electronic,geochemistry & geophysics