Text-Guided Diverse Image Synthesis for Long-Tailed Remote Sensing Object Classification
Haojun Tang, Wenda Zhao, Guang Hu, Yi Xiao, Yunlong Li, Haipeng Wang
DOI: https://doi.org/10.1109/tgrs.2024.3422095
IF: 8.2
2024-07-13
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Remote sensing datasets often exhibit a long-tailed data distribution, and this imbalance degrades the performance of existing remote sensing object classification models. Existing methods mainly rely on dataset resampling, loss-function modification, data augmentation, and transfer learning to cope with this challenge. Unlike these approaches, our study takes a novel perspective and mitigates the long-tailed distribution problem by generating a large number of tail-class images that are both class-consistent and diverse. Specifically, this article introduces a novel text-guided tail-class generation network (TGN). TGN comprises two main components: a knowledge mutual distillation network (KMDN) and a class-consistent diverse tail-class generation network (CDTG). KMDN resolves the isolation of head and tail knowledge by facilitating mutual learning of feature representations between the head and tail data, thereby improving the feature extraction capability of the tail model. CDTG generates class-consistent, diverse tail-class images from the tail-class features extracted by KMDN. In particular, class consistency is guaranteed by the powerful text-image alignment capability of contrastive language-image pre-training (CLIP). The generated images are then added back into the original dataset to alleviate the long-tailed distribution, thereby improving tail-class accuracy. Extensive experiments on the widely used DIOR, FGSC-23, and DOTA datasets demonstrate that the proposed method outperforms state-of-the-art methods. The dataset and code are publicly available at https://github.com/XinR-Tang/TGN.
engineering, electrical & electronic; imaging science & photographic technology; remote sensing; geochemistry & geophysics
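
The rebalancing idea described in the abstract (generate tail-class images, keep them class-consistent through text-image alignment, and add them back to the dataset) can be illustrated with a small sketch. This is not the authors' TGN implementation: the function names `synthesis_budget` and `is_class_consistent` are hypothetical, the embeddings are toy vectors standing in for CLIP image and text embeddings, and the post-hoc filter shown here is only an assumed stand-in for the consistency that the paper enforces during generation.

```python
import math
from collections import Counter


def synthesis_budget(labels, target=None):
    """Return how many synthetic images each class needs to reach `target`.

    Assumption: the goal is to bring every class up to the size of the
    largest (head) class unless an explicit target is given.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def is_class_consistent(image_emb, text_embs, target_cls):
    """Accept a generated image only if the text embedding closest to its
    image embedding belongs to the intended class.

    `text_embs` maps class name -> text embedding (hypothetical toy
    vectors here; in practice these would come from a text encoder).
    """
    best = max(text_embs, key=lambda c: cosine(image_emb, text_embs[c]))
    return best == target_cls


# Toy long-tailed label list: one head class and two tail classes.
labels = ["ship"] * 5 + ["dam"] * 2 + ["windmill"]
budget = synthesis_budget(labels)

# Toy 2-D embeddings standing in for text and image features.
text_embs = {"ship": [1.0, 0.0], "dam": [0.0, 1.0]}
keep = is_class_consistent([0.2, 0.8], text_embs, "dam")
```

In this toy run, `budget` asks for no extra "ship" images and more images for the two tail classes, and `keep` records whether a candidate "dam" image passes the alignment check. A generator would be called until each class's budget is filled with accepted images.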