Relation-Specific Feature Augmentation for unbiased scene graph generation

Zhihong Liu, Jianji Wang, Hui Chen, Yongqiang Ma, Nanning Zheng
DOI: https://doi.org/10.1016/j.patcog.2024.110936
IF: 8
2024-09-03
Pattern Recognition
Abstract: Scene Graph Generation (SGG) models suffer from the long-tailed distribution of relations, which results in biased predictions that favor head relations (e.g., on) over informative tail ones (e.g., sitting on, laying on, standing on). Existing solutions typically adopt class re-balancing strategies to balance the data distribution. However, they do not fundamentally address the lack of information caused by insufficient tail data. To this end, we propose a Relation-Specific Feature Augmentation (RSFA) framework to mitigate the long-tailed bias by augmenting relations in the feature space. To perform augmentation effectively, we design an augmentation scheme and a novel Dual Attention Network (DAN). The augmentation scheme augments each relation uniformly based on the reciprocal of its number of samples to avoid over-fitting. By extracting relation-specific information from new object features generated by a Conditional Variational AutoEncoder (CVAE), DAN generates reliable virtual relation representations that provide useful information to guide the optimization of the relation classifier. Extensive ablation studies and comprehensive analysis demonstrate the effectiveness of our method in debiasing, and results on the Visual Genome benchmark show that our method significantly outperforms existing state-of-the-art methods.
computer science, artificial intelligence; engineering, electrical & electronic
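
The abstract states that each relation is augmented uniformly based on the reciprocal of its number of samples, but it does not give the exact rule. One possible reading, sketched below, is that every sample of a relation is picked for augmentation with probability proportional to 1/n, where n is that relation's sample count, so the expected number of augmented samples is the same for every relation. This is an illustration under that assumption, not the authors' implementation: the function names and the `budget_per_class` parameter are hypothetical, and the feature-space generation itself (the CVAE and DAN) is not shown.

```python
import random
from collections import Counter


def augmentation_probs(labels, budget_per_class):
    """Per-sample augmentation probability, proportional to 1 / class count.

    `budget_per_class` is a hypothetical knob: the expected number of
    augmented samples per relation. Probabilities are capped at 1.0.
    """
    counts = Counter(labels)
    return {rel: min(1.0, budget_per_class / n) for rel, n in counts.items()}


def select_for_augmentation(labels, budget_per_class, seed=0):
    """Return indices of samples chosen for augmentation."""
    rng = random.Random(seed)
    probs = augmentation_probs(labels, budget_per_class)
    return [i for i, rel in enumerate(labels) if rng.random() < probs[rel]]


# Toy long-tailed label set: one head relation, two tail relations.
labels = ["on"] * 1000 + ["sitting on"] * 50 + ["laying on"] * 5
probs = augmentation_probs(labels, budget_per_class=5)

# Expected augmented samples per relation = probability * class count.
expected = {rel: probs[rel] * n for rel, n in Counter(labels).items()}
selected = select_for_augmentation(labels, budget_per_class=5)
```

Under this reading, a head relation such as "on" contributes only a small fraction of its samples, while a rare relation such as "laying on" contributes nearly all of them, so no relation dominates the augmented set.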