A Novel End-to-End Transformer for Scene Graph Generation

Chengkai Ren,Xiuhua Liu,Mengyuan Cao,Jian Zhang,Hongwei Wang
DOI: https://doi.org/10.1109/IJCNN54540.2023.10191798
2023-01-01
Abstract:An image usually contains not only visual information but also higher-level semantic information. Nevertheless, previous computer vision algorithms, such as target detection and image classification, use only the visual features of the image alone. Recently, the explosion of scene graphs in computer vision has led to the challenge of generating structured scene graphs with rich semantic information. This paper proposes a one-stage query-based end-to-end Transformer model and generates scene graphs using the Hungarian matching algorithm. We develop an anti-bias reasoner module to reduce the impact of the unbalanced data distribution. Time-division training strategy is proposed to improve model training efficiency and speed up model convergence while improving model training performance. Experiments on the large-scale dataset Visual Genome were conducted in order to confirm the validity of our method. Compared with the existing state-of-the-art method, our method guarantees inference speed while maintaining acceptable performance and is more suitable for tasks with high real-time performance. Our work demonstrates that the one-stage method has great potential for exploration in scene graph generation.
What problem does this paper attempt to address?