L2A: Learning Affinity from Attention for Weakly Supervised Continual Semantic Segmentation
Hao Liu, Yong Zhou, Bing Liu, Ming Yan, Joey Tianyi Zhou
DOI: https://doi.org/10.1109/tcsvt.2024.3462946
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Despite significant advances in continual semantic segmentation (CSS), existing methods still rely on pixel-level annotations to train models, which are time-consuming and labor-intensive to produce. Continual learning from image-level labels is an emerging scheme in continual semantic segmentation that reduces the annotation cost. However, the incomplete and coarse pseudo-labels are insufficient to train a model that maintains a balance between stability and plasticity. To address these issues, we propose a novel end-to-end Transformer-based framework, called L2A, for Weakly Supervised Continual Semantic Segmentation (WSCSS). In particular, to generate reliable annotations from image-level supervision, we introduce a semantic affinity from multi-head self-attention (SA-MHSA) module to capture the semantic relationships among adjacent image coordinates. Subsequently, the acquired semantic affinity is employed to refine the initial pseudo-labels of new classes trained with image-level annotations. Furthermore, to minimize catastrophic forgetting, we propose a semantic drift compensation (SDC) strategy to optimize the pseudo-label generation process, which effectively improves the alignment of object boundaries across both new and old categories. Comprehensive experiments conducted on the PASCAL VOC 2012 and COCO datasets demonstrate the superiority of our framework in existing WSCSS scenarios and in a newly proposed challenge protocol, while remaining competitive with pixel-level supervised CSS methods.
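The abstract gives no formulas for the SA-MHSA module, so the following is only a rough sketch of the general idea of turning self-attention into a pairwise affinity and using it to refine coarse class scores. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `affinity_from_attention` and `refine_cam`, the head averaging and symmetrization, the random-walk propagation, and the toy sizes.

```python
import numpy as np


def affinity_from_attention(attn):
    """Build a token-to-token affinity from attention maps (illustrative only).

    attn: array of shape (heads, N, N), each row a softmax distribution.
    Returns an (N, N) row-normalized affinity matrix.
    """
    a = attn.mean(axis=0)        # assumption: average over heads
    a = 0.5 * (a + a.T)          # assumption: symmetrize so affinity is mutual
    return a / a.sum(axis=1, keepdims=True)


def refine_cam(cam, affinity, steps=2):
    """Propagate per-token class scores along the affinity (random walk).

    cam: array of shape (N, C) with coarse class scores per token.
    steps: number of propagation steps (hypothetical hyperparameter).
    """
    transition = np.linalg.matrix_power(affinity, steps)
    return transition @ cam


# Toy inputs standing in for a Transformer's attention and initial class scores.
rng = np.random.default_rng(0)
heads, n_tokens, n_classes = 4, 6, 3
logits = rng.normal(size=(heads, n_tokens, n_tokens))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
cam = rng.random((n_tokens, n_classes))

aff = affinity_from_attention(attn)
refined = refine_cam(cam, aff)
pseudo_labels = refined.argmax(axis=1)  # one class index per token
```

Because the affinity is row-normalized, each refined score is a weighted average of the original scores, so propagation smooths labels across related tokens without inflating their range. How L2A actually learns and applies its affinity, and how SDC handles old classes, is specified in the paper itself.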