Importance-aware Shared Parameter Subspace Learning for Domain Incremental Learning

Shiye Wang, Changsheng Li, Jialin Tang, Xing Gong, Ye Yuan, Guoren Wang
DOI: https://doi.org/10.1145/3664647.3681411
2024-01-01
Abstract: Parameter-Efficient-Tuning (PET) for pre-trained deep models (e.g., transformers) holds significant potential for domain incremental learning (DIL). Recent prevailing approaches resort to prompt learning, which typically involves learning a small number of prompts for each domain to avoid catastrophic forgetting. However, previous studies have pointed out that prompt-based methods are often challenging to optimize, and that their performance may vary non-monotonically with the number of trainable parameters. In contrast to previous prompt-based DIL methods, we put forward an importance-aware shared parameter subspace learning method for domain incremental learning, built on low-rank adaptation (LoRA). Specifically, we propose to incrementally learn a domain-specific and a domain-shared low-rank parameter subspace for each domain, in order to effectively decouple the parameter space and capture shared information across different domains. Meanwhile, we present a momentum update strategy for learning the domain-shared subspace, allowing for the smooth accumulation of knowledge in the current domain while mitigating the risk of forgetting the knowledge acquired from previous domains. Moreover, given that domain-shared information might hold varying degrees of importance across different domains, we design an importance-aware mechanism that adaptively assigns an importance weight to the domain-shared subspace for the corresponding domain. Finally, we devise a cross-domain contrastive constraint to encourage domain-specific subspaces to capture distinctive information within each domain effectively, and enforce orthogonality between domain-shared and domain-specific subspaces to minimize interference between them. Extensive experiments on image domain incremental datasets demonstrate the effectiveness of the proposed method compared with related state-of-the-art methods.
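The abstract names the components but gives no formulas, so the following is only an illustrative sketch of how they could fit together, not the authors' implementation. It assumes the momentum update is an exponential moving average with a hypothetical coefficient `m`, that the importance weight is a single scalar `alpha` per domain, and that orthogonality is encouraged through a squared Frobenius-norm penalty; all function and variable names are made up for illustration, and the cross-domain contrastive constraint is left out.

```python
import numpy as np

def momentum_update(shared_prev, shared_new, m=0.9):
    """Assumed EMA form: keep most of the old shared factor, blend in the new one."""
    return m * shared_prev + (1.0 - m) * shared_new

def orthogonality_penalty(A_shared, A_specific):
    """Assumed penalty: squared Frobenius norm of A_shared @ A_specific.T.
    It is zero when the row spaces of the two low-rank factors are orthogonal."""
    return float(np.sum((A_shared @ A_specific.T) ** 2))

def adapted_weight(W0, B_sh, A_sh, B_sp, A_sp, alpha):
    """Frozen weight W0 plus an importance-weighted shared LoRA update
    and a domain-specific LoRA update (each update is B @ A, as in LoRA)."""
    return W0 + alpha * (B_sh @ A_sh) + B_sp @ A_sp

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, r = 8, 16, 2          # hypothetical layer size and LoRA rank
    W0 = rng.normal(size=(d_out, d_in))  # stands in for a frozen pre-trained weight

    # Shared factors carried over from earlier domains, and freshly trained ones.
    A_sh_prev, A_sh_new = rng.normal(size=(r, d_in)), rng.normal(size=(r, d_in))
    B_sh = rng.normal(size=(d_out, r))
    A_sh = momentum_update(A_sh_prev, A_sh_new, m=0.9)

    # Domain-specific factors for the current domain.
    A_sp, B_sp = rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))

    W = adapted_weight(W0, B_sh, A_sh, B_sp, A_sp, alpha=0.5)
    print(W.shape, orthogonality_penalty(A_sh, A_sp))
```

In this sketch a larger `m` favours knowledge from earlier domains, while a smaller `m` lets the current domain reshape the shared subspace more quickly; how the method in the paper actually sets or learns these quantities is not stated in the abstract.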