Gradient-Guided Channel Masking for Cross-Domain Few-Shot Learning

Siqi Hui,Sanping Zhou,Ye Deng,Yang Wu,Jinjun Wang
DOI: https://doi.org/10.1016/j.knosys.2024.112548
IF: 8.139
2024-01-01
Knowledge-Based Systems
Abstract:Cross-Domain Few-Shot Learning (CD-FSL) addresses the Few-Shot Learning with a domain gap between source and target domains, which facilitates the transfer of knowledge from a source domain to a target domain with limited labeled samples. Current approaches often incorporate an auxiliary target dataset containing a few labeled samples to enhance model generalization on specific target domains. However, we observe that many models retain a substantial number of channels that learn source-specific knowledge and extract features that perform adequately on the source domain but generalize poorly to the target domain. This often results in compromised performance due to the influence of source-specific knowledge. To address this challenge, we introduce a novel framework, Gradient-Guided Channel Masking (GGCM), designed for CD-FSL to mitigate model channels from acquiring too much source-specific knowledge. GGCM quantifies each channel's contribution to solving target tasks using gradients of target loss and identifies those with smaller gradients as source-specific. These channels are then masked during the forward propagation of source features to mitigate the learning of source-specific knowledge. Conversely, GGCM mutes non-source-specific channels during the forward propagation of target features, forcing the model to depend on the source-specific channels and thereby enhancing their generalizability. Moreover, we propose a consistency loss that aligns the predictions made by source-specific channels with those made by the entire model. This approach further enhances the generalizability of these channels by enabling them to learn from the generalizable knowledge contained in other non-source-specific channels. Validated across multiple CD-FSL benchmark datasets, our framework demonstrates state-of-the-art performance and effectively suppresses the learning of source-specific knowledge.
What problem does this paper attempt to address?