TATM: Task-Adaptive Token Matching for Few-Shot Transformer

Yuheng Li, Fanzhang Li
DOI: https://doi.org/10.1109/ijcnn60899.2024.10649933
2024-01-01
Abstract: The potential of transformers for few-shot learning remains largely untapped, primarily because they struggle to maintain a robust inductive bias. This limitation contributes to the performance degradation observed in few-shot tasks. To alleviate this issue, we propose a two-stage few-shot learning framework based on the Vision Transformer, named Task-Adaptive Token Matching (TATM). Specifically, our approach uses masked image modeling during pretraining to obtain discriminative sample representations and capture long-range semantic correspondences. We then equip the model with the ability to adapt to different tasks through a dual-loop meta-finetuning paradigm. Our key innovation lies in establishing finer-grained dependencies among image patches within a task, enabling the model to identify the most meaningful patch for the current task. This strengthens the transformer architecture's locality and translation equivariance, introducing a degree of inductive bias that improves the model's resistance to overfitting. Furthermore, inspired by contrastive learning, we add a penalty term to the few-shot classification loss to balance intra-class and inter-class variations. Experiments on three mainstream few-shot benchmarks under 5-way 5-shot and 1-shot settings demonstrate that TATM significantly improves few-shot classification performance.
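The abstract does not give the form of the penalty term, so the following is only an illustrative sketch of how a contrastive-style penalty could be added to a few-shot classification loss. It assumes a prototype-based classifier (class means of the support embeddings) and a penalty defined as the ratio of intra-class to inter-class squared distance. The function names, the ratio form, and the weight `lam` are all hypothetical and are not taken from the paper.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def few_shot_loss(support, support_labels, query, query_labels, lam=0.1):
    """Cross-entropy over prototype distances plus an assumed penalty term.

    Hypothetical formulation (not the paper's): the penalty is
    intra-class spread divided by inter-class prototype separation.
    Requires at least two classes in the support set.
    Returns (total_loss, cross_entropy, penalty).
    """
    classes = sorted(set(support_labels))
    protos = {
        c: mean_vec([s for s, y in zip(support, support_labels) if y == c])
        for c in classes
    }

    # Classification term: softmax over negative squared distances to prototypes.
    ce = 0.0
    for q, y in zip(query, query_labels):
        logits = [-sq_dist(q, protos[c]) for c in classes]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        ce += log_z - logits[classes.index(y)]
    ce /= len(query)

    # Intra-class variation: mean squared distance of support samples
    # to their own class prototype.
    intra = sum(
        sq_dist(s, protos[y]) for s, y in zip(support, support_labels)
    ) / len(support)

    # Inter-class variation: mean squared distance between distinct prototypes.
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    inter = sum(sq_dist(protos[a], protos[b]) for a, b in pairs) / len(pairs)

    # Small when classes are tight and far apart, large otherwise.
    penalty = intra / (inter + 1e-8)
    return ce + lam * penalty, ce, penalty
```

In this sketch, a larger `lam` pushes the embeddings toward compact, well-separated classes at the expense of the pure classification term. Any real implementation would apply the same idea to learned patch or token embeddings rather than to fixed vectors.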