Supplementary Material: Continual Learning with Lifelong Vision Transformer

Zhen Wang, Liu Liu, Yiqun Duan, Yajing Kong, Dacheng Tao
2022-01-01
Abstract: Appendix A provides detailed experimental settings and the implementation of LVT. In Appendix B, we present empirical results on Backward Transfer and discuss the limitations of vision transformers for continual learning. We follow the pyramid architectures used in CNNs to build the transformer block. Between the LVT stages, we perform a downsampling (Shrink) that reduces the resolution of the activation maps and increases their number of channels. For CIFAR100, we use two LVT stages; for ImageNet, we use three LVT stages. The lifelong transformer block includes a residual structure with the proposed inter-task attention and a feed-forward layer. We use 1 × 1 convolution to control the number of channels, followed by batch normalization. The implementation of the transformer block is based on ViT [21] and LeViT [25]. LVT uses GELU activation and dropout in transformer blocks and applies global average pooling to the last activation map.
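The stage layout described in the abstract can be illustrated with a small shape-tracing sketch. It is not the authors' implementation: the `MapShape` type, the `shrink` and `trace_stages` helpers, the stride of 2, the channel doubling, and the input shapes are all assumptions chosen for illustration, since the abstract states only that Shrink lowers the resolution and raises the channel count between stages. The last function shows global average pooling on a plain nested-list activation map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapShape:
    """Shape of an activation map (hypothetical helper type)."""
    channels: int
    height: int
    width: int


def shrink(shape, stride=2, channel_factor=2):
    """Assumed Shrink step: lower resolution, more channels.

    The stride and channel factor are illustrative defaults,
    not values taken from the paper.
    """
    return MapShape(
        channels=shape.channels * channel_factor,
        height=-(-shape.height // stride),  # ceiling division
        width=-(-shape.width // stride),
    )


def trace_stages(input_shape, num_stages):
    """Return the activation shape seen by each stage.

    Shrink is applied between stages, so N stages imply N - 1 Shrink steps.
    """
    shapes = [input_shape]
    for _ in range(num_stages - 1):
        shapes.append(shrink(shapes[-1]))
    return shapes


def global_average_pool(feature_map):
    """Average each channel of a [channel][row][col] nested list."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


if __name__ == "__main__":
    # Two stages (the CIFAR100 setting) and three stages (the ImageNet setting),
    # with made-up input shapes.
    for shape in trace_stages(MapShape(64, 32, 32), num_stages=2):
        print("two-stage:", shape)
    for shape in trace_stages(MapShape(64, 56, 56), num_stages=3):
        print("three-stage:", shape)
```

Under these assumptions, a two-stage model applies Shrink once and a three-stage model applies it twice, so the final activation map that feeds global average pooling is 2x or 4x smaller per side than the first stage's map.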