Abstract: Deep Neural Networks (DNNs) have been ubiquitously adopted in the Internet of Things and are becoming an integral part of our daily life. When tackling evolving learning tasks in the real world, such as classifying different types of objects, DNNs face the challenge of continually retraining themselves according to the tasks on different edge devices. Federated continual learning (FCL) is a promising technique that offers partial solutions but has yet to overcome the following difficulties: significant accuracy loss due to limited on-device processing, negative knowledge transfer caused by the limited communication of non-IID (non-Independent and Identically Distributed) data, and limited scalability in the number of tasks and edge devices. Moreover, existing FCL techniques are designed for convolutional neural networks (CNNs) and do not exploit the full potential of the newly emerged, powerful vision transformers (ViTs). Considering that ViTs depend heavily on the diversity and volume of training data, we hypothesize that ViTs are well suited for FCL, where data arrives continually. In this paper, we propose FedViT, an accurate and scalable federated continual learning framework for ViT models, built on a novel concept of signature task knowledge. FedViT is a client-side solution that continuously extracts and integrates the knowledge of signature tasks that are highly influenced by the current task. Each FedViT client is composed of a knowledge extractor, a gradient restorer and, most importantly, a gradient integrator. When training on a new task, the gradient integrator prevents catastrophic forgetting and mitigates negative knowledge transfer by effectively combining signature tasks identified from past local tasks and from other clients’ current tasks through the global model. We implement FedViT in PyTorch and extensively evaluate it against state-of-the-art techniques using popular federated continual learning benchmarks. Evaluation results on heterogeneous edge devices show that FedViT improves model accuracy by 88.61% without increasing model training time, reduces communication cost by 61.55%, and achieves larger improvements under difficult scenarios such as large numbers of tasks or clients and training different complex ViT models.

An Empirical Analysis of Vision Transformer and CNN in Resource-Constrained Federated Learning

Multi-Dimension Compression of Feed-Forward Network in Vision Transformers

EFTViT: Efficient Federated Training of Vision Transformers with Masked Images on Resource-Constrained Edge Devices

OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning

FedViT: Federated Continual Learning of Vision Transformer at Edge

Training Vision Transformers with only 2040 Images.

Lightweight Vision Transformer with Cross Feature Attention

Federated Learning Approach for Remote Sensing Scene Classification

FMViT: A multiple-frequency mixing Vision Transformer

Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images

LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition

CF-ViT: A General Coarse-to-Fine Method for Vision Transformer

CSFNet: a compact and efficient convolution-transformer hybrid vision model

ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices.

Performance Analysis for Resource Constrained Decentralized Federated Learning Over Wireless Networks

Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning

Super Vision Transformer

TFormer: A Transmission-Friendly ViT Model for IoT Devices

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

DSCAFormer: Lightweight Vision Transformer With Dual-Branch Spatial Channel Aggregation

FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer