Task-Agnostic Federated Learning

Zhengtao Yao, Hong Nguyen, Ajitesh Srivastava, Jose Luis Ambite
2024-06-25
Abstract: In medical imaging, leveraging large-scale datasets from multiple institutions is crucial for developing precise deep learning models, yet privacy concerns frequently impede data sharing. Federated learning (FL) has emerged as a prominent solution for preserving privacy while facilitating collaborative learning. However, its application in real-world scenarios faces several obstacles, such as task and data heterogeneity, label scarcity, non-identically distributed (non-IID) data, and computational variation. In practice, medical institutions may not want to disclose their tasks to the FL server, and out-of-network institutions with unseen tasks that want to join an ongoing federated system pose a generalization challenge. This study addresses the task-agnostic setting and the problem of generalization to unseen tasks by adapting a self-supervised FL framework. Using a Vision Transformer (ViT) as a consensus feature encoder for self-supervised pre-training, which requires no initial labels, the framework enables effective representation learning across diverse datasets and tasks. Our extensive evaluations, using various real-world non-IID medical imaging datasets, validate our approach's efficacy, retaining 90% of F1 accuracy with only 5% of the training data typically required for centralized approaches and exhibiting superior adaptability to out-of-distribution tasks. The results indicate that a federated learning architecture can be a potential approach toward multi-task foundation modeling.
Computer Vision and Pattern Recognition; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The problem this paper attempts to solve is how to handle task-agnostic and imbalanced-data issues in Federated Learning (FL), especially in medical imaging. Specifically, researchers face the following challenges:

1. **Privacy Protection and Data Sharing**: Medical imaging institutions hold large amounts of data that cannot be shared directly for privacy reasons. Federated learning enables collaborative learning without sharing data, but its application still faces multiple obstacles.
2. **Task Heterogeneity and Data Heterogeneity**: Tasks and data distributions vary greatly across institutions, leading to problems such as label scarcity and non-IID (non-independent and identically distributed) data during model training. In addition, newly joined institutions may bring unseen tasks, which challenge the generalization ability of existing federated learning systems.
3. **Task-Agnosticism**: In practice, institutions may be unwilling to disclose their specific tasks to the federated server, which makes traditional federated learning methods difficult to apply.

To solve these problems, the paper proposes a federated learning framework based on Self-Supervised Learning (SSL). By using a Vision Transformer (ViT) as a consensus feature encoder for self-supervised pre-training, the framework can learn effective representations across diverse datasets and tasks without initial labels.

The main contributions include:

- **Task-Agnostic Federated Learning Framework**: The paper proposes a new problem setting, task-agnosticism between clients and the server, and addresses it through self-supervised pre-training.
- **Efficient Knowledge Transfer**: The pre-trained global encoder can be quickly adapted to various downstream tasks, and high performance can be achieved by fine-tuning only about 2% of the pre-trained model parameters.
- **Robustness to OOD (Out-of-Distribution) Data**: Experiments show that the framework retains 90% of the F1 accuracy of centralized training in classification tasks and outperforms the centralized model in segmentation tasks, with a particularly significant improvement for few-shot tasks.

Through these methods, the paper demonstrates the potential of the federated learning architecture for multi-task foundation modeling, especially in medical imaging.
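
To make the server's role in the task-agnostic setting concrete, the following is a minimal sketch of how a server might combine encoder updates from several institutions. It assumes a FedAvg-style average weighted by sample count, which the summary does not confirm as the paper's aggregation rule. The function `federated_average`, the parameter names `patch_embed` and `block0`, and the two hospital updates are hypothetical and serve only as illustration.

```python
from typing import Dict, List

# A model update is a mapping from parameter name to a flat list of values.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Sample-size-weighted average of client encoder parameters (FedAvg-style).

    Assumption: every client reports the same parameter names and shapes.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Each client contributes in proportion to its local dataset size.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical updates from two institutions. The server receives only encoder
# parameters; it never sees labels, task-specific heads, or raw images.
hospital_a = {"patch_embed": [1.0, 2.0], "block0": [0.0]}
hospital_b = {"patch_embed": [3.0, 6.0], "block0": [4.0]}

global_encoder = federated_average([hospital_a, hospital_b], [100, 300])
print(global_encoder)
```

Because only the shared encoder is averaged, each institution could keep its task-specific head local, which is one way a server can stay unaware of the clients' tasks. A real system would exchange tensors rather than Python lists.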