SelfFed: Self-supervised Federated Learning for Data Heterogeneity and Label Scarcity in IoMT

Sunder Ali Khowaja, Kapal Dev, Syed Muhammad Anwar, Marius George Linguraru
DOI: https://doi.org/10.1016/j.eswa.2024.125493
2024-10-10
Abstract: Self-supervised learning in the federated learning paradigm has been gaining a lot of interest in both industry and research because it enables collaborative learning on unlabeled yet isolated data. However, self-supervised federated learning strategies suffer from performance degradation due to label scarcity and diverse data distributions, i.e., data heterogeneity. In this paper, we propose the SelfFed framework for the Internet of Medical Things (IoMT). The SelfFed framework works in two phases. The first phase is the pre-training paradigm, which performs augmentive modeling using a Swin Transformer based encoder in a decentralized manner. This phase helps to overcome the data heterogeneity issue. The second phase is the fine-tuning paradigm, which introduces a contrastive network and a novel aggregation strategy that is trained on limited labeled data for a target task in a decentralized manner. This fine-tuning stage overcomes the label scarcity problem. We perform our experimental analysis on publicly available medical imaging datasets and show that the SelfFed framework performs better than existing baselines on non-independent and identically distributed (non-IID) data and under label scarcity. Our method achieves a maximum improvement of 8.8% and 4.1% on the Retina and COVID-FL datasets, respectively, in the non-IID setting. Further, our proposed method outperforms existing baselines even when trained on only a few (10%) labeled instances.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two major issues in federated learning on medical imaging data: data heterogeneity (inconsistent data distributions across clients) and label scarcity (too few labeled samples). It proposes SelfFed, a self-supervised federated learning framework that handles each issue in a separate phase:
1. **Data Heterogeneity**: During the pre-training phase, a Swin Transformer based encoder performs augmentive modeling in a decentralized manner, which helps to overcome data heterogeneity.
2. **Label Scarcity**: During the fine-tuning phase, a contrastive network and a novel aggregation strategy are trained on limited labeled data for the target task, which addresses label scarcity.

Experimental analysis on publicly available medical imaging datasets shows that SelfFed outperforms existing baselines on non-independent and identically distributed (non-IID) data, with maximum improvements of 8.8% and 4.1% on the Retina and COVID-FL datasets, respectively. It also outperforms the baselines when trained on only a small fraction (10%) of labeled instances.
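
The abstract does not describe the novel aggregation strategy, so it cannot be reproduced here. As a point of reference only, the sketch below shows plain federated averaging (FedAvg), the standard baseline in which a server combines client parameters weighted by local sample count. The function name `fedavg`, the dictionary-of-lists weight format, and the toy client values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical weight format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Sample-size-weighted average of client parameters (plain FedAvg).

    This is a generic baseline, NOT the aggregation strategy proposed
    in SelfFed, which the abstract does not specify.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one positive sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    aggregated: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Each client contributes in proportion to its local dataset size.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Toy example: two clients with unequal amounts of local data.
clients = [
    {"encoder": [1.0, 2.0], "head": [0.0]},
    {"encoder": [3.0, 4.0], "head": [1.0]},
]
sizes = [100, 300]
global_weights = fedavg(clients, sizes)
print(global_weights)  # {'encoder': [2.5, 3.5], 'head': [0.75]}
```

Because the second client holds three times as much data, the averaged parameters sit closer to its values. Under non-IID data this weighting can bias the global model toward large clients, which is one reason alternative aggregation strategies are studied.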