Federated brain tumor segmentation: an extensive benchmark

Matthis Manthe, Stefan Duffner, Carole Lartizien
DOI: https://doi.org/10.1016/j.media.2024.103270
2024-10-07
Abstract: Recently, federated learning has raised increasing interest in the medical image analysis field due to its ability to aggregate multi-center data with privacy-preserving properties. A large amount of federated training schemes have been published, which we categorize into global (one final model), personalized (one model per institution) or hybrid (one model per cluster of institutions) methods. However, their applicability on the recently published Federated Brain Tumor Segmentation 2022 dataset has not been explored yet. We propose an extensive benchmark of federated learning algorithms from all three classes on this task. While standard FedAvg already performs very well, we show that some methods from each category can bring a slight performance improvement and potentially limit the final model(s) bias toward the predominant data distribution of the federation. Moreover, we provide a deeper understanding of the behaviour of federated learning on this task through alternative ways of distributing the pooled dataset among institutions, namely an Independent and Identical Distributed (IID) setup, and a limited data setup.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning, Image and Video Processing
What problem does this paper attempt to address?
This paper evaluates and compares different families of federated learning algorithms on the brain tumor segmentation task, specifically on the Federated Brain Tumor Segmentation 2022 (FeTS2022) dataset. The research aims to:

1. **Benchmarking**: Conduct a comprehensive benchmark of global, personalized, and hybrid federated learning methods on the brain tumor segmentation task.
2. **Performance improvement**: Explore whether these methods can improve on standard Federated Averaging (FedAvg) and reduce model bias.
3. **Impact of data distribution**: Analyze how alternative ways of distributing the data, namely an independent and identically distributed (IID) setup and a limited data setup, affect the performance of federated learning algorithms.
4. **Understanding behavior**: Gain an in-depth understanding, through experiments, of how federated learning behaves on this specific task.

### Research background

With the increasing demand for privacy protection in medical image analysis, federated learning has received extensive attention as a way to train models jointly without sharing raw data. In practice, however, existing federated learning methods face problems such as heterogeneous data distributions across institutions, which may degrade model performance. The paper therefore aims to provide a reference for future research through systematic benchmarking.

### Main contributions

- **First-time benchmarking**: Personalized and clustered federated learning methods are benchmarked on the brain tumor segmentation task for the first time, demonstrating their potential on this task.
- **Time budget setting**: A common time budget is set, long enough for the fastest algorithm to reach a validation plateau, so that results are compared close to convergence.
- **Inter-institutional performance bias analysis**: The inter-institutional performance bias introduced by each federated optimizer is analyzed.
- **Unified framework definition**: Each method is defined under a unified formal framework, and the implementation code is made publicly available.

### Experimental design

To achieve these goals, the authors selected a variety of representative federated learning algorithms, including but not limited to:

- **Global methods**: such as FedAvg, FedAdam, SCAFFOLD, FedNova, and q-FedAvg.
- **Personalized methods**: such as Ditto, FedPer, and LG-FedAvg.
- **Hybrid methods**: such as FedPIDAvg.

In addition, the authors explored alternative ways of distributing the data among institutions to better understand how federated learning behaves under different conditions.

### Conclusion

Through extensive experiments on the FeTS2022 dataset, the authors found that although standard FedAvg already performs well, some personalized and hybrid methods can bring a slight performance improvement in specific cases and help reduce the model's bias toward the dominant data distribution of the federation. These results provide a reference point for future federated learning research.
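
As a rough illustration of two ideas referenced in this summary, the Python sketch below shows the sample-size-weighted averaging at the core of FedAvg and one simple way to redistribute a pooled dataset IID among institutions. It is not the authors' implementation: the function names `fedavg_aggregate` and `iid_partition`, the flat dict-of-lists parameter layout, and the equal-size round-robin split are all assumptions made for illustration.

```python
import random
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fedavg_aggregate(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name, reference in client_weights[0].items():
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return aggregated


def iid_partition(sample_ids: List[int], num_institutions: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the pooled samples and deal them round-robin to institutions.

    Assumption: institutions receive (near-)equal shares; a benchmark may
    instead preserve the original per-institution sizes.
    """
    if num_institutions <= 0:
        raise ValueError("num_institutions must be positive")
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::num_institutions] for i in range(num_institutions)]


if __name__ == "__main__":
    clients = [{"layer": [1.0, 2.0]}, {"layer": [3.0, 4.0]}]
    print(fedavg_aggregate(clients, client_sizes=[1, 3]))
    print([len(p) for p in iid_partition(list(range(10)), num_institutions=3)])
```

Because the average is weighted by sample count, an institution holding most of the data dominates the aggregated model, which is one intuition for the bias toward the predominant data distribution discussed above.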