Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning

Chong Yu, Shuaiqi Shen, Shiqiang Wang, Kuan Zhang, Hai Zhao
2024-04-16
Abstract: E-health allows smart devices and medical institutions to collaboratively collect patients' data, which is trained by Artificial Intelligence (AI) technologies to help doctors make diagnoses. By allowing multiple devices to train models collaboratively, federated learning is a promising solution to address the communication and privacy issues in e-health. However, applying federated learning in e-health faces many challenges. First, medical data is both horizontally and vertically partitioned. Since single Horizontal Federated Learning (HFL) or Vertical Federated Learning (VFL) techniques cannot deal with both types of data partitioning, directly applying them may consume excessive communication cost due to transmitting a part of raw data when requiring high modeling accuracy. Second, a naive combination of HFL and VFL has limitations including low training efficiency, unsound convergence analysis, and lack of parameter tuning strategies. In this paper, we provide a thorough study on an effective integration of HFL and VFL, to achieve communication efficiency and overcome the above limitations when data is both horizontally and vertically partitioned. Specifically, we propose a hybrid federated learning framework with one intermediate result exchange and two aggregation phases. Based on this framework, we develop a Hybrid Stochastic Gradient Descent (HSGD) algorithm to train models. Then, we theoretically analyze the convergence upper bound of the proposed algorithm. Using the convergence results, we design adaptive strategies to adjust the training parameters and shrink the size of transmitted data. Experimental results validate that the proposed HSGD algorithm can achieve the desired accuracy while reducing communication cost, and they also verify the effectiveness of the adaptive strategies.
Machine Learning; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses the low communication efficiency and the privacy concerns that arise in e-health when medical data is partitioned both horizontally and vertically. Specifically, it identifies the following challenges:

1. **Horizontal and vertical data partitioning**: Medical data is partitioned both horizontally (different devices or hospitals hold different samples with the same features) and vertically (the data of the same patient is spread across different devices and hospitals). Horizontal federated learning (HFL) or vertical federated learning (VFL) alone cannot handle both types of partitioning, and applying either one directly may incur excessive communication cost, because part of the raw data has to be transmitted to reach high modeling accuracy.
2. **Limitations of existing methods**:
   - Naively combining HFL and VFL leaves deficiencies in training efficiency, convergence analysis, and parameter tuning strategies.
   - The interaction between HFL and VFL complicates the theoretical convergence proof, so the convergence results needed to optimize the training process are missing.
   - Lagging (stale) intermediate results reduce modeling efficiency, especially in the global multi-classification method.
   - Training efficiency depends on several configuration parameters, such as the global aggregation interval, the local communication frequency, and the learning rate, but the specific impact of these parameters on model performance remains unclear.

To solve these problems, the authors propose a new hybrid federated learning framework that aims to improve communication efficiency and overcome the above limitations. The framework includes one intermediate result exchange stage and two aggregation stages (local aggregation and global aggregation), and a Hybrid Stochastic Gradient Descent (HSGD) algorithm is developed to train the model.

In addition, the authors analyze the algorithm theoretically and derive an upper bound on its convergence. Using this result, they design adaptive strategies that adjust the training parameters and shrink the size of the transmitted data, reducing communication cost while still reaching the desired accuracy.
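To make the two kinds of partitioning and the two aggregation stages more concrete, the following is a minimal Python sketch. It is not the paper's HSGD algorithm: it has no gradient steps and no intermediate result exchange. The feature names, party names (`wearable`, `hospital`), group names, and the sample-count weighting (in the style of federated averaging) are all assumptions made for illustration and are not taken from the source.

```python
from typing import Dict, List, Sequence


def split_samples(ids: Sequence[str], num_groups: int) -> List[List[str]]:
    """Horizontal partitioning: each group holds different samples (round-robin here)."""
    return [list(ids[g::num_groups]) for g in range(num_groups)]


def split_features(row: Dict[str, float],
                   owners: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Vertical partitioning: each party holds only its own features of one record."""
    return {party: {f: row[f] for f in feats} for party, feats in owners.items()}


def weighted_average(models: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Element-wise weighted average of equally sized parameter vectors."""
    total = float(sum(weights))
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(dim)]


# Hypothetical patient record and feature ownership (made-up values).
record = {"heart_rate": 72.0, "steps": 8000.0,
          "blood_pressure": 120.0, "glucose": 5.1}
owners = {"wearable": ["heart_rate", "steps"],
          "hospital": ["blood_pressure", "glucose"]}

horizontal_view = split_samples(["p1", "p2", "p3", "p4", "p5"], num_groups=2)
vertical_view = split_features(record, owners)

# Hypothetical groups: each holds the parameter vectors of its members
# together with the number of samples each member trained on.
groups = {
    "hospital_A": {"models": [[1.0, 2.0], [3.0, 4.0]], "sizes": [10, 30]},
    "hospital_B": {"models": [[5.0, 6.0]], "sizes": [20]},
}

# Stage 1 (local aggregation): average the member models inside each group.
local_models = {name: weighted_average(g["models"], g["sizes"])
                for name, g in groups.items()}
group_sizes = {name: sum(g["sizes"]) for name, g in groups.items()}

# Stage 2 (global aggregation): average the group models across groups.
global_model = weighted_average([local_models[n] for n in groups],
                                [group_sizes[n] for n in groups])

print(horizontal_view)
print(vertical_view)
print(local_models)
print(global_model)
```

With sample-count weights, the two-stage average equals a single weighted average over all members, so in this simplified setting the hierarchy changes who communicates with whom and how often, not the aggregate itself. How the paper actually weights and schedules the two stages is not stated in this summary.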