Abstract: Federated Learning (FL) has been an emerging trend in machine learning and artificial intelligence. It allows multiple participants to collaboratively train a better global model and offers a privacy-aware paradigm for model training, since it does not require participants to release their original training data. However, existing FL solutions for vertically partitioned data or decision trees require heavy cryptographic operations. In this article, we propose a framework named FederBoost for private federated learning of gradient boosting decision trees (GBDT). It supports running GBDT over both vertically and horizontally partitioned data. Vertical FederBoost does not require any cryptographic operation, and horizontal FederBoost only requires lightweight secure aggregation. The key observation is that the whole training process of GBDT relies on the ordering of the data instead of the values.
We fully implement FederBoost and evaluate its utility and efficiency through extensive experiments on three public datasets. Our experimental results show that both vertical and horizontal FederBoost achieve the same level of accuracy as centralized training, where all data are collected on a central server, and that they are 4-5 orders of magnitude faster than the state-of-the-art solutions for federated decision tree training, hence offering practical solutions for industrial applications.
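
The ordering observation can be illustrated with a minimal sketch. It assumes an XGBoost-style second-order split gain with L2 regularization λ (`lam`) and distinct feature values; the constant factor 1/2 and the complexity penalty γ are omitted because they do not change which split wins. The helper names `best_split` and `sort_order` are hypothetical, and this is not FederBoost's actual protocol. The scan reads only the sorted order of instance indices plus the gradients and Hessians, so any strictly increasing transformation of the feature (here `exp`) yields the same best partition.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple


def sort_order(values: Sequence[float]) -> List[int]:
    """Hypothetical helper: instance indices sorted by ascending feature value."""
    return sorted(range(len(values)), key=lambda i: values[i])


def best_split(order: Sequence[int], grad: Sequence[float],
               hess: Sequence[float], lam: float = 1.0) -> Tuple[float, FrozenSet[int]]:
    """Hypothetical helper: scan split points along `order`; raw values are never read.

    Gain (XGBoost-style, without the 1/2 factor and gamma):
        G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)
    Assumes distinct feature values, so every position is a valid threshold.
    """
    G, H = sum(grad), sum(hess)
    parent = G * G / (H + lam)
    best_gain, best_k = float("-inf"), 0
    gl = hl = 0.0
    # Put the first k instances (in sorted order) on the left, for k = 1..n-1.
    for k, i in enumerate(order[:-1], start=1):
        gl += grad[i]
        hl += hess[i]
        gr, hr = G - gl, H - hl
        gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - parent
        if gain > best_gain:
            best_gain, best_k = gain, k
    return best_gain, frozenset(order[:best_k])


raw = [3.2, -1.0, 7.5, 0.4, 5.1, 2.2]     # one party's feature column
grad = [0.8, -0.5, 1.2, -0.3, 0.9, 0.1]   # first-order gradients
hess = [1.0] * len(raw)                   # second-order gradients (squared loss)

gain_raw, left_raw = best_split(sort_order(raw), grad, hess)

# exp is strictly increasing, so it preserves the order and hence the split.
warped = [math.exp(v) for v in raw]
gain_warped, left_warped = best_split(sort_order(warped), grad, hess)

assert left_raw == left_warped and gain_raw == gain_warped
print(sorted(left_raw), round(gain_raw, 4))
```

In a vertical setting this suggests that a feature holder need only expose an ordering (or a coarser bucketing) of its instances rather than the raw values; the sketch does not model what that ordering itself may leak.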

FederBoost: Private Federated Learning for GBDT

SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree

VF2Boost: Very Fast Vertical Federated Gradient Boosting for Cross-Enterprise Learning

FedGBF: An efficient vertical federated learning framework via gradient boosting and bagging

eFL-Boost: Efficient Federated Learning for Gradient Boosting Decision Trees

A Hybrid-Domain Framework for Secure Gradient Tree Boosting

SGBoost: An Efficient and Privacy-Preserving Vertical Federated Tree Boosting Framework

FedDGP: Disentangling Global and Personal Models for Federated Learning

SecureBoost: A Lossless Federated Learning Framework

Gradient-less Federated Gradient Boosting Trees with Learnable Learning Rates

FDPBoost: Federated differential privacy gradient boosting decision trees

Federated Boosted Decision Trees with Differential Privacy

Large-scale Secure XGB for Vertical Federated Learning

Adaptive Histogram-Based Gradient Boosted Trees for Federated Learning

OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization

Boosting Privately: Federated Extreme Gradient Boosting for Mobile Crowdsensing

Privet: A Privacy-Preserving Vertical Federated Learning Service for Gradient Boosted Decision Tables

A fault‐tolerant and scalable boosting method over vertically partitioned data

FedEmb: A Vertical and Hybrid Federated Learning Algorithm using Network And Feature Embedding Aggregation