SilentWood: Private Inference Over Gradient-Boosting Decision Forests

Ronny Ko, Rasoul Akhavan Mahdavi, Byoungwoo Yoon, Makoto Onizuka, Florian Kerschbaum
2024-11-23
Abstract: Gradient-boosting decision forests, as used by algorithms such as XGBoost or AdaBoost, offer higher accuracy and lower training times for large datasets than decision trees. Protocols for private inference over decision trees can be used to preserve the privacy of the input data as well as the privacy of the trees. However, naively extending private inference over decision trees to private inference over decision forests by replicating the protocols leads to impractical running times. In this paper, we explore extending the private decision inference protocol using homomorphic encryption by Mahdavi et al. (CCS 2023) to decision forests. We present several optimizations that identify and then remove (approximate) duplication between the trees in a forest and hence achieve significant improvements in communication and computation cost over the naive approach. To the best of our knowledge, we present the first private inference protocol for highly scalable gradient-boosting decision forests. Our optimizations extend beyond Mahdavi et al.'s protocol to various private inference protocols for gradient-boosting decision trees. Our protocol's inference time is faster than the baseline of running the protocol by Mahdavi et al. in parallel by up to 28.1x, and faster than Zama's Concrete ML XGBoost by up to 122.25x.
Cryptography and Security, Databases
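The abstract's central observation, that the trees of a forest repeat the same (or nearly the same) comparisons on the same input, can be illustrated in plaintext. The sketch below is a minimal illustration under our own assumptions, not the SilentWood protocol: it evaluates a toy three-tree forest naively (one comparison per node per tree, mirroring a single-tree protocol replicated per tree), then evaluates each distinct (feature, threshold) pair once and reuses the resulting bit in every tree. The names `Leaf`, `Node`, `predict_naive`, `predict_dedup`, `snap`, and `FOREST` are hypothetical, and rounding thresholds onto a fixed grid is only one assumed way to turn near-duplicates into exact duplicates; it is not necessarily how the paper defines approximate duplication.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Plaintext illustration only: no encryption happens here.

@dataclass(frozen=True)
class Leaf:
    value: float

@dataclass(frozen=True)
class Node:
    feature: int        # index into the input vector x
    threshold: float    # go left when x[feature] < threshold
    left: "Tree"
    right: "Tree"

Tree = Union[Node, Leaf]
Key = Tuple[int, float]  # one (feature, threshold) comparison

def collect(tree: Tree, out: List[Key]) -> None:
    """Append every comparison in `tree` to `out` (duplicates kept)."""
    if isinstance(tree, Node):
        out.append((tree.feature, tree.threshold))
        collect(tree.left, out)
        collect(tree.right, out)

def predict_naive(forest: List[Tree], x: List[float]) -> float:
    """Evaluate each tree independently and sum the reached leaf values."""
    total = 0.0
    for node in forest:
        while isinstance(node, Node):
            node = node.left if x[node.feature] < node.threshold else node.right
        total += node.value
    return total

def predict_dedup(forest: List[Tree], x: List[float]) -> Tuple[float, int, int]:
    """Evaluate each distinct comparison once, then reuse its bit in every tree.
    Returns (prediction, total comparisons, distinct comparisons)."""
    keys: List[Key] = []
    for tree in forest:
        collect(tree, keys)
    bits: Dict[Key, bool] = {k: x[k[0]] < k[1] for k in set(keys)}
    total = 0.0
    for node in forest:
        while isinstance(node, Node):
            node = node.left if bits[(node.feature, node.threshold)] else node.right
        total += node.value
    return total, len(keys), len(bits)

def snap(tree: Tree, step: float) -> Tree:
    """Hypothetical approximate dedup: round every threshold onto a grid of width `step`."""
    if isinstance(tree, Leaf):
        return tree
    return Node(tree.feature, round(tree.threshold / step) * step,
                snap(tree.left, step), snap(tree.right, step))

FOREST: List[Tree] = [
    Node(0, 30.0, Leaf(1.0), Node(1, 100.0, Leaf(2.0), Leaf(3.0))),
    Node(1, 100.0, Node(0, 30.0, Leaf(-1.0), Leaf(0.5)), Leaf(4.0)),
    Node(0, 31.0, Leaf(0.25), Leaf(-0.25)),  # near-duplicate of the 30.0 threshold
]

if __name__ == "__main__":
    x = [35.0, 80.0]
    print(predict_naive(FOREST, x))                          # 2.25
    print(predict_dedup(FOREST, x))                          # (2.25, 5, 3)
    print(predict_dedup([snap(t, 5.0) for t in FOREST], x))  # (2.25, 5, 2)
```

On this toy input the five comparisons collapse to three exact-distinct ones, and to two after snapping. Snapping is lossy: for x = [30.5, 80.0] the third tree takes a different branch after its threshold moves from 31.0 to 30.0, so the prediction changes from 2.75 to 2.25. That is the accuracy trade-off any approximate deduplication has to manage; under homomorphic encryption, each comparison removed is an encrypted comparison the server no longer has to perform.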
### What problems does this paper attempt to solve?

This paper addresses the efficiency and performance problems that arise when performing private inference over gradient-boosting decision forests (GBDTs). Specifically, it targets two main problems:

1. **Inefficiency when existing protocols are extended to GBDTs**
   - A gradient-boosting decision forest consists of many decision trees whose outputs are combined into the final prediction. Existing private inference protocols are designed for a single decision tree, so extending them directly to GBDTs leads to impractical running times because computation and communication overheads grow substantially.
   - When homomorphic encryption (HE) is used, combining the results of multiple trees increases the multiplicative depth of the circuit, which requires larger encryption parameters and substantially raises resource requirements.
2. **Optimizing the inference of multiple trees**
   - Even without combining their results, running private inference independently for every tree in the forest would exhaust the available computing resources. Optimizations are therefore needed to reduce the cost of evaluating many decision trees on the same input.

To solve these problems, the paper proposes a new protocol, SilentWood, which achieves its performance gains through the following techniques:

- **Blind code conversion protocol**: converts indicator bits between different encodings during inference, so that homomorphically encrypted results can be correctly aggregated.
- **Identifying and removing duplicates between trees in the forest**: discovering and removing (approximately) duplicated parts of different trees reduces the overall computation and communication costs.
- **Reusing input data**: identifying and removing duplicates in the input to the decision forest substantially reduces communication volume and also lowers computation cost.

With these optimizations, SilentWood's inference time is up to 28.1 times faster than the baseline (running the protocol of Mahdavi et al. in parallel, once per tree) and up to 122.25 times faster than Zama's Concrete ML XGBoost.

### Summary

The main contribution of this paper is an efficient and scalable method for private inference on GBDTs. It overcomes the scalability and efficiency limitations of existing approaches, making privacy-preserving machine-learning inference more practical.
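The blind code conversion step is described above only at a high level. As one hedged way to picture why the trees' indicator encodings must agree before aggregation, the sketch below assumes (our assumption, not taken from the paper) that evaluating a tree yields a *zero-marked* vector, in which the reached leaf holds 0 and every other leaf holds a nonzero value modulo a prime `P`. Aggregation instead needs *one-hot* 0/1 indicators b, so that the forest output is the sum over trees t and leaves l of b[t][l] * v[t][l], where v holds the leaf values. The conversion follows a generic mask-and-shuffle pattern: the server multiplies each entry by a random nonzero scalar (which keeps zeros at zero) and shuffles, the client (key holder) tests each entry for zero, and the server undoes the shuffle. Everything runs in plaintext, and `P`, `blind`, `zero_test`, `unblind`, and `forest_score` are hypothetical names. In a real protocol the masked entries would be ciphertexts and the client would return fresh encryptions rather than clear bits.

```python
import random
from typing import List, Tuple

P = 65537  # hypothetical prime plaintext modulus; all arithmetic is mod P

def blind(codes: List[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Server: scale each entry by a fresh random nonzero r (zero stays zero,
    nonzero stays nonzero because P is prime), then shuffle the entries."""
    masked = [(c * rng.randrange(1, P)) % P for c in codes]
    perm = list(range(len(codes)))
    rng.shuffle(perm)
    return [masked[i] for i in perm], perm

def zero_test(shuffled: List[int]) -> List[int]:
    """Client (key holder): map 0 -> 1 and nonzero -> 0. In an HE protocol it
    would return fresh encryptions of these bits instead of clear values."""
    return [1 if c == 0 else 0 for c in shuffled]

def unblind(bits: List[int], perm: List[int]) -> List[int]:
    """Server: undo the shuffle so that position j again refers to leaf j."""
    out = [0] * len(bits)
    for pos, leaf in enumerate(perm):
        out[leaf] = bits[pos]
    return out

def forest_score(indicators: List[List[int]], leaf_values: List[List[int]]) -> int:
    """Aggregate one-hot indicators: sum over trees t and leaves l of b[t][l] * v[t][l]."""
    return sum(b * v for bs, vs in zip(indicators, leaf_values) for b, v in zip(bs, vs))

if __name__ == "__main__":
    rng = random.Random(0)
    zero_marked = [[7, 0, 12, 5], [0, 9, 3]]  # reached leaves: tree 0 -> leaf 1, tree 1 -> leaf 0
    leaf_values = [[10, 20, 30, 40], [1, 2, 3]]
    one_hot = []
    for codes in zero_marked:
        shuffled, perm = blind(codes, rng)
        one_hot.append(unblind(zero_test(shuffled), perm))
    print(one_hot)                             # [[0, 1, 0, 0], [1, 0, 0]]
    print(forest_score(one_hot, leaf_values))  # 21
```

Weighting the zero-marked codes by leaf values directly would produce a meaningless number; only after conversion does the weighted sum pick out exactly one leaf value per tree (20 + 1 = 21 here). The shuffle means the client sees only a single zero at a random position among random nonzero values, which is why this pattern is called blind in this sketch.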