A two-phase random forest with differential privacy

Jing Liu, Xianxian Li, Quanmin Wei, Songfeng Liu, Zhaohui Liu, Jinyan Wang
DOI: https://doi.org/10.1007/s10489-022-04119-6
IF: 5.3
2022-10-07
Applied Intelligence
Abstract: Random forest (RF) has become one of the state-of-the-art methods in machine learning owing to its low computational overhead and feasibility, but privacy leakage remains a crucial issue for the random forest model. This study applies differential privacy to the random forest algorithm to protect privacy. First, a novel differentially private decision tree building algorithm is proposed. Moreover, a more reasonable privacy budget allocation strategy is designed, which does not need to further divide the privacy budget within each layer. This strategy helps alleviate the problem of signal-to-noise ratio imbalance. We then propose a two-phase differentially private random forest method that takes into account the complementarity of decision trees. In particular, we first train a group of private decision trees and then update the weights of all samples. In the second phase, we train another group of private decision trees that focus on the instances the preceding trees handle poorly. Finally, a range of experiments is conducted on 6 public data sets from the UCI machine learning repository. The results demonstrate that our method outperforms some existing differential privacy RF approaches in classification performance.
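The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the tree-building procedure, the budget allocation strategy, or the weight-update rule, so everything below is an assumption made for illustration. Trees are reduced to one-split stumps with a random, data-independent split; leaf labels come from class counts perturbed by the Laplace mechanism; the budget is split evenly across trees; and misclassified samples simply get their weight multiplied by a hypothetical `boost` factor. All function and parameter names are hypothetical, features are assumed to lie in [0, 1], labels are assumed binary, and the sketch makes no formal privacy claim.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def train_stump(X, y, weights, epsilon, rng):
    """One-split tree: random split, leaf labels from noisy weighted counts."""
    feature = rng.randrange(len(X[0]))
    threshold = rng.random()  # assumes features are scaled to [0, 1]
    # One sample changes a count by at most its weight (illustrative bound).
    sensitivity = max(weights)
    leaves = []
    for side in (False, True):
        counts = [0.0, 0.0]  # weighted counts for classes 0 and 1
        for row, label, w in zip(X, y, weights):
            if (row[feature] > threshold) == side:
                counts[label] += w
        noisy = [c + laplace_noise(sensitivity / epsilon, rng) for c in counts]
        leaves.append(0 if noisy[0] >= noisy[1] else 1)
    return feature, threshold, leaves


def predict_stump(stump, row):
    feature, threshold, leaves = stump
    return leaves[1] if row[feature] > threshold else leaves[0]


def forest_predict(forest, row):
    """Majority vote over all stumps (ties go to class 0)."""
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if 2 * votes > len(forest) else 0


def two_phase_forest(X, y, n_trees, epsilon, seed=0, boost=2.0):
    rng = random.Random(seed)
    eps_tree = epsilon / n_trees  # naive even split across trees (assumption)
    weights = [1.0] * len(X)
    half = n_trees // 2
    # Phase 1: train the first group of trees with uniform sample weights.
    forest = [train_stump(X, y, weights, eps_tree, rng) for _ in range(half)]
    # Re-weight: emphasize samples the phase-1 trees misclassify.
    weights = [boost if forest_predict(forest, row) != label else 1.0
               for row, label in zip(X, y)]
    # Phase 2: train the second group on the re-weighted samples.
    forest += [train_stump(X, y, weights, eps_tree, rng)
               for _ in range(n_trees - half)]
    return forest
```

Calling `two_phase_forest(X, y, n_trees=6, epsilon=1.0)` returns a list of six stumps that `forest_predict` combines by majority vote. The even per-tree budget split and the fixed `boost` factor are the simplest possible choices and stand in for the strategies the paper designs.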
computer science, artificial intelligence