Cryptotree: fast and accurate predictions on encrypted structured data

Daniel Huynh
DOI: https://doi.org/10.48550/arXiv.2006.08299
2020-06-15
Abstract: Applying machine learning algorithms to private data, such as financial or medical data, while preserving their confidentiality, is a difficult task. Homomorphic Encryption (HE) is acknowledged for its ability to allow computation on encrypted data, where both the input and output are encrypted, which therefore enables secure inference on private data. Nonetheless, because of the constraints of HE, such as its inability to evaluate non-polynomial functions or to perform arbitrary matrix multiplication efficiently, only inference of linear models seem usable in practice in the HE paradigm so far. In this paper, we propose Cryptotree, a framework that enables the use of Random Forests (RF), a very powerful learning procedure compared to linear regression, in the context of HE. To this aim, we first convert a regular RF to a Neural RF, then adapt this to fit the HE scheme CKKS, which allows HE operations on real values. Through SIMD operations, we are able to have quick inference and prediction results better than the original RF on encrypted data.
Machine Learning
What problem does this paper attempt to address?
The core problem this paper addresses is how to apply machine-learning algorithms to private data (such as financial or medical data) for efficient and accurate prediction while keeping that data confidential. Specifically, the author proposes a framework named Cryptotree, which aims to overcome the limitations of homomorphic encryption (HE) when dealing with non-linear models, and in particular to achieve fast and accurate prediction on encrypted data with Random Forests (RF), a powerful but complex model.

### Problem Background

1. **Requirement for Privacy Protection**
   - In fields such as finance and medicine, data is highly sensitive, and transmitting it directly to a server may lead to data leakage.
   - Homomorphic encryption (HE) allows computation on encrypted data without decryption, thus protecting data privacy.
2. **Limitations of Existing Methods**
   - Linear models can be implemented under HE, but their expressive power is limited.
   - Complex models such as deep neural networks (DNN) are difficult to apply directly to encrypted data because of the constraints of HE: it cannot efficiently perform arbitrary matrix multiplications or evaluate non-polynomial functions.

### Main Contributions of the Paper

1. **Proposing the Cryptotree Framework**
   - Convert the random forest into a Neural Random Forest (NRF), then adapt it further to fit the CKKS homomorphic encryption scheme.
   - Use the SIMD operations of CKKS to process the predictions of multiple trees in parallel, significantly improving inference speed.
2. **Technical Challenges Solved**
   - **Matrix Multiplication**: Implement efficient matrix multiplication through a dedicated algorithm (Algorithm 1 in the paper) so that multiple trees can be evaluated in parallel in the HE setting.
   - **Activation Function Approximation**: Use low-order polynomials to approximate non-linear activation functions (such as tanh), keeping the computed values within a reasonable range.
3. **Experimental Verification**
   - Experiments were carried out on the Adult Income dataset. The results showed that the performance of the Homomorphic Random Forest (HRF) on encrypted data is very close to that of the original random forest, and that inference is fast.

### Summary

This paper achieves fast and accurate prediction on encrypted structured data by transforming the random forest model into a form suitable for homomorphic encryption and exploiting the characteristics of the CKKS scheme. The method not only strengthens privacy protection but also broadens the practical feasibility of homomorphic encryption, especially for complex models.
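
The two technical ingredients described above, SIMD-friendly matrix multiplication and a low-order polynomial activation, can be illustrated with a plaintext simulation. The sketch below is not the paper's code and does not use any HE library: it assumes the well-known diagonal method for matrix-vector products (only slot rotations and element-wise multiplications, the operations CKKS offers on packed vectors) and a degree-3 Chebyshev least-squares fit of tanh on [-1, 1]. Whether this matches the paper's Algorithm 1 or its exact polynomial is not stated in this summary. The names `diagonal_matvec`, `fit_activation`, and `he_friendly_layer` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def diagonal_matvec(W, x):
    """Compute W @ x using only rotations and element-wise products.

    Plaintext stand-in for a SIMD-packed HE computation: np.roll plays
    the role of a ciphertext slot rotation (assumption for illustration).
    """
    n = W.shape[0]
    assert W.shape == (n, n) and x.shape == (n,)
    acc = np.zeros(n)
    rows = np.arange(n)
    for k in range(n):
        # k-th generalized diagonal: diag[i] = W[i, (i + k) % n]
        diag = W[rows, (rows + k) % n]
        # np.roll(x, -k)[i] == x[(i + k) % n], i.e. a left rotation by k
        acc += diag * np.roll(x, -k)
    return acc


def fit_activation(degree=3, bound=1.0, samples=1001):
    """Least-squares polynomial fit of tanh on [-bound, bound]."""
    xs = np.linspace(-bound, bound, samples)
    return Chebyshev.fit(xs, np.tanh(xs), degree)


def he_friendly_layer(W, b, x, activation):
    """One dense layer built only from HE-compatible operations."""
    return activation(diagonal_matvec(W, x) + b)


# Small demo with hypothetical random weights; the ranges are chosen so
# that pre-activations stay inside the interval the polynomial was fit on.
rng = np.random.default_rng(0)
n = 4
W = rng.uniform(-0.2, 0.2, size=(n, n))
b = rng.uniform(-0.1, 0.1, size=n)
x = rng.uniform(-1.0, 1.0, size=n)

act = fit_activation()
approx = he_friendly_layer(W, b, x, act)
exact = np.tanh(W @ x + b)
max_err = float(np.max(np.abs(approx - exact)))
print("max abs error vs. exact tanh layer:", max_err)
```

The polynomial is only trustworthy on the interval it was fitted on, which is why the demo bounds the weights and inputs so that pre-activations stay within [-1, 1]. In an actual CKKS deployment each rotation and multiplication would act on ciphertexts and consume noise budget, which this simulation does not model.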