Amalgam: A Framework for Obfuscated Neural Network Training on the Cloud

Sifat Ut Taki, Spyridon Mastorakis
2024-10-28
Abstract: Training a proprietary Neural Network (NN) model with a proprietary dataset on the cloud comes at the risk of exposing the model architecture and the dataset to the cloud service provider. To tackle this problem, in this paper, we present an NN obfuscation framework, called Amalgam, to train NN models in a privacy-preserving manner in existing cloud-based environments. Amalgam achieves that by augmenting NN models and the datasets to be used for training with well-calibrated noise to "hide" both the original model architectures and training datasets from the cloud. After training, Amalgam extracts the original models from the augmented models and returns them to users. Our evaluation results with different computer vision and natural language processing models and datasets demonstrate that Amalgam: (i) introduces modest overheads into the training process without impacting its correctness, and (ii) does not affect the model's accuracy. The prototype implementation is available at: https://github.com/SifatTaj/amalgam
Machine Learning, Cryptography and Security, Systems and Control
What problem does this paper attempt to address?
This paper addresses how to keep model architectures and training datasets private when training neural network (NN) models in the cloud. Specifically:

1. **Problem Background**:
   - When users train proprietary NN models in the cloud, the cloud service provider can access the model architectures and training datasets, which creates a risk of privacy leakage.
   - Currently, there is no simple method for privacy-preserving training in the cloud that avoids exposing the models and data.
2. **Solutions Proposed in the Paper**:
   - The paper proposes a framework named Amalgam for training NN models in a privacy-preserving manner in existing cloud environments.
   - Amalgam "hides" the original information by adding carefully calibrated noise to the model architectures and training datasets, preventing the cloud service provider from obtaining this sensitive information.
   - After training is complete, Amalgam extracts the original model from the augmented model and returns it to the user.
3. **Main Contributions**:
   - **Design and Implementation**: The paper describes in detail the design, implementation, and security and privacy analysis of Amalgam.
   - **Performance Evaluation**: Amalgam is evaluated with widely used computer vision and natural language processing models and datasets, showing that it introduces modest overhead without affecting the correctness of training or the accuracy of the model.
   - **Adversarial Attack Testing**: Amalgam has been tested against various adversarial attacks and shown to be robust under them.
4. **Technical Details**:
   - **Dataset Augmentation**: Obfuscates the data by inserting synthetic noise into the image or text datasets, so that the original information remains unchanged but its characteristics are hidden.
   - **Model Augmentation**: Augments the NN model with additional layers, parameters, and connections, making the original model structure difficult to trace.
   - **Model Extraction**: Extracts the original model parameters from the augmented model and applies them to a new network structure, so that the resulting model can run inference on the original dataset.
5. **Comparison with Other Methods**:
   - Compared with existing privacy-preserving methods such as multi-party computation (MPC), homomorphic encryption (HE), federated learning (FL), and differential privacy (DP), Amalgam has lower overhead and higher compatibility, and is suitable for any Python-based cloud service environment.

In summary, this paper aims to solve the privacy problem in cloud-based NN training through the Amalgam framework while maintaining the correctness of the training process and the model's performance.
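
The dataset augmentation and model extraction steps described above can be illustrated with a toy sketch. This is not Amalgam's implementation; it only assumes that noise is inserted at secret positions known to the user, and that the original parameters occupy known rows and columns of an augmented weight matrix. The helper names (`augment_sample`, `extract_sample`, `extract_weights`) and all values are hypothetical.

```python
import random


def augment_sample(sample, noise_positions, rng):
    """Return a longer vector with random noise placed at the secret positions.

    Assumption: positions index into the augmented vector, not the original.
    """
    total = len(sample) + len(noise_positions)
    noise = set(noise_positions)
    if len(noise) != len(noise_positions) or any(p < 0 or p >= total for p in noise):
        raise ValueError("noise positions must be distinct and within the augmented length")
    original = iter(sample)
    # Walk the augmented index range: noise at secret slots, original values elsewhere.
    return [rng.random() if i in noise else next(original) for i in range(total)]


def extract_sample(augmented, noise_positions):
    """Drop the noise slots to recover the original values unchanged."""
    noise = set(noise_positions)
    return [v for i, v in enumerate(augmented) if i not in noise]


def extract_weights(augmented_weights, original_rows, original_cols):
    """Copy the original sub-matrix out of an augmented weight matrix."""
    return [[augmented_weights[r][c] for c in original_cols] for r in original_rows]


# Hypothetical data: a 4-value sample padded with 2 noise values.
rng = random.Random(0)
sample = [0.1, 0.2, 0.3, 0.4]
noise_positions = [1, 4]  # the user's secret; never sent to the cloud
augmented = augment_sample(sample, noise_positions, rng)
recovered = extract_sample(augmented, noise_positions)

# Hypothetical augmented layer: 3 neurons x 6 inputs, where 9.0 marks decoy weights.
augmented_weights = [
    [1.0, 9.0, 2.0, 3.0, 9.0, 4.0],
    [9.0, 9.0, 9.0, 9.0, 9.0, 9.0],
    [5.0, 9.0, 6.0, 7.0, 9.0, 8.0],
]
original_weights = extract_weights(augmented_weights, [0, 2], [0, 2, 3, 5])
```

In this sketch the cloud would only ever see `augmented` and `augmented_weights`; the position lists act as the user's secret for recovering the original sample and the original 2 x 4 weight matrix.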