Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress Detection

Lucas Lange, Nils Wenzlitschke, Erhard Rahm
DOI: https://doi.org/10.3390/s24103052
2024-05-14
Abstract: Smartwatch health sensor data are increasingly utilized in smart health applications and patient monitoring, including stress detection. However, such medical data often comprise sensitive personal information and are resource-intensive to acquire for research purposes. In response to this challenge, we introduce the privacy-aware synthetization of multi-sensor smartwatch health readings related to moments of stress, employing Generative Adversarial Networks (GANs) and Differential Privacy (DP) safeguards. Our method not only protects patient information but also enhances data availability for research. To ensure its usefulness, we test synthetic data from multiple GANs and employ different data enhancement strategies on an actual stress detection task. Our GAN-based augmentation methods demonstrate significant improvements in model performance, with private DP training scenarios observing an 11.90-15.48% increase in F1-score, while non-private training scenarios still see a 0.45% boost. These results underline the potential of differentially private synthetic data in optimizing utility-privacy trade-offs, especially with the limited availability of real training samples. Through rigorous quality assessments, we confirm the integrity and plausibility of our synthetic data, which, however, are significantly impacted when increasing privacy requirements.
Machine Learning, Cryptography and Security
What problem does this paper attempt to address?
The main problem this paper addresses is how to generate synthetic health sensor data for wearable stress detection while preserving privacy. Specifically, the researchers face the following challenges:

1. **Sensitivity of medical data**: Health sensor data collected by wearable devices such as smartwatches usually contain sensitive personal information. Using these data directly for research risks privacy leakage.
2. **Difficulty of data acquisition**: Acquiring high-quality real medical data is costly and resource-intensive, which limits its use in research.
3. **Trade-off between privacy protection and data utility**: The quality of the data used to train machine learning models must be maintained or improved while user privacy is still ensured.

To address these challenges, the paper proposes a method based on Generative Adversarial Networks (GANs) and Differential Privacy (DP) to generate synthetic multi-modal time-series data. The method protects patient information and also increases the amount of data available for research, improving the trade-off between privacy protection and data utility.

### Main contributions

1. **Generate synthetic multi-modal time-series data**: GAN models are trained to generate synthetic data that resemble real smartwatch health sensor data. Each data point represents a stressed or non-stressed moment and carries a corresponding label.
2. **Ensure the authenticity and privacy of data**: The generated synthetic data are close to the original distribution and can expand or replace the limited existing data set while providing privacy guarantees.
3. **Improve the performance of privacy-preserving models**: Stress detection models trained with the synthetic data perform significantly better under privacy-preserving conditions such as DP training. The F1-score increases by 11.90% to 15.48% in private DP training scenarios and by 0.45% in non-private training scenarios.
4. **Promote practical applications**: The method enables smartwatch-based stress detection while protecting user privacy, so the generated health data can be shared freely with a wider group of users, which strengthens research capabilities.

### Key technologies

- **Generative Adversarial Networks (GANs)**: Used to capture the statistical distribution of a given data set and generate new synthetic data samples.
- **Differential Privacy (DP)**: Protects the privacy of individual data points by introducing calibrated noise, ensuring that adding or removing a single data point does not significantly affect the statistical results.
- **Differentially Private Stochastic Gradient Descent (DP-SGD)**: A modified stochastic gradient descent optimizer that achieves differential privacy by clipping per-sample gradients and adding noise to them during the gradient computation.

Using these techniques, the paper generates high-quality synthetic health sensor data under privacy guarantees and demonstrates their effectiveness on a stress detection task.
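
To make the DP-SGD idea under "Key technologies" more concrete, here is a minimal sketch of a single update step, written with the Python standard library only. It is not the paper's implementation: the logistic-regression model is a stand-in for the actual stress detection and GAN models, the names `clip_gradient` and `dp_sgd_step` and all default parameter values are hypothetical, and the privacy accounting that turns the noise level into a privacy budget is omitted.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [g * scale for g in grad]


def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD update for logistic regression on a batch (X, y).

    Illustrative assumptions: w is a list of weights, X a list of feature
    lists, y a list of 0/1 labels (e.g. non-stressed / stressed).
    """
    rng = rng or random.Random()
    summed = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        # Per-example gradient of the logistic loss.
        z = sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        p = 1.0 / (1.0 + math.exp(-z))
        grad = [(p - y_i) * x_j for x_j in x_i]
        # Clipping bounds the influence of any single example.
        clipped = clip_gradient(grad, clip_norm)
        summed = [s + c for s, c in zip(summed, clipped)]
    # Gaussian noise scaled to the clipping bound hides individual examples.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # Average over the batch and take a gradient descent step.
    return [w_j - lr * n / len(X) for w_j, n in zip(w, noisy)]


if __name__ == "__main__":
    X = [[3.0, 4.0], [0.3, 0.4]]
    y = [1.0, 0.0]
    print(dp_sgd_step([0.0, 0.0], X, y, rng=random.Random(0)))
```

The two ingredients are the per-example clipping, which bounds how much any one record can move the model, and the added noise, whose scale is tied to that bound. A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which mirrors the utility-privacy trade-off discussed above. In practice a maintained DP library would normally be used instead of a hand-written step.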