PASTA-4-PHT: A Pipeline for Automated Security and Technical Audits for the Personal Health Train

Sascha Welten, Karl Kindermann, Ahmet Polat, Martin Görz, Maximilian Jugl, Laurenz Neumann, Alexander Neumann, Johannes Lohmöller, Jan Pennekamp, Stefan Decker
2024-12-02
Abstract: With the introduction of data protection regulations, the need for innovative privacy-preserving approaches to process and analyse sensitive data has become apparent. One approach is the Personal Health Train (PHT) that brings analysis code to the data and conducts the data processing at the data premises. However, despite its demonstrated success in various studies, the execution of external code in sensitive environments, such as hospitals, introduces new research challenges because the interactions of the code with sensitive data are often incomprehensible and lack transparency. These interactions raise concerns about potential effects on the data and increase the risk of data breaches. To address this issue, this work discusses a PHT-aligned security and audit pipeline inspired by DevSecOps principles. The automated pipeline incorporates multiple phases that detect vulnerabilities. To thoroughly study its versatility, we evaluate this pipeline in two ways. First, we deliberately introduce vulnerabilities into a PHT. Second, we apply our pipeline to five real-world PHTs, which have been utilised in real-world studies, to audit them for potential vulnerabilities. Our evaluation demonstrates that our pipeline successfully identifies potential vulnerabilities and can be applied to real-world studies. In compliance with the requirements of the GDPR for data management, documentation, and protection, our automated approach supports researchers in their data-intensive work and reduces manual overhead. It can be used as a decision-making tool to assess and document potential vulnerabilities in code for data processing. Ultimately, our work contributes to increased security and overall transparency of data processing activities within the PHT framework.
Cryptography and Security; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses the security and transparency issues that arise in the Personal Health Train (PHT) framework when external code is executed in sensitive environments such as hospitals. Specifically:

1. **Opacity of interactions between code and sensitive data**:
   - When external code interacts with sensitive data in the PHT framework, its behavior is often opaque and hard to predict, which undermines trust in the data processing.
   - The code may contain vulnerabilities, introduced unintentionally or intentionally, that can be exploited maliciously and lead to data leakage or other security issues.
2. **Lack of automated auditing and detection mechanisms**:
   - Although the PHT has been applied successfully in multiple studies, there is currently no systematic, automated method to detect and audit potential vulnerabilities in the analysis code.
   - This gap makes it difficult for researchers and data holders to ensure that the code is secure and compliant, especially with regulations such as the General Data Protection Regulation (GDPR).

To address these problems, the paper proposes the **Pipeline for Automated Security and Technical Audits for the Personal Health Train (PASTA-4-PHT)**. It is an automated security and technical audit pipeline inspired by DevSecOps principles. The pipeline identifies and documents potential vulnerabilities in PHT code through multi-stage detection and auditing, which improves the security and transparency of data-processing activities.

### Main objectives:

- **Improve code transparency**: Use detailed documentation and version control to make the structure and change history of the code transparent and easier to review.
- **Automatically detect vulnerabilities**: Combine techniques such as Static Application Security Testing (SAST), dependency scanning, secret detection, and Dynamic Application Security Testing (DAST) to detect potential vulnerabilities in the code automatically.
- **Ensure compliance**: Ensure that the code complies with the requirements of regulations such as the GDPR, provide detailed audit reports, and support decision-makers in evaluating the security of the code.

Through these measures, PASTA-4-PHT improves code security in the PHT framework and reduces the workload of manual review. It also enhances the overall transparency and credibility of data-processing activities.
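The multi-stage detection idea can be illustrated with a minimal sketch. This is not the PASTA-4-PHT implementation. The stage functions, the regex patterns in `SECRET_PATTERNS` and the advisory table `KNOWN_VULNERABLE` (with its package `examplelib`) are all hypothetical placeholders. The SAST and DAST stages are omitted.

In the sketch, each stage receives the train's files as a mapping from path to source text and returns a list of findings. The runner concatenates these findings into a single audit report.

```python
"""Minimal sketch of a multi-stage audit pipeline in the spirit of PASTA-4-PHT.

Hypothetical: not the paper's implementation. The secret patterns and the
advisory table below are illustrative placeholders only.
"""
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    stage: str     # which pipeline stage reported the issue
    location: str  # "path:line" where the issue was found
    message: str   # human-readable description for the audit report


# Hypothetical regexes for a secret-detection stage.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "hard-coded password": re.compile(
        r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}

# Hypothetical advisory table for a dependency-scanning stage:
# package name -> set of versions assumed to be vulnerable.
KNOWN_VULNERABLE = {"examplelib": {"1.0.0", "1.0.1"}}


def secret_detection(files: Dict[str, str]) -> List[Finding]:
    """Flag every line in every file that matches a secret pattern."""
    findings = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append(Finding("secret-detection", f"{path}:{lineno}", name))
    return findings


def dependency_scanning(files: Dict[str, str]) -> List[Finding]:
    """Flag pinned 'name==version' requirements listed in the advisory table."""
    findings = []
    requirements = files.get("requirements.txt", "")
    for lineno, line in enumerate(requirements.splitlines(), start=1):
        name, sep, version = line.strip().partition("==")
        if sep and version in KNOWN_VULNERABLE.get(name.lower(), set()):
            findings.append(
                Finding(
                    "dependency-scanning",
                    f"requirements.txt:{lineno}",
                    f"{name} {version} has a known advisory",
                )
            )
    return findings


# Every stage shares one signature, so stages can be chained freely.
Stage = Callable[[Dict[str, str]], List[Finding]]


def run_pipeline(files: Dict[str, str], stages: List[Stage]) -> List[Finding]:
    """Run the stages in order and concatenate their findings into one report."""
    report: List[Finding] = []
    for stage in stages:
        report.extend(stage(files))
    return report


if __name__ == "__main__":
    train = {
        "main.py": 'db_password = "hunter2"\nprint("analysis")\n',
        "requirements.txt": "examplelib==1.0.1\nnumpy==1.26.4\n",
    }
    for f in run_pipeline(train, [secret_detection, dependency_scanning]):
        print(f"[{f.stage}] {f.location}: {f.message}")
```

Because every stage has the same signature, further stages (for example, a wrapper around an existing SAST tool) could be appended to the list without changing the runner. A real pipeline would presumably rely on maintained scanners and advisory databases rather than hand-written patterns.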