BT24 Pseudonymization for artificial intelligence skin lesion datasets: a real-world feasibility study
Trisha J M Chin,Gillian X M Chin,James Sutherland,Andrew Coon,Colin Morton,Colin Fleming
DOI: https://doi.org/10.1093/bjd/ljae090.421
IF: 11.113
2024-06-28
British Journal of Dermatology
Abstract: The use of patient data for artificial intelligence (AI) research should be transparent, rigorous and accountable. In the UK, the General Data Protection Regulation, Data Protection Act 2018 and General Medical Council govern data handling and patients’ rights to privacy. We report on our multistep pseudonymization protocol for real-world skin lesion datasets, in preparation for research within a trusted research environment (TRE). First, patients referred from primary care are triaged for community locality and imaging centre (CLIC) suitability. There, trained healthcare professionals capture lesion images (dermoscopic, macroscopic and regional) and patient information using a mobile application on trust-certified devices. Training is standardized across all CLIC sites, with specific anonymization training on removing in-frame clothing and jewellery, device positioning, and magnification to minimize identifiable features like eyes, nose and ears. Lesion datasets (paired images and clinical information) are subsequently transferred to an image management system (IMS) hosted on our trust-secured network. Within the IMS, images are manually inspected, and those with identifiable tattoos and piercings are excluded. All regional images are also excluded from transfer to the TRE. Before transfer to the TRE, images undergo a further round of review. Data fields are manually checked for identifiable patient information, patient names are removed, and dates of birth are rounded to 3-month granularity. The job ID, patient’s hospital number, date of clinical episode and responsible photographer are replaced with randomly generated project-specific identifiers. In an initial study period, 658 of 963 (68%) captured lesion datasets had undergone IMS manual inspection. Of these, 24 lesion datasets were excluded for identifiable features: 10 (41%) for more than one-third of the face being visible, 9 (38%) for full iris visibility, and 5 (21%) for tattoos. By anatomical location, these images were of the face (19, 80%), torso (2, 8%), limbs (2, 8%) and neck (1, 4%). The remaining 634 datasets (96%) were securely transferred to the TRE, where a further 5% were excluded due to potential identifiability. Although full anonymization is desirable, it is usually achieved by aggregating patient data. Pseudonymization, which allows for future reidentification in a secured fashion, strikes the balance between patient data privacy and clinical governance, while retaining a level of granularity sufficient for meaningful analysis. Currently, this protocol is manually intensive, with room for partial automation. Use of common standardized protocols will strengthen public trust in clinical AI.
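The data-field step described in the abstract (removing names, coarsening dates of birth, and replacing identifiers with random project-specific values) could be partly automated along the following lines. This is a minimal illustrative sketch, not the authors' implementation: the field names (`patient_name`, `date_of_birth`, `job_id`, `hospital_number`, `episode_date`, `photographer`), the `Pseudonymizer` class, and the `PRJ` prefix are all hypothetical. It also assumes that "3-month granularity" means rounding down to the first day of the calendar quarter, which the abstract does not specify.

```python
import secrets
from datetime import date


def round_dob_to_quarter(dob: date) -> date:
    """Round a date of birth down to the first day of its calendar quarter.

    Assumption: "3-month granularity" is interpreted as calendar quarters.
    """
    quarter_start_month = 3 * ((dob.month - 1) // 3) + 1
    return date(dob.year, quarter_start_month, 1)


class Pseudonymizer:
    """Maps original identifiers to random project-specific identifiers.

    The lookup table is what makes later re-identification possible, so in
    practice it would have to be stored securely and apart from the dataset.
    """

    def __init__(self, project_prefix: str = "PRJ") -> None:
        self._prefix = project_prefix
        self._lookup: dict[tuple[str, str], str] = {}

    def pseudonym(self, field: str, value: str) -> str:
        key = (field, value)
        if key not in self._lookup:
            # Random token: carries no information about the original value.
            self._lookup[key] = f"{self._prefix}-{secrets.token_hex(8)}"
        return self._lookup[key]


# Hypothetical names for the fields the abstract says are replaced.
REPLACED_FIELDS = ("job_id", "hospital_number", "episode_date", "photographer")


def pseudonymize_record(record: dict, pseudonymizer: Pseudonymizer) -> dict:
    """Return a copy of the record with direct identifiers removed or replaced."""
    out = dict(record)
    out.pop("patient_name", None)  # names are removed outright
    out["date_of_birth"] = round_dob_to_quarter(record["date_of_birth"])
    for field in REPLACED_FIELDS:
        out[field] = pseudonymizer.pseudonym(field, str(record[field]))
    return out
```

Keeping the mapping in one object means the same hospital number always receives the same pseudonym within a project, so several lesions from one patient remain linkable without exposing the original identifier. A sketch like this covers only structured fields; the manual image inspection described above is a separate step.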
dermatology