A novel, privacy-preserving cryptographic approach for sharing sequencing data

Christopher A Cassa,Rachel A Miller,Kenneth D Mandl
DOI: https://doi.org/10.1136/amiajnl-2012-001366
2013-01-01
Abstract:Objective: DNA samples are often processed and sequenced in facilities external to the point of collection. These samples are routinely labeled with patient identifiers or pseudonyms, allowing for potential linkage to identity and private clinical information if intercepted during transmission. We present a cryptographic scheme to securely transmit externally generated sequence data which does not require any patient identifiers, public key infrastructure, or the transmission of passwords. Materials and methods: This novel encryption scheme cryptographically protects participant sequence data using a shared secret key that is derived from a unique subset of an individual's genetic sequence. This scheme requires access to a subset of an individual's genetic sequence to acquire full access to the transmitted sequence data, which helps to prevent sample mismatch. Results: We validate that the proposed encryption scheme is robust to sequencing errors, population uniqueness, and sibling disambiguation, and provides sufficient cryptographic key space. Discussion: Access to a set of an individual's genotypes and a mutually agreed cryptographic seed is needed to unlock the full sequence, which provides additional sample authentication and authorization security. We present modest fixed and marginal costs to implement this transmission architecture. Conclusions: It is possible for genomics researchers who sequence participant samples externally to protect the transmission of sequence data using unique features of an individual's genetic sequence.
What problem does this paper attempt to address?