Private information leakage from single-cell count matrices

Conor R Walker, Xiaoting Li, Manav Chakravarthy, William Lounsbery-Scaife, Yoolim A Choi, Ritambhara Singh, Gamze Gürsoy
DOI: https://doi.org/10.1016/j.cell.2024.09.012
IF: 64.5
2024-09-27
Cell
Abstract: The increase in publicly available human single-cell datasets, encompassing millions of cells from many donors, has greatly enhanced our understanding of complex biological processes. However, the accessibility of these datasets raises significant privacy concerns. Because of the inherent noise in single-cell measurements and the scarcity of population-scale single-cell datasets, recent studies quantifying private information leakage have focused on bulk gene expression data sharing. To address this gap, we demonstrate that individuals in single-cell gene expression datasets are vulnerable to linking attacks, in which attackers infer their sensitive phenotypic information using publicly available tissue- or cell-type-specific expression quantitative trait loci (eQTL) information. We further develop a method for genotype prediction and genotype-phenotype linking that remains effective without relying on eQTL information. We show that variants from one study can be exploited to uncover private information about individuals in another study.
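The abstract describes the eQTL-based linking attack only at a high level. The sketch below is a toy illustration of the general idea, not the authors' method: it assumes a cells × genes count matrix with a donor label per cell, a list of eQTLs given as (target gene index, effect size) pairs, and a reference genotype table (individuals × SNPs, coded 0/1/2). The function names (`pseudobulk`, `predict_genotypes`, `link`), the tertile rule for calling genotypes and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def pseudobulk(counts, donor_ids):
    """Aggregate a cells x genes count matrix into a donors x genes log-CPM matrix."""
    donor_ids = np.asarray(donor_ids)
    donors = np.unique(donor_ids)  # sorted donor labels
    sums = np.vstack([counts[donor_ids == d].sum(axis=0) for d in donors])
    cpm = sums / sums.sum(axis=1, keepdims=True) * 1e6  # library-size normalization
    return donors, np.log1p(cpm)


def predict_genotypes(expr, eqtls):
    """Guess 0/1/2 alternative-allele counts from expression of each eQTL target gene.

    expr:  donors x genes expression matrix
    eqtls: list of (gene_index, effect_size) pairs; the (assumed nonzero) sign of
           the effect says whether the alternative allele raises or lowers expression.
    Hypothetical rule: rank donors by signed expression and split into tertiles.
    """
    n = expr.shape[0]
    preds = np.empty((n, len(eqtls)), dtype=int)
    for j, (gene, beta) in enumerate(eqtls):
        signed = expr[:, gene] * np.sign(beta)
        ranks = np.argsort(np.argsort(signed, kind="stable"), kind="stable")
        preds[:, j] = ranks * 3 // n  # lowest third -> 0, middle -> 1, top -> 2
    return preds


def link(preds, reference):
    """Match each predicted genotype vector to the reference row with most agreeing SNPs."""
    agreement = (preds[:, None, :] == reference[None, :, :]).sum(axis=2)
    return agreement.argmax(axis=1)


# Synthetic demo: 6 donors, 6 eQTLs, genotypes balanced within each SNP.
base = np.array([0, 0, 1, 1, 2, 2])
genotypes = np.array([[base[(i + j) % 6] for j in range(6)] for i in range(6)])
eqtls = [(j, 1.0 if j % 2 == 0 else -1.0) for j in range(6)]
# Expression is a noiseless function of genotype here, so prediction is exact.
expr = np.column_stack([np.sign(b) * (10.0 * genotypes[:, g] + 5.0) for g, b in eqtls])

order = [3, 0, 5, 1, 4, 2]  # the reference panel lists the same people in another order
reference = genotypes[order]
linked = link(predict_genotypes(expr, eqtls), reference)
print(linked)  # → [1 3 5 0 4 2]
```

In this idealized setting every donor is re-identified because expression is a noiseless function of genotype. Real single-cell data are far noisier, which is why pooling counts per donor (as `pseudobulk` does) and using many eQTLs matters; once a donor is linked to a named genotype record, any phenotype attached to the released expression data is attributed to that person.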
What problem does this paper attempt to address?