No major flaws in “Identification of individuals by trait prediction using whole-genome sequencing data”

Christoph Lippert, Riccardo Sabatini, M. Cyrus Maher, Eun Yong Kang, Seunghak Lee, Okan Arikan, Alena Harley, Axel Bernal, Peter Garst, Victor Lavrenko, Ken Yocum, Theodore M. Wong, Mingfu Zhu, Wen-Yun Yang, Chris Chang, Barry Hicks, Smriti Ramakrishnan, Haibao Tang, Chao Xie, Suzanne Brewerton, Yaron Turpaz, Amalio Telenti, Rhonda K. Roby, Franz Och, J. Craig Venter
DOI: https://doi.org/10.1101/187542
2017-09-11
Abstract: In a recently published PNAS article, we studied the identifiability of genomic samples using machine learning methods [Lippert et al., 2017]. In a response, Erlich [2017] argued that our work contained major flaws. The main technical critique of Erlich [2017] builds on a simulation experiment showing that our proposed algorithm, which uses only a genomic sample for identification, performed no better than a strategy that uses demographic variables. Below, we show why this comparison is misleading and discuss in detail the key criticisms of our analyses raised in Erlich [2017] and in the media. Further, not only faces but a wide range of phenotypes and demographic variables may be derived from DNA. In this light, the main contribution of Lippert et al. [2017] is an algorithm that identifies genomes of individuals by combining multiple DNA-based predictive models for a myriad of traits.