Accurate detection of mosaic variants in sequencing data without matched controls

Yanmei Dou, Minseok Kwon, Rachel E Rodin, Isidro Cortés-Ciriano, Ryan Doan, Lovelace J Luquette, Alon Galor, Craig Bohrson, Christopher A Walsh, Peter J Park
DOI: https://doi.org/10.1038/s41587-019-0368-8
Abstract: Detection of mosaic mutations that arise in normal development is challenging, as such mutations are typically present in only a minute fraction of cells and there is no clear matched control for removing germline variants and systematic artifacts. We present MosaicForecast, a machine-learning method that leverages read-based phasing and read-level features to accurately detect mosaic single-nucleotide variants and indels, achieving a multifold increase in specificity compared with existing algorithms. Using single-cell sequencing and targeted sequencing, we validated 80-90% of the mosaic single-nucleotide variants and 60-80% of indels detected in human brain whole-genome sequencing data. Our method should help elucidate the contribution of mosaic somatic mutations to the origin and development of disease.