Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications

Chao Deng, Timothy Daley, Guilherme De Sena Brandine, Andrew D. Smith
DOI: https://doi.org/10.1146/annurev-biodatasci-072018-021339
2019-01-01
Annual Review of Biomedical Data Science
Abstract: High-throughput sequencing technologies have evolved at a stellar pace for almost a decade and have greatly advanced our understanding of genome biology. In these sampling-based technologies, there is an important detail that is often overlooked in the analysis of the data and the design of the experiments, specifically that the sampled observations often do not give a representative picture of the underlying population. This has long been recognized as a problem in statistical ecology and in the broader statistics literature. In this review, we discuss the connections between these fields, methodological advances that parallel both the needs and opportunities of large-scale data analysis, and specific applications in modern biology. In the process we describe unique aspects of applying these approaches to sequencing technologies, including sequencing error, population and individual heterogeneity, and the design of experiments.
What problem does this paper attempt to address?