Sequencing depth and coverage: key considerations in genomic analyses

David Sims, Ian Sudbery, Nicholas E. Ilott, Andreas Heger, Chris P. Ponting
DOI: https://doi.org/10.1038/nrg3642
IF: 59.581
2014-01-17
Nature Reviews Genetics
Abstract: Key Points
The average depth of sequencing coverage can be defined theoretically as LN/G, where L is the read length, N is the number of reads and G is the haploid genome length.
The breadth of coverage is the percentage of target bases that have been sequenced for a given number of times.
Hybrid sequencing approaches are being introduced to overcome problems in genome assembly and in placing highly repetitive sequence in a genome.
For DNA resequencing studies, the required sequencing capacity depends on the size of the regions of interest, the types of variant and the disease model being studied.
The accuracy of variant calling is affected by sequence quality, uniformity of coverage and the threshold of false-discovery rate that is used.
The power to identify and accurately quantify RNA molecules is dependent on their lengths and abundance, and on the number of sequenced reads.
In human cells, 80% of transcripts that are expressed at >10 fragments per kilobase of exon per million reads mapped (FPKM) can be accurately quantified with ~36 million 100-bp paired-end sequenced reads.
Depth of coverage is affected by the accuracy of genome alignment algorithms and by the uniqueness or the 'mappability' of sequencing reads within a target genome.
Sequence depth influences the accuracy by which rare events can be quantified in RNA sequencing, chromatin immunoprecipitation followed by sequencing (ChIP–seq) and other quantification-based assays.
Sequence depth must be traded off against the need for control samples and replicates.
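The two definitions that open the Key Points (average depth as LN/G, and breadth as the percentage of target bases sequenced a given number of times) can be illustrated with a minimal Python sketch. The function names, the reading of "a given number of times" as "at least min_depth times", and the example values (a round 3 Gb haploid genome, 100-bp reads, a five-base depth vector) are assumptions made for illustration and do not come from the paper.

```python
import math
from typing import Sequence


def average_depth(read_length: int, num_reads: int, genome_length: int) -> float:
    """Theoretical average depth of coverage, LN/G."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return read_length * num_reads / genome_length


def reads_for_depth(target_depth: float, read_length: int, genome_length: int) -> int:
    """Smallest number of reads N such that LN/G >= target_depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_depth * genome_length / read_length)


def breadth_of_coverage(per_base_depth: Sequence[int], min_depth: int = 1) -> float:
    """Percentage of target bases sequenced at least min_depth times.

    Assumption: "a given number of times" is read as a minimum threshold.
    """
    if not per_base_depth:
        raise ValueError("per_base_depth must not be empty")
    covered = sum(1 for depth in per_base_depth if depth >= min_depth)
    return 100.0 * covered / len(per_base_depth)


# Illustrative (hypothetical) values: 100-bp reads, 900 million reads,
# and a haploid genome length rounded to 3 Gb.
depth = average_depth(read_length=100, num_reads=900_000_000,
                      genome_length=3_000_000_000)
needed = reads_for_depth(target_depth=30, read_length=100,
                         genome_length=3_000_000_000)
# Toy per-base depth vector for a five-base target region.
breadth = breadth_of_coverage([0, 5, 12, 30, 2], min_depth=10)
print(depth, needed, breadth)
```

With these illustrative inputs the theoretical average depth is 30, and two of the five target bases reach the 10-read threshold. Average depth and breadth are kept as separate functions because a high LN/G value does not by itself guarantee that every target base is adequately covered, which is why the Key Points treat uniformity of coverage as a separate concern.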
genetics & heredity