Abstract: Whole-genome sequencing studies are essential for a comprehensive understanding of the vast patterns of human genomic variation. Here we report the results of a high-coverage whole-genome sequencing study of 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions (indels), and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, and the majority of these (∼96%) have a minor allele frequency below 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all 44 genomes, a total of 182 genes were "knocked out" in at least one individual genome, and 46 of these genes were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked out" in general populations. Gene ontology analysis suggested that these commonly "knocked-out" genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and diseases.
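The allele-frequency and "knocked-out" statistics above reduce to simple counting over genotypes. The sketch below is illustrative only and is not the authors' pipeline. It assumes diploid genotypes coded as alternate-allele counts (0, 1, 2), treats the alternate allele as the predicted loss-of-function (LoF) allele, and counts a gene as knocked out in an individual only when that individual is homozygous for at least one LoF variant in the gene (compound heterozygotes are ignored). The function names, gene names, and toy genotypes are hypothetical.

```python
from collections import defaultdict


def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alternate-allele counts (0, 1, 2)."""
    n_alleles = 2 * len(genotypes)
    alt_freq = sum(genotypes) / n_alleles
    # The minor allele is whichever allele is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def knocked_out_genes(lof_variants, n_samples):
    """Map each gene to the set of samples homozygous for an LoF allele in it.

    lof_variants: iterable of (gene, genotypes) pairs, one per predicted
    loss-of-function variant (hypothetical input format).
    """
    ko = defaultdict(set)
    for gene, genotypes in lof_variants:
        if len(genotypes) != n_samples:
            raise ValueError(f"expected {n_samples} genotypes for {gene}")
        for sample, g in enumerate(genotypes):
            if g == 2:  # homozygous for the LoF allele: gene "knocked out"
                ko[gene].add(sample)
    return dict(ko)


def commonly_knocked_out(ko, n_samples, threshold=0.30):
    """Genes knocked out in more than `threshold` of the samples."""
    return sorted(gene for gene, s in ko.items() if len(s) / n_samples > threshold)


if __name__ == "__main__":
    n = 44
    toy_lof = [
        ("GENE_A", [2] * 10 + [0] * 34),          # samples 0-9 homozygous
        ("GENE_A", [0] * 5 + [2] * 9 + [0] * 30),  # samples 5-13 homozygous
        ("GENE_B", [2] * 13 + [1] * 31),           # samples 0-12 homozygous
    ]
    ko = knocked_out_genes(toy_lof, n)
    print(commonly_knocked_out(ko, n))  # ['GENE_A']
```

Samples are collected into a set per gene, so an individual homozygous for two different LoF variants in the same gene is counted once. With 44 samples, "over 30%" means at least 14 individuals, because 13/44 ≈ 29.5% and 14/44 ≈ 31.8%.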

A harmonized public resource of deeply sequenced diverse human genomes

High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation

The Simons Genome Diversity Project: 300 genomes from 142 diverse populations

Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation

Large-scale Genotyping of Complex DNA

An integrated map of genetic variation from 1,092 human genomes

The Human Pangenome Project: a global resource to map genomic diversity

De novo assembly of 64 haplotype-resolved human genomes of diverse ancestry and integrated analysis of structural variation

Haplotype-resolved diverse human genomes and integrated analysis of structural variation

Assembly of a pan-genome from deep sequencing of 910 humans of African descent

Multi-platform Discovery of Haplotype-Resolved Structural Variation in Human Genomes

Integrating common and rare genetic variation in diverse human populations

Complex genetic variation in nearly complete human genomes

A haplotype map of the human genome (The International HapMap Consortium)

Semi-automated assembly of high-quality diploid human reference genomes

Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References

Comprehensive Characterization of Human Genome Variation by High Coverage Whole-Genome Sequencing of Forty Four Caucasians

Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects

Closing the gap: Solving complex medically relevant genes at scale

The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models