Abstract: The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

Understanding the role that genetics has in phenotypic and disease variation, and its potential interactions with other factors, is crucial for a better understanding of human biology. It is hoped that this will lead to more successful drug development [1], and potentially to more efficient and personalized treatments. As such, a key component of the UK Biobank resource has been the collection of genome-wide genetic data on every participant using a purpose-designed genotyping array [2]. An interim release of genotype data on approximately 150,000 UK Biobank participants in May 2015 [3] has already facilitated numerous studies [4][5][6].

In this paper, we summarize the existing and planned content of the phenotype resource and describe the genetic dataset on the full 500,000 participants. To facilitate its wider use, we applied a range of quality control procedures and conducted a set of analyses that reveal properties of the genetic data, such as population structure and relatedness, that can be important for downstream analyses. In addition, we estimated haplotypes and imputed genotypes into the dataset, which increased the number of testable variants by more than 100-fold to approximately 96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and replicated signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies, which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.

Faculty Opinions Recommendation of the UK Biobank Resource with Deep Phenotyping and Genomic Data.

Pan-UK Biobank GWAS improves discovery, analysis of genetic architecture, and resolution into ancestry-enriched effects

Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes

UK Biobank: a globally important resource for cancer research

Exome sequencing and analysis of 454,787 UK Biobank participants

Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease

Plasma proteomic associations with genetics and health in the UK Biobank

UK BioCoin: Swift Trait-Specific Summary Statistics Regression for UK Biobank

Multi-trait genome-wide analyses of the brain imaging phenotypes in UK Biobank

Genetics of 35 blood and urine biomarkers in the UK Biobank

Efficient Identification of Trait-Associated Loss-of-function Variants in the UK Biobank Cohort by Exome-Sequencing Based Genotype Imputation

Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank

Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank

A generalized linear mixed model association tool for biobank-scale data

Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation

Rare variant contribution to human disease in 281,104 UK Biobank exomes

An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank

Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank

UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age

Prospective study design and data analysis in UK Biobank

Phenotype projections accelerate biobank-scale GWAS