Reconstructing parent genomes using siblings and other relatives

Ying Qiao, Ethan M. Jewett, Kimberly F. McManus, William A. Freyman, Joanne E. Curran, Sarah Williams-Blangero, John Blangero, The 23andMe Research Team, Amy L. Williams
DOI: https://doi.org/10.1101/2024.05.10.593578
2024-05-14
Abstract: Reconstructing the DNA of ancestors from their descendants has the potential to empower phenotypic analyses (including association and genetic nurture studies), improve pedigree reconstruction, and shed light on the ancestral population and phenotypes of ancestors. We developed HAPI-RECAP, a method that reconstructs the DNA of parents from full siblings and their relatives. This tool leverages HAPI2's output, a new phasing approach that applies to siblings (and optionally one or both parents) and reliably infers parent haplotypes but does not link the ungenotyped parents' DNA across chromosomes or between segments flanking ambiguities. By combining IBD between the reconstructed parents and the relatives, HAPI-RECAP resolves the source parent of these segments. Moreover, the method exploits crossovers the children inherited and sex-specific genetic maps to infer the reconstructed parents' sexes. We validated these methods on research participants from both 23andMe, Inc. and the San Antonio Mexican American Family Studies. Given data for one parent, HAPI2 reconstructs large fractions of the missing parent's DNA, between 77.6% and 99.97% among all families, and 90.3% on average in three- and four-child families. When reconstructing both parents, HAPI-RECAP inferred between 33.2% and 96.6% of the parents' genotypes, averaging 70.6% in four-child families. Reconstructed genotypes have average error rates \(<10^{-3}\), or comparable to those from direct genotyping. HAPI-RECAP inferred the parent sexes 100% correctly given IBD-linked segments and can also reconstruct parents without any IBD. As datasets grow in size, more families will be implicitly collected; HAPI-RECAP holds promise to enable high quality parent genotype reconstruction.
Bioinformatics
### What problem does this paper attempt to solve?

This paper addresses the problem of reconstructing the genomes of ancestors, in particular parents, from the DNA of their descendants and other relatives. The authors developed HAPI-RECAP, a method that reconstructs parents' DNA from the data of full siblings and their relatives. Such reconstruction can:

1. **Empower phenotypic analyses**: association studies and genetic nurture studies both benefit from knowing the parents' genotypes.
2. **Improve pedigree reconstruction**: reconstructed parents allow kinship to be inferred more accurately, which yields more precise pedigrees.
3. **Shed light on ancestral populations and phenotypes**: the reconstructed DNA carries information about the ancestors' population of origin and their traits.

### Main challenges

When the parents have not been genotyped, reconstructing their DNA faces several challenges:

- **Assigning haplotype segments to the correct parent**: Because of recombination and independent assortment, each child inherits a different mix of segments, and it is not obvious which parent a given reconstructed segment came from.
- **Linking segments across and within chromosomes**: Segments on different chromosomes, or on either side of an ambiguous region of the same chromosome, must be attributed to the same parent consistently.
- **Sex inference**: The crossover patterns in the children and sex-specific genetic maps must be used to decide which reconstructed parent is the mother and which is the father.

### Solutions

The authors address these challenges in two stages:

1. **Use HAPI2 for preliminary reconstruction**:
   - HAPI2 is a family-based phasing method that infers minimum-recombinant haplotypes from sibling data alone or from siblings plus one or both parents.
   - HAPI2 reliably infers parent haplotypes, but it does not link the ungenotyped parents' DNA across chromosomes or between segments flanking ambiguities.
2. **Use HAPI-RECAP for further analysis**:
   - HAPI-RECAP uses IBD (identity by descent) segments shared between the reconstructed parents and other relatives to resolve which parent each segment belongs to.
   - It uses sex-specific genetic maps and the crossovers the children inherited to infer the sexes of the reconstructed parents.

### Verification results

The authors validated these methods on research participants from 23andMe, Inc. and the San Antonio Mexican American Family Studies. The results show:

- Given data for one parent, HAPI2 reconstructs between 77.6% and 99.97% of the missing parent's DNA across all families, averaging 90.3% in three- and four-child families.
- When reconstructing both parents, HAPI-RECAP inferred between 33.2% and 96.6% of the parents' genotypes, averaging 70.6% in four-child families.
- The average error rate of the reconstructed genotypes is less than \(10^{-3}\), which is comparable to that of direct genotyping.
- HAPI-RECAP inferred the parents' sexes 100% correctly given IBD-linked segments, and it can also reconstruct parents without any IBD.

Overall, HAPI-RECAP holds promise for high-quality reconstruction of parents' genotypes, especially as datasets grow and more families are implicitly collected.
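
The sex-inference idea described under Solutions can be illustrated with a toy calculation. The sketch below is not the authors' implementation; this summary does not specify their model. It assumes crossovers follow a simple Poisson process whose intensity is given by a sex-specific genetic map, and it compares the two possible labelings of the reconstructed parents. The map values, crossover positions, and the names `TOY_MAP`, `log_likelihood`, and `infer_sexes` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Toy sex-specific genetic map: (start_bp, end_bp, male_cM, female_cM).
# All values are made up for illustration; they are not real map data.
Interval = Tuple[int, int, float, float]

TOY_MAP: List[Interval] = [
    (0, 10_000_000, 3.0, 1.0),
    (10_000_000, 20_000_000, 0.5, 2.5),
    (20_000_000, 30_000_000, 3.0, 1.0),
]


def log_likelihood(crossovers: Sequence[int], genetic_map: Sequence[Interval],
                   sex: str, n_meioses: int) -> float:
    """Poisson-process log-likelihood of crossover positions under one sex's map.

    Assumes every interval has a nonzero rate for the chosen sex.
    """
    col = 2 if sex == "male" else 3
    total_morgans = sum(iv[col] for iv in genetic_map) / 100.0
    # Expected number of crossovers over all meioses enters as -n * map length.
    ll = -n_meioses * total_morgans
    for pos in crossovers:
        for iv in genetic_map:
            start, end = iv[0], iv[1]
            if start <= pos < end:
                # Per-base-pair intensity in the interval holding this crossover.
                ll += math.log(n_meioses * (iv[col] / 100.0) / (end - start))
                break
        else:
            raise ValueError(f"position {pos} is outside the map")
    return ll


def infer_sexes(cross_a: Sequence[int], cross_b: Sequence[int],
                genetic_map: Sequence[Interval],
                n_meioses: int) -> Tuple[Tuple[str, str], float]:
    """Return the likelier (sex of A, sex of B) labeling and its log-odds."""
    a_male = (log_likelihood(cross_a, genetic_map, "male", n_meioses)
              + log_likelihood(cross_b, genetic_map, "female", n_meioses))
    a_female = (log_likelihood(cross_a, genetic_map, "female", n_meioses)
                + log_likelihood(cross_b, genetic_map, "male", n_meioses))
    log_odds = a_male - a_female
    labels = ("male", "female") if log_odds > 0 else ("female", "male")
    return labels, log_odds


# Hypothetical crossovers observed in four children for each reconstructed parent.
parent_a_crossovers = [2_000_000, 25_000_000, 27_000_000]
parent_b_crossovers = [5_000_000, 12_000_000, 15_000_000, 18_000_000]

sexes, log_odds = infer_sexes(parent_a_crossovers, parent_b_crossovers,
                              TOY_MAP, n_meioses=4)
print(sexes, round(log_odds, 2))
```

In this toy map the male rate is higher near the chromosome ends and the female rate is higher in the middle, so parent A's crossovers favor the male map and parent B's favor the female map. A real analysis would sum such scores over all autosomes and use a published sex-specific map.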