Haplotype Inference Using A Bayesian Hidden Markov Model

Shuying Sun, Celia M. T. Greenwood, Radford M. Neal
DOI: https://doi.org/10.1002/gepi.20253
2007-01-01
Genetic Epidemiology
Abstract: Knowledge of haplotypes is useful for understanding block structure in the genome and disease risk associations. Direct measurement of haplotypes in the absence of family data is presently impractical, and hence several methods have been developed for reconstructing haplotypes from population data. We have developed a new population-based method that uses a Bayesian hidden Markov model for the source of the ancestral haplotype segments. In our Bayesian model, a higher-order Markov model is used as the prior for ancestral haplotypes, to account for linkage disequilibrium. Our model includes parameters for the genotyping error rate, the mutation rate, and the recombination rate at each position. Computation is done by Markov chain Monte Carlo, using the forward-backward algorithm to efficiently sum over all possible state sequences of the hidden Markov model. We have used the model to reconstruct the haplotypes of 129 children at a region on chromosome 5 in the data set of Daly et al. [2001] (for which true haplotypes are obtained based on parental genotypes) and of 30 children at selected regions in the CEU and YRI data of the HapMap project. The results are quite close to the family-based reconstructions and comparable with those of the state-of-the-art PHASE program. Our haplotype reconstruction method does not require division of the markers into small blocks of loci. The recombination rates inferred from our model can help to predict haplotype block boundaries and to locate recombination hotspots.
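To illustrate what "summing over all possible state sequences" means in practice, here is a minimal sketch of the forward recursion for a generic discrete hidden Markov model. It is not the model of the paper: the two hidden states standing in for ancestral haplotypes, the switch probability standing in for recombination, the 0.05 mismatch probability standing in for mutation or genotyping error, and all function names are assumptions chosen for illustration. Only the forward pass is shown; the backward pass and the surrounding Markov chain Monte Carlo updates are omitted.

```python
import math
from itertools import product


def forward_log_likelihood(init, trans, emit, obs):
    """Log-probability of obs under a discrete HMM, summed over all state paths.

    init[k]     : probability of starting in hidden state k
    trans[j][k] : probability of moving from state j to state k
    emit[k][o]  : probability of observing symbol o in state k
    obs         : non-empty sequence of observed symbols (integers)
    """
    n_states = len(init)
    # alpha[k] is proportional to P(obs[0..t], state at t = k).
    alpha = [init[k] * emit[k][obs[0]] for k in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                emit[k][obs[t]] * sum(alpha[j] * trans[j][k] for j in range(n_states))
                for k in range(n_states)
            ]
        # Rescale at every position to avoid underflow on long sequences.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def brute_force_likelihood(init, trans, emit, obs):
    """Same quantity by explicit enumeration of every state path (exponential cost)."""
    total = 0.0
    for path in product(range(len(init)), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total


# Hypothetical toy setup: two "ancestral" states, four biallelic markers.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]      # 0.1 = assumed switch (recombination-like) probability
EMIT = [[0.95, 0.05], [0.05, 0.95]]   # 0.05 = assumed mismatch (mutation/error-like) probability
OBS = [0, 0, 1, 1]

if __name__ == "__main__":
    print(math.exp(forward_log_likelihood(INIT, TRANS, EMIT, OBS)))
    print(brute_force_likelihood(INIT, TRANS, EMIT, OBS))
```

The two printed values should agree up to floating-point error. The recursion costs time proportional to the number of markers times the square of the number of states, whereas the enumeration grows exponentially with the number of markers, which is why the abstract describes the algorithm as efficient.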