Haplotype inference using a Bayesian Hidden Markov model

Knowledge of haplotypes is useful for understanding block structure in the genome and disease risk associations. Direct measurement of haplotypes in the absence of family data is presently impractical, so several methods have been developed for reconstructing haplotypes from population data. We have developed a new population‐based method that uses a Bayesian hidden Markov model for the source of the ancestral haplotype segments. In our Bayesian model, a higher-order Markov model serves as the prior for ancestral haplotypes, to account for linkage disequilibrium. The model includes parameters for the genotyping error rate, the mutation rate, and the recombination rate at each position. Computation is done by Markov chain Monte Carlo, using the forward‐backward algorithm to sum efficiently over all possible state sequences of the hidden Markov model. We have used the model to reconstruct the haplotypes of 129 children at a region on chromosome 5 in the data set of Daly et al. [2001] (for which true haplotypes are obtained from parental genotypes) and of 30 children at selected regions in the CEU and YRI data of the HapMap project. The results are close to the family‐based reconstructions and comparable with those of the state‐of‐the‐art PHASE program. Our haplotype reconstruction method does not require division of the markers into small blocks of loci. The recombination rates inferred from our model can help to predict haplotype block boundaries and to estimate recombination hotspots. Genet. Epidemiol. 2007. © 2007 Wiley‐Liss, Inc.
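To illustrate what summing over all possible state sequences means in practice, the following is a minimal sketch of the forward pass of the forward‐backward algorithm for a generic discrete hidden Markov model. It is not the haplotype model described above: the function names, the two-state toy parameters, and the observation encoding are all assumptions made for illustration. The brute-force function enumerates every state path and is included only to show that the forward recursion gives the same likelihood at a cost linear, rather than exponential, in sequence length.

```python
from itertools import product

import numpy as np


def forward_log_likelihood(init, trans, emit, obs):
    """Log-likelihood of `obs` under a discrete HMM, via the scaled forward pass.

    init:  (K,)   initial state probabilities
    trans: (K, K) trans[i, j] = P(state j at t+1 | state i at t)
    emit:  (K, M) emit[k, o]  = P(observation o | state k)
    obs:   sequence of observation indices in range(M)
    """
    init = np.asarray(init, dtype=float)
    trans = np.asarray(trans, dtype=float)
    emit = np.asarray(emit, dtype=float)

    alpha = init * emit[:, obs[0]]  # unnormalised forward vector at t = 0
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the chain, then weight by the emission.
            alpha = (alpha @ trans) * emit[:, obs[t]]
        scale = alpha.sum()         # P(obs[t] | obs[0..t-1])
        log_lik += np.log(scale)
        alpha = alpha / scale       # rescale to avoid numerical underflow
    return log_lik


def brute_force_log_likelihood(init, trans, emit, obs):
    """Same quantity by explicit enumeration of all K**T state paths (toy sizes only)."""
    n_states = len(init)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return np.log(total)


# Hypothetical two-state toy model (values chosen arbitrarily for illustration).
toy_init = [0.6, 0.4]
toy_trans = [[0.9, 0.1], [0.2, 0.8]]
toy_emit = [[0.95, 0.05], [0.1, 0.9]]
toy_obs = [0, 0, 1, 1, 0]

print(forward_log_likelihood(toy_init, toy_trans, toy_emit, toy_obs))
print(brute_force_log_likelihood(toy_init, toy_trans, toy_emit, toy_obs))
```

The two printed values should agree up to floating-point error. In a sampler of the kind the abstract describes, a forward pass like this would typically be paired with a backward pass to draw or marginalise hidden states; that pairing is a general property of hidden Markov model inference and is not a detail taken from the paper.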

[1]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[2]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Adrian F. M. Smith,et al.  Sampling-Based Approaches to Calculating Marginal Densities , 1990 .

[4]  K H Buetow,et al.  Influence of aberrant observations on high-resolution linkage analysis outcomes. , 1991, American journal of human genetics.

[5]  L. Excoffier,et al.  Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. , 1995, Molecular biology and evolution.

[6]  P. Donnelly,et al.  A new statistical method for haplotype reconstruction from population data. , 2001, American journal of human genetics.

[7]  B. Rannala,et al.  High-resolution multipoint linkage-disequilibrium mapping in the context of a human genome sequence. , 2001, American journal of human genetics.

[8]  M. Daly,et al.  High-resolution haplotype structure in the human genome , 2001, Nature Genetics.

[9]  E A Thompson,et al.  Linkage disequilibrium mapping: the role of population history, size, and structure. , 2001, Advances in genetics.

[10]  Jun S. Liu,et al.  Monte Carlo strategies in scientific computing , 2001 .

[11]  Stacey S Cherny,et al.  The impact of genotyping error on family-based analysis of quantitative traits , 2001, European Journal of Human Genetics.

[12]  A. Mander,et al.  Haplotype Analysis in Population-based Association Studies , 2001 .

[13]  Zhaohui S. Qin,et al.  Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. , 2002, American journal of human genetics.

[14]  S. L. Scott Bayesian Methods for Hidden Markov Models , 2002 .

[15]  Peter H. Westfall,et al.  Testing Association of Statistically Inferred Haplotypes with Discrete and Continuous Traits in Samples of Unrelated Individuals , 2002, Human Heredity.

[16]  A. Chakravarti,et al.  Haplotype inference in random population samples. , 2002, American journal of human genetics.

[17]  Katherine M Kirk,et al.  The impact of genotyping error on haplotype reconstruction and frequency estimation , 2002, European Journal of Human Genetics.

[18]  Jeanette C Papp,et al.  Detection and integration of genotyping errors in statistical genetics. , 2002, American journal of human genetics.

[19]  Toshihiro Tanaka The International HapMap Project , 2003, Nature.

[20]  Peter Donnelly,et al.  A comparison of Bayesian methods for haplotype reconstruction from population genotype data. , 2003, American journal of human genetics.

[21]  Roded Sharan,et al.  Bayesian haplotype inference via the Dirichlet process , 2004, ICML.

[22]  Hannu Toivonen,et al.  A Markov Chain Approach to Reconstruction of Long Haplotypes , 2003, Pacific Symposium on Biocomputing.

[23]  Robert C Elston,et al.  Estimating haplotype frequencies in pooled DNA samples when there is genotyping error , 2004, BMC Genetics.

[24]  M. W. Foster,et al.  Integrating ethics and science in the International HapMap Project , 2004, Nature Reviews Genetics.

[25]  Dan Geiger,et al.  High density linkage disequilibrium mapping using models of haplotype block variation , 2004, ISMB/ECCB.

[26]  K. Goddard,et al.  Linkage analysis of alcoholism-related electrophysiological phenotypes: genome scans with microsatellites compared to single-nucleotide polymorphisms , 2005, BMC Genetics.

[27]  M. Stephens,et al.  Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-data Imputation , 2005 .

[28]  M. Olivier A haplotype map of the human genome , 2003, Nature.

[29]  Heikki Mannila,et al.  A Hidden Markov Technique for Haplotype Reconstruction , 2005, WABI.

[30]  Shuying Sun,et al.  Association of nuclear factor-kappaB in psoriatic arthritis. , 2005, The Journal of rheumatology.

[31]  Ron Shamir,et al.  A Block-Free Hidden Markov Model for Genotypes and Its Application to Disease Association , 2005, J. Comput. Biol..

[32]  Zhaohui S. Qin,et al.  A comparison of phasing algorithms for trios and unrelated individuals. , 2006, American journal of human genetics.

[33]  Paul Scheet,et al.  A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. , 2006, American journal of human genetics.

[34]  Roded Sharan,et al.  Bayesian Haplotype Inference via the Dirichlet Process , 2007, J. Comput. Biol..