A new preprocessing procedure for the haplotype inference problem

A haplotype is a DNA sequence inherited from one parent. Haplotypes are especially important in the study of complex diseases because they contain more information than genotype data, so the next high-priority phase in human genomics involves the development of a full Haplotype Map of the human genome [1]. However, obtaining haplotype data is technically difficult and expensive. One computational method for obtaining haplotype data from genotype data applies the pure parsimony criterion, an approach known as Haplotype Inference by Pure Parsimony (HIPP), which has been proved to be NP-hard. We present a new preprocessing method that drastically decreases the number of relevant haplotypes. Several algorithms need to preprocess the data; for large problem instances this key procedure is even more important than the solving process itself. The preprocessing was tested on real and simulated data using a tabu search, and the resulting algorithm proved competitive with the best current solvers.
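To make the problem concrete: under pure parsimony, the goal is the smallest set of distinct haplotypes such that every genotype is explained by some pair from the set. The minimal Python sketch below is an illustration only, not the preprocessing method of this paper. It assumes a common encoding that the abstract does not specify (per site, 0 and 1 are the two homozygous states and 2 is heterozygous), and the function names `explains` and `candidate_pairs` are hypothetical. It shows why the number of candidate haplotypes grows quickly: a genotype with k heterozygous sites (k ≥ 1) has 2^(k-1) explaining pairs.

```python
from itertools import product


def explains(h1, h2, g):
    """Return True if haplotypes h1 and h2 (tuples of 0/1) explain genotype g.

    Assumed encoding: g[i] in {0, 1} means both haplotypes carry that
    allele at site i; g[i] == 2 means the two haplotypes differ there.
    """
    return len(h1) == len(h2) == len(g) and all(
        (a == b == s) if s in (0, 1) else (a != b)
        for a, b, s in zip(h1, h2, g)
    )


def candidate_pairs(g):
    """Enumerate every unordered haplotype pair that explains genotype g."""
    het_sites = [i for i, s in enumerate(g) if s == 2]
    pairs = set()
    # Try every allele assignment at the heterozygous sites of h1;
    # h2 receives the complementary allele at each of those sites.
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(g), list(g)
        for i, b in zip(het_sites, bits):
            h1[i], h2[i] = b, 1 - b
        # frozenset makes the pair unordered, so (h1, h2) and (h2, h1) merge.
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs


if __name__ == "__main__":
    genotype = (2, 0, 2, 1)  # two heterozygous sites
    for pair in candidate_pairs(genotype):
        print(sorted(pair))
```

Because the candidate set grows exponentially with the number of heterozygous sites, any step that discards haplotypes which cannot appear in an optimal solution shrinks the search space before a solver such as a tabu search starts.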

[1] Inês Lynce, et al. Efficient Haplotype Inference with Boolean Satisfiability, 2006, AAAI.

[2] Inês Lynce, et al. Efficient Haplotype Inference with Pseudo-boolean Optimization, 2007, AB.

[3] V. Rich. Personal communication, 1989, Nature.

[4] Xiang-Sun Zhang, et al. Haplotype Inference by Pure Parsimony via Genetic Algorithm, 1997.

[5] Toshihiro Tanaka. The International HapMap Project, 2003, Nature.

[6] Daniel G. Brown, et al. A New Integer Programming Formulation for the Pure Parsimony Problem in Haplotype Analysis, 2004, WABI.

[7] Inês Lynce, et al. Efficient Haplotype Inference with Combined CP and OR Techniques, 2008, CPAIOR.

[8] Inês Lynce, et al. Haplotype Inference by Pure Parsimony: A Survey, 2010, J. Comput. Biol.

[9] Lusheng Wang, et al. Haplotype inference by maximum parsimony, 2003, Bioinform.

[10] Paola Bonizzoni, et al. The Haplotyping problem: An overview of computational models and solutions, 2003, Journal of Computer Science and Technology.

[11] Daniel G. Brown, et al. Integer programming approaches to haplotype inference by pure parsimony, 2006, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[12] Alex Zelikovsky, et al. Linear Reduction for Haplotype Inference, 2004, WABI.

[13] Richard R. Hudson, et al. Generating samples under a Wright-Fisher neutral model of genetic variation, 2002, Bioinform.

[14] Dan Gusfield, et al. An Overview of Combinatorial Methods for Haplotype Inference, 2002, Computational Methods for SNPs and Haplotype Inference.